School of Computing and Mathematical Sciences, De Montfort University, Leicester, 24 September 1995
3.1 Factors influencing overall system performance
Processor performance
(see next sub-section for a detailed discussion)
determines instruction execution speed, arithmetic performance, interrupt handling capability (Rabbat et al 1988), etc.
Main or primary memory size
is critical to system performance. In a single user system it limits the size of program and/or data set which can be processed. In a multiprogramming/virtual memory environment it affects the number of concurrent processes which can be held without swapping to and from disk. In general the more powerful the processor the larger the memory, ie as processor power increases so do user requirements, and this leads to larger and more complex programs. When determining the main memory size required for a system, allowance must be made for the operating system, eg a sophisticated operating system such as UNIX or OS/2 typically requires 4 to 8Mbyte for resident components and work area.
Secondary memory (disk) size
determines the number of programs and data sets which can be accessed on-line at any instant. For example, in a single user word processing environment only one or two documents will be accessed at a time, and these could be held on a small floppy disk. On the other hand, a large multi-user minicomputer could have 50 simultaneous users running large programs with large data sets requiring 1000Mbytes or more of disk space. Again, when estimating disk requirements allowance has to be made for the operating system, eg UNIX typically requires of the order of 100Mbytes if all utilities and help files are on-line.
Input/output bandwidth
is a measure of how fast information can be transferred between the processor, memory and I/O devices (see data bus size in next sub-section).
Network capability
is important in a distributed environment where a number of separate systems are connected via a network, eg personal workstations accessing a shared central database.
3.2 Factors influencing processor performance
The performance of the processor in terms of program size and execution speed is determined by a number of factors.
3.2.1 Internal processor architecture
which determines:
(a) The number of processor registers (high speed memory within the CPU) used for the storage of temporary information and intermediate results. For example, holding local variables in CPU registers reduces traffic to/from main memory and hence overall program execution time.
(b) The number of instructions available: a statement in a high-level language is mapped into a sequence of processor instructions. The approach taken in what has become known as CISC architecture (complex instruction set computer) was that increasing the number of instructions shortened the executable code and the program executed faster (see the discussion on CISC/RISC machines below).
(c) The number of addressing modes: the processor uses addressing modes to access the operands (data to be operated on) of instructions. The approach in CISC architectures was to increase the number of addressing modes to allow direct manipulation of more and more complex data structures, eg records and arrays of records.
(d) The data size of the ALU (Arithmetic/Logic Unit). The ALU can directly manipulate integer data of a specific size or sizes, eg 8, 16, 32 or 64 bit numeric values. For example, a 32-bit ALU can add a pair of 32-bit numbers with one instruction whereas a 16-bit ALU would require two instructions.
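The point in (d) can be illustrated with a minimal sketch in Python, assuming a hypothetical machine whose ALU can only add 16-bit quantities. The function name `add32_with_16bit_alu` is invented for this illustration; a real 16-bit processor would use an add followed by an add-with-carry instruction.

```python
MASK16 = 0xFFFF  # a 16-bit ALU can only hold 16 bits of result


def add32_with_16bit_alu(a, b):
    """Add two 32-bit values using only 16-bit additions.

    Returns (32-bit result, number of 16-bit add operations used).
    """
    # First add: the low 16-bit halves; bit 16 of the sum is the carry out.
    low = (a & MASK16) + (b & MASK16)
    carry = low >> 16
    # Second add: the high 16-bit halves plus the carry from the first add.
    high = ((a >> 16) & MASK16) + ((b >> 16) & MASK16) + carry
    # Reassemble the 32-bit result (any carry out of bit 31 is discarded).
    result = ((high & MASK16) << 16) | (low & MASK16)
    return result, 2


result, adds = add32_with_16bit_alu(0x0001FFFF, 0x00000001)
print(hex(result), adds)
```

A 32-bit ALU would produce the same result with a single add, which is the saving described above.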
The control unit of first (valve) and second (transistor) generation computer systems was 'hardwired' in that physical circuitry fetched, decoded and executed instructions. The major problem with very complex 'hardwired' circuits is that modifications are difficult and expensive. The advent of integrated circuits (used in third and fourth generation computers) enabled the building of ROMs on the processor chip which then allowed practical microprogramming (Tanenbaum 1990). In a microprogrammed control unit the fetch, decode and execute of instructions are controlled by a ROM based 'microprogram' in the control unit which 'executes' the instructions received by the processor as a succession of simple microinstructions. The advantage of using ROM based microcode is that it is 'easier' to modify than an equivalent 'hardwired' circuit.
Over the past twenty years more and more instructions have been added making the microprogram of typical CISC computers (eg Intel 80486, Motorola 68040) very complex and difficult to debug (see the discussion on CISC and RISC machines below).
Events within the system are synchronised by a clock which
controls the basic timing of instructions or parts of instructions.
A particular microprocessor may be available in a range of clock
speeds. For example, Table 1 presents a summary of the relative
performance of the Motorola MC68000 family against clock speed
(the performance in Mips is a guide and will be affected by factors
such as cache hit rate, etc., see section 7). All things being
equal, a 25MHz MC68020 will execute instructions twice as fast
as a 12.5MHz version, but costs more.
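Table 1 Relative performance (Mips) of the Motorola MC68000 family against clock speed
clock (MHz): 10, 12.5, 16.65, 25, 33, 50
Mips entries, one group per processor row: 0.8; 0.8, 1.1; 2.2, 3.0, 6.0; 29.0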
The Intel 80486DX2 and 80486DX4 processors have on-chip clock multipliers which multiply the external clock by 2 and 3 respectively, ie on-chip operations are performed at two or three times the external clock speed, giving a particular improvement in processor bound jobs. In addition, the DX4 has a larger cache (hence DX4 rather than DX3). Clock multiplication has little effect on I/O bound jobs (eg a database server or a file server), where a Pentium with a 64-bit bus would be used.
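The difference between processor bound and I/O bound jobs can be sketched with a simple model. This is an illustration only: it assumes that on-chip work speeds up exactly in proportion to the clock multiplier, that external (memory and I/O) time is unchanged, and it ignores cache effects. The function names and the job timings are hypothetical.

```python
def run_time(cpu_seconds, io_seconds, multiplier):
    """Estimated run time when only the on-chip work is speeded up."""
    return cpu_seconds / multiplier + io_seconds


def speedup(cpu_seconds, io_seconds, multiplier):
    """Overall speed-up relative to the same job with a *1 clock."""
    base = run_time(cpu_seconds, io_seconds, 1)
    return base / run_time(cpu_seconds, io_seconds, multiplier)


# Processor bound job: 9s of on-chip work, 1s of external activity.
print(round(speedup(9.0, 1.0, 2), 2))
# I/O bound job: 1s of on-chip work, 9s of external activity.
print(round(speedup(1.0, 9.0, 2), 2))
```

Under these assumptions a clock doubler gives the processor bound job a speed-up approaching two, while the I/O bound job gains only a few per cent.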
Main memory speed should match the speed of the processor. A 25MHz MC68020 requires faster (hence more expensive) memory than a 12.5MHz version. If necessary, memory attached to an MC68020 can delay the processor on a memory read/write by using WAIT states, which make the processor idle for one or more clock periods and hence slow the overall execution speed. A common tactic is to build machines with a fast processor and clock but with slow (and cheap) memory, eg the unwary could be caught by a machine advertised as having a 25MHz CPU but which executes programs slower than a 12.5MHz machine.
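The effect of WAIT states on a single memory access can be estimated as follows. This is a rough sketch: the figure of three clock periods for a memory access without wait states is an assumption made for the illustration, and the function name `bus_cycle_ns` is hypothetical. It compares memory access time only, not complete program execution.

```python
def bus_cycle_ns(clock_mhz, base_clocks, wait_states):
    """Time for one memory read/write in nanoseconds."""
    clock_period_ns = 1000.0 / clock_mhz
    # Each wait state idles the processor for one extra clock period.
    return (base_clocks + wait_states) * clock_period_ns


BASE_CLOCKS = 3  # assumed clocks per memory access with no wait states

fast_cpu_slow_memory = bus_cycle_ns(25.0, BASE_CLOCKS, 4)
slow_cpu_matched_memory = bus_cycle_ns(12.5, BASE_CLOCKS, 0)
print(fast_cpu_slow_memory, slow_cpu_matched_memory)
```

With these assumed figures the 25MHz machine with four wait states takes longer over each memory access than the 12.5MHz machine with none, which is how the faster clock can end up giving slower execution.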
The number of address lines determines the memory address space of a processor, ie both the maximum amount of physical main memory which can be accessed (if fitted) and the maximum logical memory size in a virtual memory environment. Therefore the address bus size affects maximum program/data size and/or the amount of swapping and paging in a multiprogramming/virtual memory environment. For example, 16 address lines can access a maximum of 64Kbytes, 20 lines 1Mbyte, 24 lines 16Mbyte and 32 lines 4Gbyte.
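The figures quoted follow from the fact that n address lines can select 2^n distinct byte addresses. A short sketch (the function names are chosen for this illustration):

```python
def address_space_bytes(address_lines):
    """Number of bytes addressable with the given number of address lines."""
    return 2 ** address_lines


def human(n):
    """Express a byte count in Kbyte, Mbyte or Gbyte where it divides exactly."""
    for unit, size in (("Gbyte", 2 ** 30), ("Mbyte", 2 ** 20), ("Kbyte", 2 ** 10)):
        if n >= size and n % size == 0:
            return f"{n // size}{unit}"
    return f"{n}byte"


for lines in (16, 20, 24, 32):
    print(lines, "address lines:", human(address_space_bytes(lines)))
```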
It must be noted that even though a processor has a particular address space this does not mean that a computer system will be or can be fitted with the maximum amount. For example, a processor with 32 address lines has an address space of 4Gbyte but typical 32-bit machines are fitted with anything between 4Mbyte and 256Mbyte of physical memory. The 4Gbyte address space becomes important under a virtual memory environment where very large programs can be executed on machines with much smaller physical memory. In practice there is a maximum amount of memory which can be fitted to a particular model of machine (determined by the layout of the machine in terms of bus slots, physical space available, etc). One of the major differences between personal workstations and mini/mainframe computer systems is that the latter can generally be fitted with much larger physical memory.
The width of the data bus determines how many memory read/write
cycles are required to access instructions/data and has a major
effect on I/O bandwidth, eg if a processor has a 16-bit data bus
it will require two memory accesses to read a 32-bit number while
a processor with a 32-bit data bus would require a single access.
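The number of memory accesses in the example above is the operand size divided by the data bus width, rounded up. A minimal sketch, assuming the operand is aligned so that it never straddles more bus-width units than necessary (the function name is chosen for this illustration):

```python
def memory_accesses(operand_bits, bus_bits):
    """Memory cycles needed to transfer an aligned operand over the data bus."""
    # Ceiling division: a partly used bus cycle still costs a whole access.
    return -(-operand_bits // bus_bits)


print(memory_accesses(32, 16))  # 32-bit number over a 16-bit data bus
print(memory_accesses(32, 32))  # 32-bit number over a 32-bit data bus
```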
A question often asked is why a multi-user minicomputer can be
up to ten times the cost of a personal workstation with similar
processor performance. The answer is that when purchasing minicomputers
and mainframe systems one is buying, to a large extent, I/O bandwidth
and physical memory capacity. An example (from a few years ago)
is the comparison between an Apollo DN3000 workstation (based
on an MC68020 12MHz microprocessor) and the DEC VAX 8200 minicomputer.
(Table comparing the Apollo DN3000 with the DEC VAX 8200.)
The figures are order of magnitude guides (processor performance
guides such as Mips, millions of instructions per second, will
be discussed in section 7) but do give an indication of different
areas of application of the systems. The Apollo is a single user
workstation used for highly interactive computational tasks and
the VAX would typically be used by a number of concurrent users
(eg five to ten) to run tasks which are not heavy in computational
terms but which require a system capable of supporting the I/O
of a number of users (eg multi-user databases, sales/stock control
packages, accounting packages, etc.)
Table 2 shows address and data bus sizes for various microprocessors:

| Microprocessor manufacturer & type | address bus size in bits | maximum memory bytes | data bus size in bits | clock |
|---|---|---|---|---|
| Intel 8080 | 16 | 64K | 8 | *1 |
| Zilog Z80 | 16 | 64K | 8 | |
| Motorola 6800 | 16 | 64K | 8 | |
| Intel 8088 (IBM/PC) | 20 | 1M | 8 | |
| Intel 8086 (IBM/PC XT) | 20 | 1M | 16 | |
| Motorola 68008 | 20 | 1M | 8 | |
| Motorola 68000, 68010 | 24 | 16M | 16 | |
| Intel 80186, 80286 | 24 | 16M | 16 | |
| Motorola 68020/30/40 | 32 | 4G | 32 | |
| Intel 80386SX | 24 | 16M | 16 | |
| Intel 80386DX | 32 | 4G | 32 | |
| Intel 80486DX | 32 | 4G | 32 | |
| Intel 80486SX | 32 | 4G | 32 | |
| Intel 80486DX2 | 32 | 4G | 32 | |
| Intel 80486DX4 | 32 | 4G | 32 | |
| Intel Pentium | 32 | 4G | 64 | |
Table 2 shows the maximum amount of primary memory which can be addressed. In practice a computer system may be fitted with less, eg an MC68030 system typically has 16, 32 or 64 Mbytes. Although the primary memory is organised in bytes, an instruction or data item may use several consecutive bytes of storage, eg using 2, 4 or 8 bytes to store 16-bit, 32-bit or 64-bit values respectively.
The size of the data bus determines the number of bits which can be transferred between system components in a single read or write operation. This has a major impact on overall system performance, ie a 32-bit value can be accessed with a single memory read operation on a 32-bit bus but requires two memory reads with a 16-bit bus. In practice the more powerful the processor the larger the data and address busses.
The size of the address and data busses has a major impact on the overall cost of a system, ie the larger the bus the more complex the interface circuits and the more 'wires' interconnecting system components. Table 2 shows that there are versions of some processors with smaller data and address busses, eg the Intel 80386SX is (from a programmer's viewpoint) internally identical to the 80386DX but has a 24-bit address bus and a 16-bit external data bus (the internal data bus is 32 bits). These are used to build low cost systems which are able to run application programs written for the full processors (but with reduced performance).
Table 2a shows the Intel processors with address and data bus sizes (internal and external), internal cache size, presence of an internal co-processor and internal clock speed.

| IBM PC compatibles processor model | address bus size in bits | maximum memory bytes | internal data bus in bits | external data bus in bits | internal cache in bytes | internal co-processor | internal clock |
|---|---|---|---|---|---|---|---|
| Intel 8088 (IBM/PC) | 20 | 1M | 16 | 8 | none | no | *1 |
| Intel 8086 (IBM/PC XT) | 20 | 1M | 16 | 16 | none | no | *1 |
| Intel 80186, 80286 | 24 | 16M | 16 | 16 | none | no | *1 |
| Intel 80386SX | 24 | 16M | 32 | 16 | none | no | *1 |
| Intel 80386DX | 32 | 4G | 32 | 32 | none | no | *1 |
| Intel 80486DX | 32 | 4G | 32 | 32 | 8K | yes | *1 |
| Intel 80486SX | 32 | 4G | 32 | 32 | 8K | no | *1 |
| Intel 80486DX2 | 32 | 4G | 32 | 32 | 8K | yes | *2 |
| Intel 80486DX4 | 32 | 4G | 32 | 32 | 16K | yes | *2 or *3 |
| Intel Pentium | 32 | 4G | 64 | 64 | 16K | yes | *1 |

Notes:
Address bus size
determines the memory address space of a processor, eg 32 address lines can address a maximum of 4Gbyte of memory
Data bus size
determines how many memory read/write cycles are required to access instructions/data, and has a major effect on input/output bandwidth (important in file servers and database servers)
Cache memory
a fast memory logically positioned between the processor and bus/main memory - can be on chip (as in 80486) and/or external
Floating point co-processor
is important in real number calculations (twenty times speed-up over the normal CPU), eg in mathematical, scientific and engineering applications
Clock Speed
The clock times events within the computer: the higher the clock speed the faster the system goes (assuming memory, bus, etc. match the speed)
Internal clock speed
the DX2 and DX4 processors contain clock doublers/triplers:
on-chip operations are performed at two or three times the speed of a DX machine, while external operations run at the same speed
