Faculty of Computing and Engineering Sciences, De Montfort University, Leicester, UK
2 Performance requirements due to system and application software
The feasibility study must generate system requirements not only in terms of software (to solve the end-users' problems) but also in terms of the hardware to support that software. The hardware requirements will be in terms of computer processor power (do you need a £1000 office PC or a £20000 professional workstation with real-time 3D graphics capability?), memory size (do you need 32Mbytes or 256Mbytes of RAM?), disk space (even individual PC based packages often need 1Gbyte each), network support (to communicate with servers or other users), etc. In addition, many end-users forget the requirements of the system software (operating system, compilers, etc.). These notes consider hardware requirements to support software and discuss what factors affect overall system performance.
The WWW Virtual Library on computing - http://src.doc.ic.ac.uk/bySubject/Computing/Overview.html
CPU Information centre - http://bwrc.eecs.berkeley.edu/CIC/
Intel's developer site - http://developer.intel.com/
Intel PC technology discussion - http://developer.intel.com/technology/
PC reference information - http://www.pcguide.com/index.htm
IBM PC compatible FAQ - http://www.undcom.com/compfaq.html
History of CPUs - http://bwrc.eecs.berkeley.edu/CIC/archive/cpu_history.html
CPU Information & System Performance Summary - http://bwrc.eecs.berkeley.edu/CIC/summary/
Chronology of Events in the History of Microcomputers - http://www.islandnet.com/~kpolsson/comphist/
Fig 1 Typical microcomputer configuration using a common bus system
Fig 1 is a representation of the hardware (physical components) of a simple single processor computer system comprising:
A sophisticated system may be much more complex than Fig. 1 with multiple processors, cache memories (see below), separate bus systems for main memory, fast and slow I/O devices, etc.
When attempting to estimate the requirements of a proposed system in terms of processor performance, main memory and disk size, etc., attention must be paid to the needs of both system and user software in terms of:
Virtual memory makes use of a phenomenon known as locality of reference in which memory references of both instructions and data tend to cluster. Over short periods of time a significant amount of:
(a) instruction execution is localized either within loops or heavily used subroutines, and
(b) data manipulation is on local variables or upon tables or arrays of information.
Most virtual memory systems use a technique called paging in which the program and data are broken down into 'pages' (typical size 4Kbytes) which are held on disk. Pages are then brought into main memory as required and 'swapped' out when main memory is full. This technique allows program size to be much larger than the physical main memory size (typically a modern professional workstation may have 64 to 512Mbytes of main memory but a virtual memory size of 4Gbyte). As the number and/or size of concurrent programs increases a phenomenon known as thrashing can occur in which the system spends all its time swapping pages to and from disk and doing nothing else. It is therefore important to configure sufficient physical memory even under a virtual memory environment. This problem often becomes apparent over a period of time as new releases of software (including the operating system) are mounted on a system. New versions of software are almost always larger (sometimes two or three times) and users experience a sudden reduction in response times and extended program run times. This often necessitates the upgrading of main memory on existing systems every year or two.
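The effect of too little physical memory can be illustrated with a small simulation. This is a minimal sketch, not a model of any particular operating system: it assumes a least-recently-used (LRU) replacement policy, which these notes do not specify, and the function name `count_page_faults` and the reference pattern are purely illustrative.

```python
from collections import OrderedDict


def count_page_faults(references, frames):
    """Count page faults for a page reference string, assuming LRU replacement."""
    resident = OrderedDict()  # pages in main memory, ordered oldest use first
    faults = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)  # mark page as most recently used
        else:
            faults += 1  # page has to be brought in from disk
            if len(resident) >= frames:
                resident.popitem(last=False)  # swap out least recently used page
            resident[page] = None
    return faults


# A program looping 100 times over the same 5 pages (strong locality of reference).
loop_refs = [page for _ in range(100) for page in range(5)]

for frames in (3, 4, 5, 8):
    faults = count_page_faults(loop_refs, frames)
    print(f"{frames} page frames: {faults} faults in {len(loop_refs)} references")
```

Under these assumptions, once the five pages of the loop fit in memory only the initial five faults occur, whereas with one frame too few every single reference causes a fault, which is the behaviour described above as thrashing.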
| Windows 3.1 | minimum 4Mbytes | preferred 8Mbytes |
| Windows 95 | minimum 16Mbytes | preferred 32Mbytes |
| Windows 98 | minimum 32Mbytes | preferred 64/128Mbytes |
| Windows NT/2000 | minimum 64Mbytes | preferred 128/256Mbytes |
| LINUX | minimum 16Mbytes | preferred 64/128Mbytes |
| SCO UNIX | minimum 64Mbytes | preferred 256Mbytes |
If the main memory is too small there will be insufficient space for
user programs and data or, in a multiprogramming/virtual memory environment,
excessive swapping and paging between main memory and disk will occur.
| | MS-DOS 6.2 | 5.8 Mbytes |
| plus | CD-ROM driver | 6.9 Mbytes |
| plus | Windows 3.1 | 16.3 Mbytes |
| plus | Win32S | 18.5 Mbytes |
| plus | Windows 95 | 41 Mbytes |
One would then need to allow another 20 to 200Mbytes for swap space (depending
upon application). Other examples of PC operating system requirements are:
| OS/2 | 40 Mbytes plus swap space |
| Windows 98 | 100/150Mbytes plus swap space |
| Windows NT/2000 | 200/300Mbytes plus swap space |
| LINUX (a free PC version of UNIX) | 200 Mbytes plus swap space |
| LINUX plus X-windows | 350 Mbytes plus swap space |
Some operating systems (e.g. certain versions of Linux) require swap space to be allocated when the disk is initialized (by setting up a swap partition). Others (e.g. Windows 95/98) have a swap file which extends and contracts as required (this will cause problems if the disk fills up!).
Processor dependent:
the performance of applications in this category is largely dependent on instruction execution speed and the performance of the ALU (arithmetic/logic unit used to manipulate integer data). AI (artificial intelligence) applications are a very good example (Lisp and Prolog programs, simulating neural networks, etc.).
Floating point dependent:
many mathematical/scientific applications will require a good real number calculation performance, e.g. the analysis of the structure of a bridge using finite element mesh techniques.
I/O (input/output) dependent applications:
applications which extensively manipulate disk file based information will require a good I/O bandwidth, e.g. a large database holding details of clients' orders which may be simultaneously accessed by staff in various departments (production, sales, accounting, etc.).
In practice one of the above factors may predominate in a particular application (e.g. I/O bandwidth is critical in database applications) or a broader overall system performance may be required.
Sufficient main memory and disk space must be provided to support the
executable code and user data sets. Examples of IBM PC compatible software
disk requirements are:
| Wordstar 7 | 6 Mbytes minimum, 17 Mbytes maximum |
| Turbo C++ 3.1 | 8.5 Mbytes typical |
| Borland C++ 5 | 170 Mbytes typical (depends on libraries installed) |
| Visual C++ 2 | 68 Mbytes minimum, 104 Mbytes typical |
| Oracle | running under SCO UNIX may require 256Mbytes of RAM to support a sophisticated database system. |
| Java JDK1.2.2 | 150Mbytes plus more for extra APIs |
| Viewlogic CAD | 800/1000 Mbytes |
It is worth noting that although Java is not particularly large in disk requirements it needs powerful processors and lots of memory to run complex Java applications using sophisticated APIs, e.g. minimum Pentium 400 with 64/128Mbytes of memory. In a recent experiment Sun's Java IDE Forte was mounted on a 5 year old DEC Alpha with 64Mbytes of memory and took 15 minutes to load!
Generally software houses or package sales documentation will provide guidance on processor and memory requirements, e.g. so much memory and disk space for the base system plus so much per user giving an immediate guide to the size of system required (one then needs to add operating system requirements).
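The 'base system plus so much per user' guidance can be turned into a simple sizing calculation. The sketch below is illustrative only: the function name `disk_estimate_mbytes` and the package figures (100Mbytes base, 10Mbytes per user, 20 users) are hypothetical, while the operating system and swap figures are the upper Windows NT/2000 values quoted earlier in these notes.

```python
def disk_estimate_mbytes(os_mb, swap_mb, base_mb, per_user_mb, users):
    """Rough disk requirement: operating system + swap + package base + per-user space."""
    return os_mb + swap_mb + base_mb + per_user_mb * users


# Operating system and swap figures taken from the notes (Windows NT/2000, upper values).
OS_MB = 300
SWAP_MB = 200

# Hypothetical package requirements, as a supplier's documentation might state them.
PACKAGE_BASE_MB = 100
PACKAGE_PER_USER_MB = 10
USERS = 20

total = disk_estimate_mbytes(OS_MB, SWAP_MB, PACKAGE_BASE_MB, PACKAGE_PER_USER_MB, USERS)
print(f"Estimated disk requirement: {total} Mbytes")
```

The same pattern applies to main memory: add the operating system's resident requirement to the package's base and per-user figures.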
Processor performance (see next sub-section for a detailed discussion)
determines instruction execution speed, arithmetic performance, interrupt handling capability, etc.
Main or primary memory size
is critical in system performance. In a single user system it limits the size of program and/or data set which can be processed. In a multiprogramming/virtual memory environment it affects the number of concurrent processes which can be held without swapping to and from disk. In general the more powerful the processor the larger the memory, i.e. a general rule is that as processor power increases so do the user requirements and this leads to larger and more complex programs. When determining the main memory size required for a system allowance must be made for the operating system, e.g. a sophisticated operating system such as UNIX or Windows 98 typically requires 8 to 32Mbyte for resident components and work area.
Secondary memory (disk) size
determines the number of programs and data sets which can be accessed on-line at any instant. For example, in a single user word processing environment only one or two documents will be accessed at a time, which could be held on a small floppy disk. On the other hand, a large multi-user minicomputer could have 50 simultaneous users running large programs with large data sets requiring 10000Mbytes or more of disk space. Again when estimating disk requirements allowance has to be made for the operating system, e.g. UNIX typically requires of the order of 300Mbytes if all utilities and help files are on-line.
Input/output bandwidth
is a measure of how fast information can be transferred between the processor, memory and I/O devices (see data bus size in next sub-section).
Network capability
is important in a distributed environment where a number of separate systems are connected via a network, e.g. personal workstations accessing a shared central database.
Over the past twenty years more and more instructions have been added making the microprogram of typical CISC computers (e.g. Intel 8086 and Motorola 68000 families) very complex and difficult to debug (see the discussion on CISC and RISC machines below).
See CPU Information & System Performance Summary - http://bwrc.eecs.berkeley.edu/CIC/summary/ and CPU Information centre - http://bwrc.eecs.berkeley.edu/CIC/
| 10 12.5 16.65 25 33 50 |
| 0.8 1.3 |
| 0.8 1.1 |
| 2.2 3.0 6.0 |
| 5.0 12.0 |
| 22.0 |
Table 1 Relative performance (in Mips) of the Motorola MC68000 family
against clock speed
(figures are a guide - results depend on clock speed, memory access
time, cache hit rate, etc.)
The Intel 80486DX2, 80486DX4 and Pentium processors have on-chip clock
multipliers which typically multiply the clock by two, three or four times,
i.e. on-chip operations are performed at two, three or four times the external
clock speed making a particular improvement in processor bound jobs. This
has little effect on I/O bound jobs (e.g. a database server or a file server)
where a large data bus and fast I/O devices are more important.
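The difference between processor bound and I/O bound jobs can be sketched with a simple timing model. It assumes (optimistically) that the processor portion of a job scales perfectly with the internal clock and that the I/O portion is unaffected; the function name `run_time` and the job timings are illustrative, not measurements.

```python
def run_time(cpu_seconds, io_seconds, multiplier):
    """Job time when only the on-chip (CPU) portion benefits from the clock multiplier."""
    return cpu_seconds / multiplier + io_seconds


# Two hypothetical 100 second jobs on a processor without a clock multiplier.
jobs = {
    "processor bound": (90.0, 10.0),  # (CPU seconds, I/O seconds)
    "I/O bound": (10.0, 90.0),
}

for name, (cpu, io) in jobs.items():
    before = run_time(cpu, io, multiplier=1)
    after = run_time(cpu, io, multiplier=2)  # e.g. a clock doubled processor
    print(f"{name}: {before:.0f}s -> {after:.0f}s (speed up {before / after:.2f})")
```

With these figures the processor bound job drops from 100 to 55 seconds while the I/O bound job only drops to 95 seconds.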
It must be noted that even though a processor has a particular address
space this does not mean that a computer system will be or can be fitted
with the maximum amount. For example, a processor with 32 address lines
has an address space of 4Gbyte but typical 32-bit machines are fitted with
anything between 4Mbyte and 256Mbyte of physical memory. The 4Gbyte address
space becomes important under a virtual memory environment where very large
programs can be executed on machines with much smaller physical memory.
In practice there is a maximum amount of memory which can be fitted to
a particular model of machine (determined by the layout of the machine
in terms of bus slots, physical space available, etc.). One of the major
differences between personal workstations and mini/mainframe computer systems
is that the latter can generally be fitted with much larger physical memory.
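The relationship between address lines and address space is simply a power of two. A minimal sketch (the helper names `address_space_bytes` and `human` are illustrative):

```python
def address_space_bytes(address_lines):
    """Number of distinct byte addresses reachable with the given number of address lines."""
    return 2 ** address_lines


def human(n):
    """Format a byte count using K = 2**10, M = 2**20, G = 2**30."""
    for unit, size in (("G", 2 ** 30), ("M", 2 ** 20), ("K", 2 ** 10)):
        if n >= size and n % size == 0:
            return f"{n // size}{unit}"
    return str(n)


for lines in (16, 20, 24, 32):
    print(f"{lines} address lines -> {human(address_space_bytes(lines))}bytes")
```

This gives 64K, 1M, 16M and 4G for 16, 20, 24 and 32 address lines respectively.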
The figures are order of magnitude guides but do give an indication
of different areas of application of the systems. The Apollo was a single
user workstation used for highly interactive computational tasks and the
VAX was typically used by a number of concurrent users (e.g. five to
ten) to run tasks which were not heavy in computational terms but which
required a system capable of supporting the I/O of a number of users (e.g.
multi-user databases, sales/stock control packages, accounting packages,
etc.).
| Microprocessor manufacturer & type | address bus size in bits | maximum memory bytes | data bus size in bits | clock |
| Intel 8080 | 16 | 64K | 8 | *1 |
| Zilog Z80 | 16 | 64K | 8 | |
| Motorola 6800 | 16 | 64K | 8 | |
| Intel 8088 (IBM/PC) | 20 | 1M | 8 | |
| Intel 8086 (IBM/PC XT) | 20 | 1M | 16 | |
| Motorola 68008 | 20 | 1M | 8 | |
| Motorola 68000, 68010 | 24 | 16M | 16 | |
| Intel 80186, 80286 | 24 | 16M | 16 | |
| Motorola 68020/30/40 | 32 | 4G | 32 | |
| Intel 80386SX | 24 | 16M | 16 | |
| Intel 80386DX | 32 | 4G | 32 | |
| Intel 80486DX | 32 | 4G | 32 | |
| Intel 80486SX | 32 | 4G | 32 | |
| Intel 80486DX2 | 32 | 4G | 32 | |
| Intel 80486DX4 | 32 | 4G | 32 | |
| Intel Pentium 400 | 32 | 4G | 32/64 PCI | |
Table 2 Common microprocessors with address and data bus sizes
Note: K = 1024 (2^10), M = 1048576 (2^20), G = 1073741824 (2^30). The 80486SX is identical to the DX except that it has no floating point coprocessor.
Table 2 shows address and data bus sizes for various microprocessors:
The size of the data bus determines the number of bits which can be transferred between system components in a single read or write operation. This has a major impact on overall system performance, i.e. a 32-bit value can be accessed with a single memory read operation on a 32-bit bus but requires two memory reads with a 16-bit bus. In practice the more powerful the processor the larger the data and address busses.
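The number of memory operations needed to transfer a value follows directly from the bus width. A minimal sketch, assuming the value is aligned so that no extra cycles are needed (the function name `memory_reads` is illustrative):

```python
def memory_reads(value_bits, bus_bits):
    """Memory read operations needed to fetch an aligned value over the data bus."""
    return -(-value_bits // bus_bits)  # integer ceiling division


for bus in (8, 16, 32):
    print(f"32-bit value on a {bus}-bit data bus: {memory_reads(32, bus)} read(s)")
```

A 32-bit value therefore needs four reads on an 8-bit bus, two on a 16-bit bus and one on a 32-bit bus.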
The size of the address and data busses has a major impact on the overall cost of a system, i.e. the larger the bus the more complex the interface circuits and the more 'wires' interconnecting system components. Table 2 shows that there are versions of some processors with smaller data and address busses, e.g. the Intel 80386SX is (from a programmer's viewpoint) internally identical to the 80386 but has a 24-bit address bus and a 16-bit external data bus (the internal data bus is still 32 bits). These are used to build low cost systems which are able to run application programs written for the full processors (but with reduced performance).
Table 2a shows the Intel processors with address, data bus sizes
(internal and external), internal cache size, presence of internal co-processor
and internal clock speed.
| IBM PC compatibles processor model | address bus size in bits | maximum memory bytes | internal data bus in bits | external data bus in bits | internal cache in bytes | internal co-processor | internal clock |
| Intel 8088 (IBM/PC) | 20 | 1M | 16 | 8 | none | no | *1 |
| Intel 8086 (IBM/PC XT) | 20 | 1M | 16 | 16 | none | no | *1 |
| Intel 80186, 80286 | 24 | 16M | 16 | 16 | none | no | *1 |
| Intel 80386SX | 24 | 16M | 32 | 16 | none | no | *1 |
| Intel 80386DX | 32 | 4G | 32 | 32 | none | no | *1 |
| Intel 80486DX | 32 | 4G | 32 | 32 | 8K | yes | *1 |
| Intel 80486SX | 32 | 4G | 32 | 32 | 8K | no | *1 |
| Intel 80486DX2 | 32 | 4G | 32 | 32 | 8K | yes | *2 |
| Intel 80486DX4 | 32 | 4G | 32 | 32 | 16K | yes | *2 or *3 |
| Intel Pentium 400 | 32 | 4G | 64 | 32/64 PCI | 16K | yes | *4 |
Table 2a Intel processors
Notes:
Address bus size
determines the memory address space of a processor, e.g. 32 address lines can address a maximum of 4Gbyte of memory
Data bus size
determines how many memory read/write cycles are required to access instructions/data and has a major effect on input/output bandwidth (important in file servers and database servers)
Cache memory
a fast memory logically positioned between the processor and bus/main memory - can be on chip (as in 80486) and/or external
Floating point co-processor
is important in real number calculations (twenty times speed up over normal CPU)
important in mathematical, scientific and engineering applications
Clock speed
The clock times events within the computer - the higher the clock the faster the system goes (assuming memory, bus, etc. match the speed)
Internal clock speed
the 80486DX2, 80486DX4 and Pentium processors contain clock doublers/triplers/quadruplers, etc.
on-chip operations are performed at 2/3/4 times the external clock speed - external operations are the same
Fetch Cycle
A machine code instruction is fetched from main memory and moved into the Instruction Register, where it is decoded.
Execute Cycle
The instruction is executed, e.g. data is transferred from main memory and processed by the ALU.
To speed up the overall operation of the CPU modern microprocessors employ instruction prefetch or pipelining which overlap the execution of one instruction with the fetch of the next or following instructions. For example, the MC68000 uses a two-word (each 16-bits) prefetch mechanism comprising the IR (Instruction Register) and a one word prefetch queue. When execution of an instruction begins, the machine code operation word and the word following are fetched into the instruction register and one word prefetch queue respectively. In the case of a multi-word instruction, as each additional word of the instruction is used, a fetch is made to replace it. Thus while execution of an instruction is in progress the next instruction is in the prefetch queue and is immediately available for decoding. Powerful processors make extensive use of pipelining techniques in which extended sequences of instructions are prefetched with the decoding, addressing calculation, operand fetch and execution of instructions being performed in parallel (Stallings 2000). In addition, modern processors cater for the pipelining problems associated with conditional branch instructions. For more details see http://www.cs.herts.ac.uk/~comrrdp/pipeline/pipetop.html and http://www.cs.umass.edu/~weems/CmpSci535/535lecture8.html
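The benefit of overlapping the stages can be estimated with the usual idealised pipeline timing formula. The sketch below assumes five stages (fetch, decode, address calculation, operand fetch, execute), one clock cycle per stage and no stalls from branches or memory delays; real processors fall short of this ideal. The function names are illustrative.

```python
def sequential_cycles(instructions, stages):
    """Cycles needed when each instruction completes all stages before the next starts."""
    return instructions * stages


def pipelined_cycles(instructions, stages):
    """Cycles needed in an ideal pipeline: fill it once, then one instruction per cycle."""
    return stages + (instructions - 1)


STAGES = 5  # fetch, decode, address calculation, operand fetch, execute (assumed)

for n in (1, 10, 100):
    seq = sequential_cycles(n, STAGES)
    pipe = pipelined_cycles(n, STAGES)
    print(f"{n} instructions: {seq} cycles sequential, {pipe} pipelined, "
          f"speed up {seq / pipe:.2f}")
```

For long instruction sequences the speed up approaches the number of stages, which is why conditional branches (which force the pipeline to be refilled) matter so much.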
A cache memory makes use of the locality of reference phenomenon already discussed in the section on virtual memory, i.e. over short periods of time references of both instructions and data tend to cluster. The cache is a fast memory (matched to CPU speed), typically between 4K and 256Kbytes in size, which is logically positioned between the processor and bus/main memory. When the CPU requires a word (instruction or data) a check is made to see if it is in the cache and if so it is delivered to the CPU. If it is not in the cache a block of main memory is fetched into the cache and it is likely that future memory references will be to other words in the block (typically a hit ratio of 75% or better can be achieved). Clearly memory writes have to be catered for, as does the replacement of blocks when a new block is to be read in. Modern microprocessors (Intel 80486 and Motorola MC68040) have separate on-chip instruction and data cache memories - additional external caches may also be used, see Fig 2. Cache memory is particularly important in RISC machines where the one instruction execution per cycle makes heavy demands on main memory.
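The way block fetches turn locality of reference into cache hits can be shown with a small simulation. This is a sketch under stated assumptions: it models a direct-mapped cache (one possible organisation; these notes do not say which is used), ignores writes, and the function name `hit_ratio` and the cache dimensions are illustrative.

```python
def hit_ratio(addresses, cache_lines, block_words):
    """Simulate a direct-mapped cache and return the fraction of references that hit."""
    tags = [None] * cache_lines  # block number currently held in each cache line
    hits = 0
    for addr in addresses:
        block = addr // block_words  # main memory block containing the word
        line = block % cache_lines   # direct mapping of block to cache line
        if tags[line] == block:
            hits += 1
        else:
            tags[line] = block       # miss: the whole block is fetched into the cache
    return hits / len(addresses)


# One sequential pass over 1024 words with 4-word blocks.
sequential = list(range(1024))
print(hit_ratio(sequential, cache_lines=64, block_words=4))

# Ten passes over a 128-word loop that fits entirely in the cache.
loop = [addr for _ in range(10) for addr in range(128)]
print(hit_ratio(loop, cache_lines=64, block_words=4))
```

In the sequential case only the first word of each 4-word block misses, giving a hit ratio of 0.75; the loop does far better because after the first pass every reference is a hit.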
The concept of a cache has been extended to disk I/O. When a program requests a block or blocks, several more are read into the cache where they are immediately available for future disk access requests. Disk caches may take two forms:
Software disk cache
in which the operating system or disk driver maintains the cache in main memory, i.e. using the main CPU of the system to carry out the caching operations.
Hardware disk cache
in which the disk interface contains its own cache RAM memory (typically 4 to 16Mbytes) and control circuits, i.e. the disk cache is independent of the main CPU.
Hardware disk caches are more effective but require a more complex (and expensive) disk controller and tend to be used with fast disks in I/O bound applications, e.g. databases.
Fig 2 Showing CPU (with ALU, Control Unit and internal cache), external cache, RAM memory and busses
MC68000 - 1979
NMOS technology, approximately 68000 transistors
16-bit data bus, 24-bit address bus (maximum 16 Mbyte memory)
2 word prefetch queue (including IR)
approximately 0.6 Mips at 8MHz
MC68008 - 1982
NMOS technology - from a programmer's viewpoint almost identical to 68000
8-bit data bus, 20 bit address bus (maximum 1Mbyte memory)
approximately 0.5 Mips at 8MHz
MC68010 - 1982
as 68000 with the following enhancements:
three word prefetch queue (tightly looped software runs in 'loop mode')
memory management support (for virtual memory)
approximately 0.65 Mips at 8MHz
MC68020 - 1984
CMOS technology with 200000 transistors
true 32-bit processor with 32-bit data and address busses (4 Gbyte address space)
extra instructions and addressing modes
three clock bus cycles (68000 bus cycles take four clock cycles)
extended instruction pipeline
on-chip 256 byte instruction cache
co-processor interface, e.g. for MC68881 floating-point co-processor
approximately 2.2 Mips at 16MHz
MC68030 - 1987
300000 transistors
extended pipelining
256 byte on-chip instruction cache and 256 byte on-chip data cache
on-chip memory management unit
approximately 5.0 Mips at 16MHz
MC68040 - 1989
1200000 transistors
4Kbyte on-chip instruction cache and 4Kbyte on-chip data cache
on-chip memory management unit and floating point processor
pipelined integer and floating point execution units operating concurrently
approximately 22.0 Mips at 25MHz
Fig 3 Showing the relative performance of Intel processors - from http://bwrc.eecs.berkeley.edu/CIC/summary/icomp.gif
| Statement | SAL | XPL | Fortran | C | Pascal | Average |
| Assignment | 47 | 55 | 51 | 38 | 45 | 47 |
| IF | 17 | 17 | 10 | 43 | 29 | 23 |
| CALL | 25 | 17 | 5 | 12 | 15 | 15 |
| LOOP | 6 | 5 | 9 | 3 | 5 | 6 |
| GOTO | 0 | 1 | 9 | 3 | 0 | 3 |
| other | 5 | 5 | 16 | 1 | 6 | 7 |
Table 3 Percentage of statement types in five programming languages (Tanenbaum 1990)
An alternative approach to processor architecture evolved, called the reduced instruction set computer or RISC. The number of instructions was reduced by an order of magnitude and the space created used for more processor registers (a CISC machine typically has 20 registers, a RISC machine 500) and large on-chip cache memories. All data manipulation is carried out on and using data stored in registers within the processor; only LOAD and STORE instructions move data between main memory and registers (RISC machines do not allow direct manipulation upon data in main memory). There are a number of advantages to this approach:
| CPU | Transistors | Design (person-months) | Layout (person-months) |
| RISC I | 44,000 | 15 | 12 |
| RISC II | 41,000 | 18 | 12 |
| MC68000 | 68,000 | 100 | 70 |
| Z8000 | 18,000 | 60 | 70 |
| Intel APx-432 | 110,000 | 170 | 90 |
Table 4 Design and layout effort for some microprocessors (Stallings 2000)
Floating point co-processor
to carry out real number calculations.
Graphics processor
to control the graphics display. This can range from a fairly simple graphics controller chip which provides basic text, pixel and line drawing capabilities up to specialised processors which support advanced graphics standards such as X windows.
Input/Output control processors
which carry out complex I/O tasks without the intervention of the CPU, e.g. network, disk, intelligent terminal I/O, etc. For example, consider a sophisticated network where the network communications and protocols are handled by a dedicated processor (sometimes the network processor and associated circuits are more powerful and complex than the main CPU of the system).
In a 'simple' system all the above tasks would be carried out by sequences of instructions executed by the CPU. Implementing functions in specialised hardware has the following advantages which enhance overall system performance:
(a) the specialised hardware can execute functions much faster than the equivalent instruction sequence executed by the general purpose CPU; and
(b) it is often possible for the CPU to do other processing while a specialist processor is carrying out a function (at the request of the CPU), e.g. overlapping a floating point calculation with the execution of further instructions by the CPU (assuming the further instructions are not dependent upon the result of the floating point calculation).
One of the major limitations when increasing processor clock rate is the speed, approximately 20cm/nsec, at which the electrical signals travel around the system. Therefore to build a computer with 1nsec instruction timing, signals must travel less than 20cm to and from memory. Attempting to reduce signal path lengths by making systems very compact leads to cooling problems which require large mainframes and supercomputers to have complex cooling systems (often the downtime of such systems is not caused by failure of the computer but by a fault in the cooling system). In addition, many of the latest 32-bit microprocessors have experienced over-heating problems. It therefore becomes harder and harder to make single processor systems go faster and an alternative is to have a number of slower CPUs working together. In general modern computer systems can be categorised as follows:
The MIMD (multiple-instruction multiple-data) architecture is one in which multiple processors autonomously execute different instructions on different data. For example:
Multi-processing
in which a set of processors (e.g. in a large mini or mainframe system) share common main memory and are under the integrated control of an operating system, e.g. the operating system would schedule different programs to execute on different processors.
Parallel processing
in which a set of processors co-operatively work on one task in parallel. The executable code for such a system can either be generated by:
(a) submitting 'normal' programs to a compiler which can recognize parallelism (if any) and generate the appropriate code for different processors;
(b) programmers working in a language which allows the specification of sequences of parallel operations (not easy - the majority of programmers have difficulty designing, implementing and debugging programs for a single processor computer).
Over the past 30 years, the performance/dollar ratio of computers has increased by a factor of over one million (Gelsinger et al 1989).
For example, in 1970 the cost of memory (magnetic core) was between 50 pence and £1 per byte, e.g. 4K of 12-bit PDP8 memory was approximately £4000. By the mid 1970's 16K of 32-bit PDP11 memory cost £4000. Today IBM PC compatible memory is between £25 and £40 per Mbyte.
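The memory prices quoted above can be put on a common per-byte basis with a little arithmetic. The sketch below uses only the figures given in these notes; the function name `pounds_per_byte` is illustrative and M is taken as 2^20 bytes.

```python
def pounds_per_byte(price_pounds, words, bits_per_word):
    """Cost per byte of a memory holding the given number of words."""
    total_bytes = words * bits_per_word / 8
    return price_pounds / total_bytes


core = pounds_per_byte(4000, 4 * 1024, 12)      # 4K of 12-bit PDP8 core memory
mid_70s = pounds_per_byte(4000, 16 * 1024, 32)  # 16K of 32-bit memory
pc_low = 25 / 2 ** 20                           # £25 per Mbyte
pc_high = 40 / 2 ** 20                          # £40 per Mbyte

print(f"core memory:  £{core:.3f} per byte")
print(f"mid 1970's:   £{mid_70s:.4f} per byte")
print(f"PC memory:    £{pc_low:.7f} to £{pc_high:.7f} per byte")
print(f"core memory was {core / pc_high:,.0f} to {core / pc_low:,.0f} times dearer")
```

The core memory figure works out at roughly 65 pence per byte, consistent with the 50 pence to £1 range quoted.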
The generations of integrated circuit technology range from small scale integration (SSI), to medium scale integration (MSI), to large scale integration (LSI), very large scale integration (VLSI) and ultra large scale (ULSI) integration. These can be represented by ranges of complexity (numbers of components on the chip), see Table 5.
Until recently a sophisticated workstation would have contained a large number of complex integrated circuit chips, e.g. the microprocessor, floating point co-processor, memory management unit, instruction and data caches, graphics controller, etc. As chip complexity increased it became possible to build more and more powerful on-chip microprocessors with larger and larger address and data busses. The major problem, however, with increasing off-chip bus widths is that every extra bit requires a contact (pin or leg) on the chip edge to connect it to the outside world and an extra 'wire' and interface components on the external bus and associated circuits. Thus every extra bus line makes the overall system more complex and expensive, i.e. mini and mainframe computer systems (which have large data busses) can be an order of magnitude greater in cost than a personal workstation of equivalent CPU performance.
The ability to fabricate more components on a single chip (Fig. 6 and
Fig 6a) has meant that a number of functions can be integrated onto a single
integrated circuit, e.g. modern microprocessors contain the microprocessor,
floating point co-processor, memory management unit and instruction and
data caches on a single chip. The advantage of having the majority of
the major components on-chip is that very wide internal busses can be used,
decoupling cycle timing and bandwidth of on-chip operations from off-chip
considerations. Hence the processor can run at a very fast cycle time relative
to the frequency of the external circuitry. On-chip clock multipliers enhance
this effect.
| | complexity | typical circuit function |
| SSI | 2-64 | e.g. simple gates AND, OR, EXOR, NOT, etc. |
| MSI | 64-2000 | e.g. counters, registers, adders, etc. |
| LSI | 2000-64000 | e.g. ALUs, small microprocessors, I/O interfaces |
| VLSI | 64000-2000000 | e.g. microprocessors, DMA controllers, etc. |
| ULSI | 2000000-64000000 | e.g. parallel processors, 1 Mbyte memory chips |
Table 5 Integrated circuit generations: complexity and typical circuit
function
Fig. 4 Maximum chip edge size against time
Fig. 5 Minimum feature size in microns against time
Fig. 6 Number of components per chip against time
Fig 6a CPU transistor count Intel 8086 family
Fig. 7 Average main memory cost per byte
Fig. 8 Trends in CPU performance growth (Hennessy & Jouppi
1991)
Note: no account is taken of other factors such as I/O bandwidth, memory
capacity, etc.
A microcomputer:
a single user computer system (cost £2000 to £5000) based on an 8-bit microprocessor (Intel 8080, Zilog Z80, Motorola 6800). These were used for small industrial (e.g. small control systems), office (e.g. word-processing, spreadsheets) and program development (e.g. schools, colleges) applications.
A minicomputer:
a medium sized multi-user system (cost £20000 to £200000) used within a department or a laboratory. Typically it would support 4 to 16 concurrent users depending upon its size and area of application, e.g. CAD in a design office.
A mainframe computer:
a large multi-user computer system (cost £500000 upwards) used as the central computer service of a large organization, e.g. Gas Board customer accounts. Large organizations could have several mainframe and minicomputer systems, possibly on different sites, linked by a communications network.
As technology advanced the classifications have become blurred and modern microcomputers are as powerful as the minicomputers of ten years ago or the mainframes of twenty years ago.
Fig. 8 shows the rate of CPU performance growth since the 1960's (Hennessy & Jouppi 1991) as measured by a general purpose benchmark such as SPEC (these trends still continue - see Fig. 3). Microprocessor based systems have been increasing in performance by 1.5 to 2.5 times per year during the past six to seven years whereas mini and mainframe improvement is about 25% per year (Hennessy & Jouppi 1991). It must be emphasized that Fig. 8 only compares CPU performance and no account is taken of other factors such as the larger I/O bandwidth and memory capacity of mini and mainframe systems and the special applications which require supercomputers.
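The difference between these annual growth rates compounds quickly. A minimal sketch using the rates quoted above over a six year period (the function name `growth` is illustrative):

```python
def growth(factor_per_year, years):
    """Overall performance multiple after compounding an annual improvement factor."""
    return factor_per_year ** years


YEARS = 6

rates = {
    "mini/mainframe (25% per year)": 1.25,
    "microprocessor (1.5 times per year)": 1.5,
    "microprocessor (2.5 times per year)": 2.5,
}

for name, factor in rates.items():
    print(f"{name}: {growth(factor, YEARS):.1f} times faster after {YEARS} years")
```

Over six years 25% per year gives less than a fourfold improvement, whereas 1.5 to 2.5 times per year gives between roughly eleven and over two hundred times.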
Today system configurations may be summarized as PCs (personal computers),
professional workstations, multi-user mini/mainframe computers and distributed
environments.
A PC (personal computer):
a generic term for a small (relatively) personal microcomputer system (cost £500 to £5000) used for a wide range of relatively low-level computer applications (see Table 6 for a summary of the features of a typical PC). The most common PCs are the IBM PC and compatible machines (based on the Intel 8086/80286/80386/80486/Pentium family of microprocessors).
Bus size:
Until the late 1980's the major factor which limited the overall performance of IBM PC compatible computers was the widespread use of the 16 bit IBM PC/AT bus (the 16 bit refers to the data bus size) developed in the mid 1980s to support the 80286 based IBM PC/AT microcomputer. This bus system was widely accepted and became known as the ISA bus (Industry Standard Architecture). Unfortunately in terms of faster 80386/80486 computer systems the ISA bus was very slow, having a maximum I/O bandwidth of 8 Mbytes/sec. This caused a severe I/O bottleneck within 80486 systems when accessing disk controllers and video displays via the bus, see Fig 9.
Some IBM PC compatibles were available with the IBM Microchannel bus or the EISA (Extended Industry Standard Architecture) bus, both of which are 32 bit bus systems having I/O bandwidths of 20 to 30 Mbytes/sec or greater. An EISA bus machine, however, could cost £500 to £1000 more than the equivalent ISA bus system with corresponding increases in the cost of the I/O boards (typically two to three times the cost of an equivalent ISA bus card). The EISA bus maintains compatibility with ISA enabling existing ISA cards to be used with it.
The problem with EISA was that it made the PC quite expensive and this led to the development of local busses which are cheaper and have similar or better performance. There were two major contenders:
Because VESA was the first to appear it became popular in the early/mid
nineties. Since that time PCI has taken over - mainly because it was supported
by Microsoft and Intel and could be used to support the Pentium, which has
a 64-bit data bus (Intel quote peak bandwidths of 132Mbytes/sec). Early
Pentium systems had a PCI local bus used for high performance devices (video,
disk, etc.) plus an ISA bus for slower devices (serial and parallel I/O,
etc.), see Fig. 10. Many of today's Pentium systems do not have ISA
bus slots, which can cause problems if one wishes to interface with old devices,
e.g. specialist hardware boards.
Fig 9 Showing the ISA bus of an IBM PC compatible microcomputer
Fig 10 Showing IBM PC compatible microcomputer with a PCI local bus
PCI bus
The original PCI bus was rated 32 bits at 33MHz giving a maximum throughput of 132Mbytes per second. Since then PCI-2 has appeared, rated 32/64 bits at 66MHz, giving a maximum throughput of 528Mbytes per second. Unfortunately the PCI bus is now quite dated and is becoming a performance bottleneck in modern Pentium systems - see http://www.intel.com/network/performance_brief/pc_bus.htm and http://www.pcguide.com/ref/mbsys/buses/func.htm for a discussion of PC busses.
Display
Many Pentium motherboards are also equipped with an AGP (Accelerated Graphics Port), which was developed to support high performance graphics cards for 2D and 3D applications - see http://developer.intel.com/technology/agp/tutorial/, http://agpforum.org/ and http://www.pcguide.com/ref/mbsys/buses/types/agp.htm
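The PCI throughput figures quoted above follow directly from the data bus width and the clock rate. The Python sketch below reproduces them under the simplifying assumption of one data transfer per clock cycle with no overheads, which is why these are peak rather than sustained figures. The function name is illustrative only.

```python
def peak_bandwidth_mbytes(width_bits: int, clock_mhz: float) -> float:
    """Peak throughput in Mbytes/sec, assuming one transfer per clock cycle."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz


# Bus ratings as given in the text
print("PCI, 32 bits at 33MHz:  ", peak_bandwidth_mbytes(32, 33), "Mbytes/sec")
print("PCI-2, 64 bits at 66MHz:", peak_bandwidth_mbytes(64, 66), "Mbytes/sec")
```

The same calculation shows that doubling either the bus width or the clock rate doubles the peak throughput, so the move from 32 bits at 33MHz to 64 bits at 66MHz gives a fourfold increase.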
The main problem with running sophisticated graphics applications on a PC is that the screen quality in terms of addressable pixels and physical size is deficient:
Operating system
The most common operating system on IBM PC compatibles is some variety of Windows (95/98/NT/2000). Although Windows is adequate for many application environments, UNIX is still preferred for application areas that demand high performance and robustness.
A modern high-performance PC, equipped with a high-performance graphics card and a high quality display, can compete with low end workstations (at similar cost). More specialised applications such as real-time 3D graphics still require professional workstations.
- Computing power is an order of magnitude higher: the early machines were based on the Motorola MC68000 family of microprocessors; today the tendency is to use RISC based architectures. Main memory and disk size are correspondingly higher.
- Bus system (Stallings 2000): in the past professional workstations used 32 bit bus systems, e.g. VME with an I/O bandwidth of 40 Mbytes/sec. Modern workstations have moved to 64 bit or greater buses or independent memory and I/O bus systems.
- UNIX operating system: the de facto industry standard for medium sized computer systems.
- Integrated environment: the workstations are designed to operate in a sophisticated multiprogramming networked distributed environment. The operating system is integrated with the window manager, network file system, etc.
- Multiprogramming/virtual memory operating system: the workstations are designed to run large highly interactive computational tasks requiring a sophisticated environment.
- High quality display screen: a large high quality non-interlaced display with mouse input is used to interact with the window managed multiprogramming environment.
- User workstations: PCs and/or professional workstations running highly interactive software tools, e.g. word processing, spreadsheets, CAD design, CASE tools, etc.
- Fileservers: powerful (relative to the user workstations) computer system which holds central operating system files, user files, centralized databases, etc. This may be a high powered PC, a specialised workstation (without a user) or a minicomputer.
- Nodal processors: some commercial, industrial or scientific applications require a powerful centralized system to support specialised tasks beyond the capacity of the user workstations, e.g. heavy floating point numeric processing, very large databases, etc. Depending upon the configuration a fileserver may provide this support otherwise a dedicated minicomputer, mainframe or supercomputer would be used.
- Bridges and gateways: in a distributed environment care must be taken not to overload the network and to provide adequate security for sensitive information. Splitting a large network into a number of semi-independent networks linked by bridges and gateways can assist with these problems.
The main way to avoid the above problems is to provide users with no direct facility for copying files to/from movable media, i.e. PCs are not fitted with floppy disks. All disks and tapes brought into an organization are processed by the systems staff who check for illicit copying of files and any viruses.
Other avenues for illicit copying are via connections to external networks or by attaching portable computers to local networks. Rigorous procedures for controlling access to networks (e.g. extensive password protection) and the movement of portable and semi-portable machines can reduce these problems.
This paper reviewed a range of issues critical in system performance
evaluation:
Denning, P J, 1970, 'Virtual memory', ACM Computing Surveys, Vol. 2 No. 3, September.
Foster, C C, 1976, 'Computer Architecture', Van Nostrand Reinhold.
Gelsinger, P P, Gargini, P A, Parker, G H, & Yu, A Y C, 1989, 'Microprocessors circa 2000', IEEE Spectrum, Vol. 26 No. 10, October, pp 43-47.
Hennessy, J L & Jouppi, N P, 1991, 'Computer technology and architecture: an evolving interaction', IEEE Computer, Vol. 24, No. 9, September, pp 18-28.
Nelson, D L & Leach, P J, 1984, 'The Architecture and Applications of the Apollo Domain', IEEE CG&A, April, pp 58-66.
Pratt, V R, 1984, 'Standards and Performance Issues in the Workstations Market', IEEE CG&A, April, pp 71-76.
Stallings, W, 2000, 'Computer organization and architecture', Fifth Edition, Prentice Hall, ISBN 0-130085263-5.
Tanenbaum, A S, 1990, 'Structured Computer Organisation', Prentice-Hall.