To learn and understand any aspect of computer science, you first need to understand how computers work. Understanding the architecture of a computer will help you achieve this.
Computer architecture is a set of rules that describe the organisation and implementation of a computer. Before we dive into what this means, we first need to take a look at the basics.
So a computer is a device that carries out instructions; it can perform arithmetic and logical operations. There are different parts to a computer. First, there is the brain of the computer, which performs the computations: the Central Processing Unit (CPU), also called the processor. The CPU needs somewhere to hold data while it is performing computations, and this data needs to be able to be stored and retrieved quickly. It does this using registers, which are just fast, temporary holding places for data.
Most of the information in a computer is kept in secondary storage. This storage is non-volatile (the information remains there even when the computer is shut off), but it can be slower to access. So when we want to run a program on a computer, it is loaded into main memory. To read more about this, check out my article on processes. Main memory is an example of primary storage: it is volatile (anything stored here will be lost when the computer shuts off) and fast. Registers are also volatile but are even faster than main memory.
Computers will connect to I/O (Input/Output) devices such as your keyboard, speakers, printer, and mouse, so that they can process your requests, such as clicking on an icon or typing a word on the keyboard. The I/O devices communicate with the CPU using a bus. This is just a set of wires that connects devices to the CPU and main memory so that data can be transferred around the system.

In order to get a full understanding of that diagram, I would really suggest reading the first half of this article.
RISC vs. CISC
Computers can be implemented in different ways. Two of these ways are Complex Instruction Set Computing (CISC) and Reduced Instruction Set Computing (RISC). With CISC, a single instruction can perform a low-level operation such as an arithmetic operation, but it can also perform a complex, multi-step operation. By contrast, RISC has a small set of simple instructions instead of the complex, specialised instructions you can get with CISC.
So the main difference between these two lies in the balance of complexity vs. simplicity.
CISC was developed in response to memory constraints: with less memory available, code needed to be dense and tightly packed, and a single CISC instruction could implement a high-level operation. However, as memory constraints lessened, code no longer needed to be as dense, and a high-level operation could instead be implemented as a series of simple instructions. And so, RISC is more common nowadays.
To get a full understanding of this, let’s use an example to explore the difference between the CISC and RISC approaches. So, say we want to multiply two numbers on a computer, a simple task, right?
CISC Approach
MULT 6:3, 3:2
With the CISC approach, we do this using a single complex instruction. This instruction finds the memory locations of the numbers we are multiplying (6:3 and 3:2), loads those numbers into registers, multiplies the two numbers, and then stores the result back in a memory location (here, 6:3).
RISC Approach
LOAD A, 3:2
LOAD B, 6:3
MULT A, B
STORE 3:2, A
With this approach, we perform the same computation but with multiple simple instructions. One instruction loads the number at memory address 3:2 into a register called A; another loads the number at memory location 6:3 into register B. We then multiply the contents of these two registers together, with the result stored in A, and finally move the contents of A back into 3:2. As four instructions are needed instead of one, the program takes up roughly four times as much instruction memory.
With RISC, computations are only performed on registers. The only memory operations are LOAD and STORE.
Advantages of RISC
The main advantage of RISC is that complexity moves from the hardware to the software. The hardware should not contain much complexity and should be relatively simple, with the complexity handled by the software, and RISC achieves this through its small, simple instructions. A compiler translates instructions written in a programming language into machine code. RISC promotes good compilers, as they work with many simple instructions rather than a few very dense ones, and good compilers make better use of registers. A RISC computer will also have fewer transistors, which means it generates less heat, is smaller, and consumes less power. This benefit should not be underestimated. RISC computations can also be faster, and the chips required are less expensive.
Disadvantages of RISC
As a task is broken into more simple instructions, the CPU must fetch and execute more of them to complete the same amount of work.
Advantages of CISC
A single complex instruction can do the work of several simple ones, so the number of instructions required to implement a program is reduced, and the denser code uses main memory more efficiently.
Disadvantages of CISC
The performance of the machine may suffer, as different instructions take different amounts of time to execute, and the complexity of the instruction set tends to keep increasing over time.
Measuring Performance
How do we measure the performance of computers? The answer really depends on what you’re measuring and how you’re measuring it.
In computing, a job is a unit of work. If you want to measure how long it takes to get a job done on a computer, then wall clock time is probably a good measure. This includes user CPU time, system CPU time, interrupt handling time, and I/O time.
However, if we want to compare processors it may be better to measure the number of clock cycles necessary to complete a task. A clock cycle is a signal that oscillates between a high and low state. The clock signal is used to synchronise different parts of a circuit. The clock rate is the number of clock cycles per unit time. CPU clock cycles are the number of clock cycles needed to complete a job.
So to measure CPU time we can use the following formula:
CPU Time = CPU Clock Cycles / Clock Rate
The problem with this approach is that it is too dependent on a single job. It would be better to derive a metric which is independent of an individual job. One such metric is Cycles Per Instruction (CPI). We need another metric for this: the Instruction Count (IC), which is the number of instructions required to complete a job.
To calculate the Cycles Per Instruction, we use:
CPI = CPU clock cycles / IC
If we play around with these formulas, you will find that:
CPU Time = (IC * CPI) / Clock Rate
This means that if we want to make things go faster by reducing the CPU time then we can:
- Reduce the Instruction Count
- Reduce the Cycles Per Instruction
- Increase the Clock Rate
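To make this concrete, here is a minimal Python sketch of the CPU time formula above; the instruction count, CPI, and clock rate are invented purely for illustration.

def cpu_time(instruction_count, cpi, clock_rate_hz):
    # CPU Time = (IC * CPI) / Clock Rate
    return (instruction_count * cpi) / clock_rate_hz

ic = 2_000_000       # instructions in the job (assumed)
cpi = 2.0            # average cycles per instruction (assumed)
clock_rate = 1e9     # 1 GHz clock

print(cpu_time(ic, cpi, clock_rate))      # 0.004 seconds
# Halving the CPI halves the CPU time, matching the list above:
print(cpu_time(ic, cpi / 2, clock_rate))  # 0.002 seconds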
Other metrics include:
Millions of Instructions Per Second (MIPS). This is useful for comparing processors that use a similar architecture, though it is more a measure of task performance speed relative to some reference: the metric breaks down when the processors follow different architectures. It is calculated by:
MIPS = Clock Rate / (CPI * 10^6)
Millions of Floating Point Operations Per Second (MFLOPS) is another measure of computer performance, and it is said to be more accurate than instructions per second. A floating point number, in its most basic form, can be thought of as a decimal, e.g. 3.14. It is calculated by:
MFLOPS = Clock Rate / (Cycles per Floating Point Instruction * 10^6)
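As a quick sanity check, here is a small Python sketch that plugs numbers into these two formulas; the clock rate and both CPI values are assumed for illustration.

clock_rate = 2e9       # 2 GHz clock (assumed)
cpi = 4.0              # assumed average cycles per instruction
cycles_per_fp = 8.0    # assumed average cycles per floating point instruction

mips = clock_rate / (cpi * 10**6)
mflops = clock_rate / (cycles_per_fp * 10**6)

print(mips)    # 500.0 million instructions per second
print(mflops)  # 250.0 million floating point operations per second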
MFLOPS and MIPS are not great measures of performance, as they assume certain environments; MIPS, for example, assumes all CPUs follow the same architecture.
Improving Performance
One thing that is important to understand is that optimisation is expensive as it requires a lot of investment. So if we are going to invest in an optimisation, we need to know that we will get good returns. One way to check this is through Speedup.
Speedup is calculated by:
SpeedUp = Execution time without enhancement / Execution time with enhancement
Speedup is used when we want to ‘speed something up’: we want to cut down the time it takes to do something. The higher the speedup, the better the enhancement has worked. Usually only a portion of a job will be sped up by a single enhancement, and when we take this into account we find a slight problem with the formula above. And so, in comes Amdahl’s Law. Amdahl’s Law tells us that we want to make the common case very fast; the component which affects the job the most is the one most worth speeding up.
Amdahl’s Law states that:
Speedup = 1 / ((1 - P) + (P / S))
where P is the proportion of the job affected by the enhancement and S is the speedup associated with just P. The Speedup calculated here is the speedup associated with the whole job. If you would like to see this proven and derived, you can look here.
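Here is a minimal Python sketch of Amdahl’s Law; the values of P and S below are assumed for illustration.

def amdahl_speedup(p, s):
    # Speedup = 1 / ((1 - P) + (P / S))
    return 1 / ((1 - p) + (p / s))

# Speeding up 80% of a job by a factor of 4:
print(amdahl_speedup(0.8, 4))    # 2.5
# Even an infinite speedup of that 80% cannot beat 1 / (1 - 0.8) = 5:
print(amdahl_speedup(0.8, 1e9))  # ~5.0

The second call shows why the common case matters: the 20% of the job left untouched puts a hard ceiling on the overall speedup.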
Failures
A system can be in one of two states: functioning or non-functioning. Transitions between these states are failures and restorations.
The Mean Time To Failure (MTTF) is a metric that can be used to predict the expected time until failure of a system. This calculation is used for systems that cannot be repaired. So once they become non-functioning, they can only be replaced. It is also used for situations that have a constant failure rate.
The calculation for this is:
MTTF = Hours of Operation / Number of Failures
Failure Rate is the frequency at which a component of a system fails. It is the reciprocal of Mean Time To Failure. Therefore:
Failure Rate = Number of Failures / Hours of Operation
The failure rate is often measured in failures per billion hours of operation. This unit is known as Failures In Time (FIT).
Restorations
When a system fails, it must then be restored.
Mean Time To Repair (MTTR) is a prediction of the average time it will take to repair a system.
MTTR = Total maintenance time / Number of maintenance actions
Mean Time Between Failures (MTBF) is a measure of the time between failures.
MTBF = MTTR + MTTF
Availability is the proportion of time during which a service is successfully delivered.
Availability = MTTF / (MTTF + MTTR)
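To tie these formulas together, here is a small Python sketch; the hours of operation, failure count, and repair time are invented for illustration.

hours_of_operation = 100_000   # assumed
number_of_failures = 4         # assumed

mttf = hours_of_operation / number_of_failures  # 25,000 hours
failure_rate = 1 / mttf                         # 0.00004 failures per hour
fit = failure_rate * 1e9                        # 40,000 failures per billion hours

mttr = 10                                       # assumed mean repair time in hours
mtbf = mttr + mttf                              # 25,010 hours
availability = mttf / (mttf + mttr)             # ~0.9996, i.e. 99.96%

print(mttf, fit, mtbf, round(availability, 4))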
Systems
Computer systems consist of a large number of components, such as processors, memory, and buses. If you have ever heard the saying that
“What is true of the parts is true of the whole”
then you will understand this easily. If all these components of a computer system have exponentially distributed lifetimes, then the system as a whole must do too.
Improving Reliability
We need our systems to be reliable so that we can have confidence in them. We need to know as users that our computers are not going to keep failing. A method to improve reliability is redundancy. This means duplicating components of the system as the duplicate can act as a back-up if one component fails.
Instruction Set Architecture
The functionality of a processor is defined by its Instruction Set Architecture (ISA). This is an abstract model of the computer system. Many physical designs are possible from an ISA. The ISA acts as an interface between the hardware and the software.

There are four different approaches to an ISA:
- Stack Architecture
- Accumulator Architecture
- General-Purpose Register Architectures:
  - Register-Memory
  - Register-Register
Stack Architectures
If you don’t know what a stack is, it is explained briefly in this article. With the stack architecture, the operands for all ALU instructions are on the top of the stack. An operand is the object of a mathematical expression: for example, in the expression 3 + 8, both 3 and 8 are operands while + is the operator.
PUSH A
PUSH B
ADD
POP C
So in a stack architecture, the only memory operations are push and pop. The code above pushes the contents of A and B onto the stack, adds them together, and then pops the result off into C.
The code for stack architectures is quite compact, there is specialised addressing for parameters and local variables, and compilers are easy to write due to this compactness. However, this architecture requires more instructions than a register machine, and it is slower than general-purpose register architectures.
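To see how this works, here is a minimal Python sketch of a stack machine running the example above. The instruction set (push, add, pop) and the treatment of A, B, and C as named storage locations are assumptions made for illustration.

memory = {"A": 3, "B": 8, "C": 0}
stack = []

def push(name):
    # PUSH: copy a value from storage onto the top of the stack.
    stack.append(memory[name])

def add():
    # ADD: ALU instructions take both operands from the top of the stack.
    stack.append(stack.pop() + stack.pop())

def pop(name):
    # POP: move the top of the stack back into storage.
    memory[name] = stack.pop()

push("A")  # PUSH A
push("B")  # PUSH B
add()      # ADD
pop("C")   # POP C
print(memory["C"])  # 11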
Accumulator Architectures
One operand for every ALU instruction is held in the accumulator register.
Load A
Add B
Store C
In this code, we load the contents of A into the accumulator, add B to it, and then store the result in C.
Both stack and accumulator architectures are no longer used, as they generate too much memory traffic and make hardware optimisation difficult.
General Purpose Register Architectures
These have more registers, and they present a uniform model to compilers. This makes more compiler optimisations possible.
Register-Memory Architecture
CISC is an example of this architecture. There is memory addressing for all or many ALU instructions and there are complex addressing modes. An addressing mode determines how to calculate the memory address of an operand by using registers in the machine instruction.
Load R1, A
Add R3, R1, B
Store R3, C
This loads A into register R1, then adds R1 and B and places the result in register R3. The contents of R3 are then stored to C.
Register-Register Architecture
In this architecture, all ALU instructions are register-register. Additionally, all memory operations are Load/Store. There are also fewer addressing modes.
Load R1, A
Load R2, B
Add R3, R2, R1
Store R3, C
This code loads A into register R1 and does the same with B into R2. The contents of R1 and R2 are then added together and stored in R3. The contents of R3 are then stored in C.
Instruction Cycle
This is a slight aside, but it’s important to know before we continue. The instruction cycle is the cycle followed by the CPU in order to process instructions. It is split up into a series of stages.
First is the Fetch Stage. Here, the next instruction is fetched from the memory address held in the program counter (this holds the memory address of the next instruction to be executed, more in this article). This is then stored in an instruction register which is simply a register that holds the next instruction to be executed. The program counter will then point to the next instruction to be executed.
Second is the Decode Stage. The instruction in the instruction register is now decoded by a decoder.
Third is the Execute Stage. The CPU will begin to execute the instruction.
The cycle is now repeated.
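Here is a toy Python version of this loop; the two-field instruction format and the tiny instruction set are made up for illustration (a real CPU decodes binary instruction words).

program = [("LOAD", 5), ("ADD", 3), ("HALT", 0)]

program_counter = 0
accumulator = 0

while True:
    # Fetch: read the instruction the program counter points at
    # into the instruction register, then advance the counter.
    instruction_register = program[program_counter]
    program_counter += 1

    # Decode: split the instruction into an opcode and an operand.
    opcode, operand = instruction_register

    # Execute.
    if opcode == "LOAD":
        accumulator = operand
    elif opcode == "ADD":
        accumulator += operand
    elif opcode == "HALT":
        break

print(accumulator)  # 8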

Instruction Size
There are two approaches to instruction size. These are variable-size instructions and fixed-size instructions.
Variable Sized Instructions
With variable-sized instructions, additional bits are required to indicate the instruction format, and there are multiple steps to fetch and decode an instruction. The usual strategy is to fetch the first part of the instruction and decode it, then fetch and decode the next part, and so on. This inhibits out-of-order execution optimisations. Short instructions lead to more compact code if they are chosen well and occur frequently.
Fixed Size Instructions
This gives a simple and fast instruction cycle, and hardware optimisation is simplified. However, the instruction design is constrained; for example, the number of bits available to represent displacements and immediate operands is limited.
Memory Addressing
We talked briefly about main memory at the beginning, where we said that it is volatile and an example of primary storage. An accurate description of memory is as a linear sequence of addressable bytes.
Memory Alignment
Memory accesses should be memory aligned. Alignment refers to the arrangement of data in memory: a memory address is said to be n-aligned if that address is divisible by n. There is a brief article here about it if you are interested.
When memory accesses are not aligned, a lot of additional cost can be incurred.
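One way to see alignment from Python is with the standard struct module: the '@' format prefix packs fields using the platform’s native alignment rules, while '=' packs the same fields with no padding.

import struct

# A 1-byte value followed by a 4-byte int:
print(struct.calcsize("@bi"))  # typically 8: three padding bytes keep the int 4-aligned
print(struct.calcsize("=bi"))  # 5: no padding, so the int would sit at an unaligned address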
Addressing Modes
Register:
Add R1, R2
Here, the values that we need have already been loaded into registers.
Immediate:
Add R1, 3
We are now adding a constant (3) to the value stored in R1.
Displacement:
Add R2,100(R1)
Here, we add 100 to the address held in R1, and whatever is in that memory location is added to R2. The 100 here is known as the offset. So, for example, say register R1 holds the address 5; with this line of code, we add the offset (100) and access the data stored at address 105.
Register Indirect:
Add R1, (R2)
We are accessing the contents of memory through a pointer held in a register. This can be seen as a special case of displacement where the displacement is zero.
Indexed:
Add R1, (R2 + R3)
We add the contents of R2 and R3 together to form an address, and the contents of that address are then added to R1. This is often used for indexing an array.
Direct:
Add R1, (3)
This directly adds the contents of address 3 to register R1. This may be used to access static data.
Memory Indirect:
Add R1, @ (R2)
So here, R2 does not hold the address of the data item that we want to add to R1. Instead, the contents of R2 point to another memory location, and that location contains the address of the data item we need to retrieve.
Autoincrement:
Add R1, (R2)+
This mode accesses the memory location pointed to by R2 and then increments the register after use (autodecrement decrements it instead). This can be used to step through arrays.
Scaled:
Add R1, 100(R2)[R3]
The contents of R2 plus 100 form the base address, and R3 contains an index into this base address. The index is scaled by the size of the data elements and added to the base address, and the data found at the resulting address is added to R1.
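To make a few of these modes concrete, here is a Python sketch that models memory as a list and registers as a dictionary; the addresses and values are invented for illustration.

memory = [0] * 256
memory[105] = 7    # the data item we want to reach
memory[5] = 105    # for memory indirect: address 5 holds the address of the data

registers = {"R1": 5, "R2": 0, "R3": 0}

# Immediate (Add R3, 3): the operand is the constant itself.
registers["R3"] += 3

# Displacement (Add R2, 100(R1)): base address in R1 plus the offset 100.
registers["R2"] += memory[registers["R1"] + 100]    # memory[105] == 7

# Memory indirect (Add R3, @(R1)): memory[R1] holds the address of the data.
registers["R3"] += memory[memory[registers["R1"]]]  # memory[memory[5]] == memory[105] == 7

print(registers["R2"], registers["R3"])  # 7 10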
Hazards
There are different types of hazards that may affect the execution of instructions.
Structural Hazards: These occur when the hardware cannot support all instruction combinations at the same time.
Data Hazards: This occurs when one instruction depends upon the result of a previous instruction which is not yet available.
Control Hazard: When the address of the next instruction cannot be determined immediately. This can occur when the processor does not yet know the next instruction that will need to be executed after a branch. A branch is an instruction that can change the sequence of instructions to be executed.
Pipeline
A pipeline is a series of stages connected one after another, through which instructions flow. Instructions at different stages of the pipeline may be executed in parallel.
Stalls
Stalls can occur in a system as a result of these hazards.
A Read After Write (RAW) Stall occurs as a result of a data hazard, when an instruction is trying to read a data item that is not yet available.
A Write After Write (WAW) Stall can occur when an instruction tries to write to a data item before an earlier instruction has finished writing to it.
A Pipeline Stall occurs when a stalled instruction blocks all of the instructions behind it in the pipeline.
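As a small illustration of a data hazard, here is a Python sketch that checks two adjacent instructions for a Read After Write dependency; the (destination, sources) representation is an assumption made for illustration.

instructions = [
    ("R1", []),            # Load R1, A        -- writes R1
    ("R3", ["R1", "R2"]),  # Add R3, R1, R2    -- reads R1
]

def raw_hazard(earlier, later):
    # True if the later instruction reads a register the earlier one writes.
    dest, _ = earlier
    _, sources = later
    return dest in sources

print(raw_hazard(instructions[0], instructions[1]))  # True: the pipeline may need to stall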
Conclusion
Today we have learnt about the different computer approaches such as RISC and CISC and we have examined architectures such as stack and accumulator. We have also looked at the Instruction Cycle and the different addressing modes. Finally, we looked at the different hazards that can occur in relation to the execution of instructions as well as the stalls that can result from them.