CS220 Homework 4 Solution

CS 220 - Chapter 4 Study Problem Solutions

4.1.1a: RegMux refers to the mux that chooses between the ALU output and the data memory output and feeds the write data input of the registers. ALUMux refers to the mux that chooses between the second register read port and the immediate and feeds the second input of the ALU. For both multiplexors, assume that input 0 is the top input.
- RegWrite = 1
- MemRead = 0
- ALUMux = 0
- MemWrite = 0
- ALUOp = 10 (R-Type)
- RegMux = 1
- Branch = 0
4.1.1b:
- RegWrite = 0
- MemRead = 0
- ALUMux = 1
- MemWrite = 1
- ALUOp = 00 (Add, to compute the SW address)
- RegMux = X
- Branch = 0
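The two answers above can be captured as a small lookup table. This is a minimal sketch: `CONTROL`, `dont_cares`, and `mux` are hypothetical names introduced here, `None` stands for a don't-care (X), and the RegMux wiring (input 0 = data memory output, input 1 = ALU output) is inferred from RegMux = 1 writing the ALU result for AND.

```python
from typing import List

# Hypothetical encoding of the 4.1.1 answers; None stands for a don't-care (X).
CONTROL = {
    "AND": {"RegWrite": 1, "MemRead": 0, "ALUMux": 0, "MemWrite": 0,
            "ALUOp": "10", "RegMux": 1, "Branch": 0},
    # SW never writes a register (RegWrite = 0), so RegMux's output is ignored.
    "SW": {"RegWrite": 0, "MemRead": 0, "ALUMux": 1, "MemWrite": 1,
           "ALUOp": "00", "RegMux": None, "Branch": 0},
}


def dont_cares(instr: str) -> List[str]:
    """Return the names of the control signals that are X for this instruction."""
    return [name for name, value in CONTROL[instr].items() if value is None]


def mux(select: int, in0, in1):
    """Two-input multiplexor; input 0 is the top input, as assumed in 4.1.1a."""
    return in1 if select else in0


# Assumed RegMux wiring: input 0 = data memory output, input 1 = ALU output.
print(mux(CONTROL["AND"]["RegMux"], "memory data", "ALU result"))  # ALU result
print(dont_cares("SW"))  # ['RegMux']
```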
4.1.2a: All of them perform a useful function except Data Memory and the branch Add unit.
4.1.2b: All of them perform a useful function except the branch Add unit and the write port of the registers.
4.1.3a: The branch adder output is not used, and the data memory does not produce output.
4.1.3b: The branch adder output is not used. The RegMux output is not used. The data memory does not produce output.
4.1.4: One long path for the AND instruction is to read the instruction, read the registers, go through the ALUMux, perform the ALU operation, and go through the Mux that selects the write data for the registers (I-Mem, Regs, Mux, ALU, Mux). The other long path is similar, but goes through Control while the registers are read (I-Mem, Control, Mux, ALU, Mux). Other paths are shorter, such as the PC increment path (only Add and then Mux), the path that prevents branching (I-Mem, Control, and the Mux that uses the Branch signal to select PC + 4 as the new PC), and the path that prevents a memory write (only I-Mem and then Control).
For part A: Control is faster than registers, so the critical path is I-Mem, Regs, Mux, ALU, Mux.

For part B: The two long paths are equal, so both are critical.
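The critical-path reasoning can be checked mechanically by summing unit latencies along each path and taking the maximum. The latencies below are hypothetical (the exercise's table is not reproduced here); they only encode the two relationships the answers rely on: Control faster than Regs in part A, and Control equal to Regs in part B.

```python
# Hypothetical unit latencies in ps; NOT the values from the exercise's table.
# In LATENCY_A, Control is faster than Regs (as in part A);
# in LATENCY_B, the two are equal (as in part B).
LATENCY_A = {"I-Mem": 200, "Regs": 90, "Control": 50, "Mux": 20, "ALU": 90}
LATENCY_B = {**LATENCY_A, "Control": 90}

# The two long paths for AND described above.
PATHS = {
    "via Regs": ["I-Mem", "Regs", "Mux", "ALU", "Mux"],
    "via Control": ["I-Mem", "Control", "Mux", "ALU", "Mux"],
}


def path_delay(path, latency):
    """Total delay of a path: the sum of its units' latencies."""
    return sum(latency[unit] for unit in path)


def critical_paths(paths, latency):
    """Return (worst delay, sorted names of every path with that delay)."""
    delays = {name: path_delay(units, latency) for name, units in paths.items()}
    worst = max(delays.values())
    return worst, sorted(name for name, d in delays.items() if d == worst)


print(critical_paths(PATHS, LATENCY_A))  # (420, ['via Regs'])
print(critical_paths(PATHS, LATENCY_B))  # (420, ['via Control', 'via Regs'])
```

Returning every path that ties for the maximum matches the part B answer, where both paths are critical.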
4.1.5: One long path is to read the instruction, read the registers, use the Mux to select the immediate as the second ALU input, use the ALU to compute the address, access D-Mem, and use the Mux to select the loaded value as the register write data: I-Mem, Regs, Mux, ALU, D-Mem, Mux. The other long path is similar, but goes through Control instead of Regs (to generate the control signal for the ALUMux). Other paths are shorter and similar to the shorter paths described in 4.1.4.
For part A: Control is faster than registers, so the critical path is I-Mem, Regs, Mux, ALU, D-Mem, Mux.

For part B: The two long paths are equal, so both are critical.
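For longer paths it is convenient to treat the datapath as an acyclic graph and find the longest path with memoized recursion. This sketch assumes a simplified `FEEDS` graph that covers only the units named in 4.1.5, plus hypothetical latencies (again not the exercise's table). On a tie it keeps the first successor listed, so a part B style tie would need a tie-aware variant.

```python
from functools import lru_cache

# Hypothetical unit latencies in ps (Control faster than Regs, as in part A).
LATENCY = {"I-Mem": 200, "Regs": 90, "Control": 50, "ALUMux": 20,
           "ALU": 90, "D-Mem": 250, "RegMux": 20}

# Simplified signal flow for the paths described above (an assumption,
# not the complete datapath): each unit lists the units it feeds.
FEEDS = {
    "I-Mem": ("Regs", "Control"),
    "Regs": ("ALUMux",),
    "Control": ("ALUMux",),
    "ALUMux": ("ALU",),
    "ALU": ("D-Mem",),
    "D-Mem": ("RegMux",),
    "RegMux": (),
}


@lru_cache(maxsize=None)
def longest_path(unit):
    """Return (delay, path) of the slowest path that starts at `unit`.

    Each unit's answer is its own latency plus the slowest answer among the
    units it feeds; lru_cache makes each unit's answer computed only once.
    Ties keep the first successor listed in FEEDS."""
    best_delay, best_path = 0, ()
    for nxt in FEEDS[unit]:
        delay, path = longest_path(nxt)
        if delay > best_delay:
            best_delay, best_path = delay, path
    return LATENCY[unit] + best_delay, (unit,) + best_path


print(longest_path("I-Mem"))
# (670, ('I-Mem', 'Regs', 'ALUMux', 'ALU', 'D-Mem', 'RegMux'))
```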
4.1.6: This instruction has two kinds of long paths: those that determine the branch condition and those that compute the new PC. To determine the branch condition, we read the instruction, read the registers or use the Control unit, then use the ALUMux and then the ALU to compare the two values, and then use the Zero output of the ALU to control the Mux that selects the new PC.
To compute the new PC, one path is to increment the PC by 4 (Add), add the offset (Add), and select that value as the new PC (Mux). The other path is to read the instruction (to get the offset), then use the branch Add unit and the Mux. Both compute-PC paths are shorter than the critical path that determines the branch condition, because I-Mem is slower than the PC + 4 Add unit, and the ALU is slower than the branch Add unit.

For part A: The branch-condition path through the registers is longer than the one through Control, so it is the critical path.

For part B: The two long paths are equal, so both are critical.
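The argument for why the compute-PC paths are shorter can be written as a short inequality. Here $t_X$ is notation introduced for this sketch for the latency of unit $X$, both adders are assumed to have the same latency $t_{\text{Add}}$, and the branch-condition path is taken through the registers:

$$
\begin{aligned}
t_{\text{cond}} &= t_{\text{I-Mem}} + t_{\text{Regs}} + t_{\text{Mux}} + t_{\text{ALU}} + t_{\text{Mux}} \\
t_{\text{PC}} &= \max\left(t_{\text{Add}} + t_{\text{Add}},\; t_{\text{I-Mem}} + t_{\text{Add}}\right) + t_{\text{Mux}} \\
&= t_{\text{I-Mem}} + t_{\text{Add}} + t_{\text{Mux}} \quad \text{(since } t_{\text{I-Mem}} > t_{\text{Add}}\text{)} \\
t_{\text{cond}} - t_{\text{PC}} &= t_{\text{Regs}} + t_{\text{Mux}} + \left(t_{\text{ALU}} - t_{\text{Add}}\right) > 0 \quad \text{(since } t_{\text{ALU}} > t_{\text{Add}}\text{)}
\end{aligned}
$$

Replacing $t_{\text{Regs}}$ with $t_{\text{Control}}$ gives the same conclusion for the path through Control.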
4.12.1a: For the non-pipelined (single-cycle) processor, the cycle time is the sum of the latencies of all the units, which is 1250ps. For the pipelined processor, the cycle time is the latency of the slowest unit (stage), in this case 350ps.
4.12.1b: pipelined is 220ps, non-pipelined is 950ps.
4.12.2a: The total latency for the pipelined processor is $5(350)=1750$ps, and for the non-pipelined processor it is 1250ps. The pipelined version has the longer latency because every stage takes a full 350ps clock cycle: in a pipeline, each stage must be given the delay of the slowest stage, even if it could finish sooner.
4.12.2b: The pipelined latency is $5(220) = 1100$ps and the non-pipelined latency is 950ps.
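The numbers in 4.12.1 and 4.12.2 follow from three one-line formulas: the single-cycle clock is the sum of the stage latencies, the pipelined clock is their maximum, and the pipelined latency is the number of stages times that maximum. The per-stage values below are hypothetical, chosen only so that they reproduce the totals quoted above; like the solution, the sketch ignores pipeline-register overhead.

```python
# Hypothetical per-stage latencies in ps for IF, ID, EX, MEM, WB. They are
# illustrative values chosen only to reproduce the totals above.
STAGES_A = [250, 350, 150, 300, 200]
STAGES_B = [200, 170, 220, 210, 150]


def single_cycle_time(stages):
    """Non-pipelined: one cycle must cover every stage, so add them up."""
    return sum(stages)


def pipelined_cycle_time(stages):
    """Pipelined: every stage gets one clock cycle, sized for the slowest."""
    return max(stages)


def pipelined_latency(stages):
    """One instruction spends a full clock cycle in each stage."""
    return len(stages) * pipelined_cycle_time(stages)


for name, stages in (("a", STAGES_A), ("b", STAGES_B)):
    print(name, single_cycle_time(stages), pipelined_cycle_time(stages),
          pipelined_latency(stages))
# a 1250 350 1750
# b 950 220 1100
```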
4.13.1a: There is a dependency on $4 from the second instruction (lw) to the third (add).
4.13.1b: There is a dependency on $1 from the first to the second instruction, and a dependency on $2 from the second to the third instruction. There is also a dependency on $1 from the first to the third instruction.
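Finding these read-after-write dependences amounts to pairing each register an instruction reads with the most recent earlier instruction that writes it. This sketch hand-encodes each instruction's destination and source registers (the `Instr` tuple is a hypothetical representation, not a MIPS parser) and reports 0-based instruction indices, so `(1, 2, '$4')` means "second to third".

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]      # register written (None for sw)
    srcs: Tuple[str, ...]    # registers read


def raw_dependences(program: List[Instr]) -> List[Tuple[int, int, str]]:
    """Return (producer, consumer, register) triples, using 0-based indices,
    pairing each register read with the most recent earlier write."""
    deps = []
    for j, consumer in enumerate(program):
        for reg in dict.fromkeys(consumer.srcs):   # drop duplicates, keep order
            for i in range(j - 1, -1, -1):        # search backwards
                if program[i].dest == reg:
                    deps.append((i, j, reg))
                    break
    return deps


PROGRAM_A = [
    Instr("sw $16,-100($6)", None, ("$16", "$6")),
    Instr("lw $4,8($16)", "$4", ("$16",)),
    Instr("add $5,$4,$4", "$5", ("$4", "$4")),
]
PROGRAM_B = [
    Instr("or $1,$2,$3", "$1", ("$2", "$3")),
    Instr("or $2,$1,$4", "$2", ("$1", "$4")),
    Instr("or $1,$1,$2", "$1", ("$1", "$2")),
]

print(raw_dependences(PROGRAM_A))  # [(1, 2, '$4')]
print(raw_dependences(PROGRAM_B))  # [(0, 1, '$1'), (0, 2, '$1'), (1, 2, '$2')]
```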

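4.13.2a: Since there is no forwarding, the add instruction has to wait until the end of the lw instruction's writeback stage to get the correct value in $4. This solution assumes that the register write in WB must finish before a dependent instruction reads its registers in ID, so three nops are needed.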

      sw $16,-100($6)   # sw   IF  ID  EX  MEM  WB
      lw $4,8($16)      # lw       IF  ID  EX   MEM  WB
      nop               # nop          IF  ID   EX   MEM  WB
      nop               # nop              IF   ID   EX   MEM  WB
      nop               # nop                   IF   ID   EX   MEM  WB
      add $5,$4,$4      # add                        IF   ID   EX   MEM  WB

The writeback of the lw instruction needs to finish before the decode stage of the add.

4.13.2b: Again, we need to make sure that the writeback stage of the first instruction completes before the decode stage of the second. The same holds for the second and third instructions.

      or $1,$2,$3   # or   IF  ID  EX  MEM  WB
      nop           # nop      IF  ID  EX   MEM  WB
      nop           # nop          IF  ID   EX   MEM  WB
      nop           # nop              IF   ID   EX   MEM  WB
      or $2,$1,$4   # or                    IF   ID   EX   MEM  WB
      nop           # nop                        IF   ID   EX   MEM  WB
      nop           # nop                             IF   ID   EX   MEM  WB
      nop           # nop                                  IF   ID   EX   MEM  WB
      or $1,$1,$2   # or                                        IF   ID   EX   MEM  WB
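Both no-forwarding answers can be reproduced by placing each instruction at the earliest slot that is at least four slots after the latest writer of every register it reads, which is the spacing implied by the assumption that WB must finish before the dependent instruction's ID. This is a sketch under that assumption; the tuple format and `insert_nops` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (text, register written or None, registers read)
Instr = Tuple[str, Optional[str], Tuple[str, ...]]

# Assumption from 4.13.2: with no forwarding, a consumer's ID must come after
# its producer's WB, so it must issue at least 4 slots after the producer.
NO_FORWARDING_DISTANCE = 4


def insert_nops(program: List[Instr],
                min_distance: int = NO_FORWARDING_DISTANCE) -> List[str]:
    """Insert the fewest nops, in program order, so that every instruction
    issues at least `min_distance` slots after the latest writer of each
    register it reads."""
    out: List[str] = []
    writer_slot: Dict[str, int] = {}     # register -> slot of its latest writer
    for text, dest, srcs in program:
        earliest = len(out)
        for reg in srcs:
            if reg in writer_slot:
                earliest = max(earliest, writer_slot[reg] + min_distance)
        out.extend(["nop"] * (earliest - len(out)))
        out.append(text)
        if dest is not None:
            writer_slot[dest] = len(out) - 1
    return out


PROGRAM_A = [("sw $16,-100($6)", None, ("$16", "$6")),
             ("lw $4,8($16)", "$4", ("$16",)),
             ("add $5,$4,$4", "$5", ("$4", "$4"))]
PROGRAM_B = [("or $1,$2,$3", "$1", ("$2", "$3")),
             ("or $2,$1,$4", "$2", ("$1", "$4")),
             ("or $1,$1,$2", "$1", ("$1", "$2"))]

for line in insert_nops(PROGRAM_A):
    print(line)
print()
for line in insert_nops(PROGRAM_B):
    print(line)
```

Run as written, it prints the same two instruction sequences (three nops per dependence) as the diagrams for 4.13.2a and 4.13.2b.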

4.13.3a: With forwarding, the value loaded by lw can be forwarded from the end of its MEM stage to the start of the add's EX stage. However, the loaded value is not available until the end of MEM, so we still need one nop to eliminate this load-use hazard. Without the nop, a pipeline with hazard detection would stall for one cycle by itself; you can view the nop instruction as a programmer-controlled stall.
```
      sw $16,-100($6)   # sw   IF  ID  EX  MEM  WB
      lw $4,8($16)      # lw       IF  ID  EX   MEM  WB
      nop               # nop          IF  ID   EX   MEM  WB
      add $5,$4,$4      # add              IF   ID   EX   MEM  WB
```
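4.13.3b: In this example, forwarding takes care of all the hazards, so we don't need to add any nop instructions and the pipeline does not stall at all. Each or result is available at the end of its EX stage and can be forwarded to the EX stage of the instructions that follow it.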
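The forwarding answers follow from the same placement idea with shorter spacing: an ALU result can be forwarded to the very next instruction, while a loaded value is available only after MEM, so its user must issue at least two slots after the `lw`. This sketch assumes full forwarding, hand-encoded register lists, and that loads are recognized by the `lw` mnemonic; `insert_nops_forwarding` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

# (text, register written or None, registers read)
Instr = Tuple[str, Optional[str], Tuple[str, ...]]


def insert_nops_forwarding(program: List[Instr]) -> List[str]:
    """Insert nops assuming full forwarding: an ALU result can feed the very
    next instruction, but a loaded value is ready only after MEM, so an
    instruction that uses it must issue at least 2 slots after the lw."""
    out: List[str] = []
    ready: Dict[str, int] = {}           # register -> earliest slot for a user
    for text, dest, srcs in program:
        earliest = max([len(out)] + [ready.get(reg, 0) for reg in srcs])
        out.extend(["nop"] * (earliest - len(out)))
        out.append(text)
        if dest is not None:
            is_load = text.split()[0] == "lw"
            ready[dest] = len(out) - 1 + (2 if is_load else 1)
    return out


PROGRAM_A = [("sw $16,-100($6)", None, ("$16", "$6")),
             ("lw $4,8($16)", "$4", ("$16",)),
             ("add $5,$4,$4", "$5", ("$4", "$4"))]
PROGRAM_B = [("or $1,$2,$3", "$1", ("$2", "$3")),
             ("or $2,$1,$4", "$2", ("$1", "$4")),
             ("or $1,$1,$2", "$1", ("$1", "$2"))]

print(insert_nops_forwarding(PROGRAM_A))  # one nop between lw and add
print(insert_nops_forwarding(PROGRAM_B))  # no nops at all
```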