-
4.1.1a:
RegMux refers to the mux that chooses between the ALU output and the data memory
output and feeds the write data port of the registers.
ALUMux refers to the mux that sits between the second read port of the registers
and the second input of the ALU. For both multiplexors, assume
that input 0 is at the top.
RegWrite = 1
MemRead = 0
ALUMux = 0
MemWrite = 0
ALUOp = 10 (R-Type)
RegMux = 1
Branch = 0
-
4.1.1b:
RegWrite = 0
MemRead = 0
ALUMux = 1
MemWrite = 1
ALUOp = 00 (SW: Add)
RegMux = X
Branch = 0
-
4.1.2a: All of them perform a useful function
except Data Memory and the branch Add ALU.
-
4.1.2b: All of them perform a useful function
except the branch Add ALU and
the write data port on the registers.
-
4.1.3a: The branch adder output is not used and the data
memory does not produce output.
-
4.1.3b: The branch adder output is not used.
RegMux output is not used. Data memory does not produce output.
-
4.1.4:
One long path for the AND instruction is to read the instruction, read the registers,
go through the ALUMux, perform the ALU operation, and go through the
Mux that controls the write data for Registers (I-Mem, Regs, Mux, ALU, and Mux).
The other long path is similar, but goes through Control while registers are read
(I-Mem, Control, Mux, ALU, Mux). There are other paths but they are shorter,
such as the PC increment path (only Add and then Mux), the path to prevent
branching (I-Mem, Control, Mux uses Branch signal to select the PC + 4 input as
the new value for PC), the path that prevents a memory write (only I-Mem and
then Control), etc.
For part A:
Control is faster than registers, so the critical path is
I-Mem, Regs, Mux, ALU, Mux.
For part B:
The two long paths are equal, so both are critical.
-
4.1.5: One long path is to read the instruction,
read registers, use the Mux to select the
immediate as the second ALU input, use ALU (compute address), access D-Mem,
and use the Mux to select that as register data input, so we have I-Mem, Regs,
Mux, ALU, D-Mem, Mux. The other long path is similar, but goes through Control
instead of Regs (to generate the control signal for the ALUMux). Other paths are
shorter, and are similar to shorter paths described for 4.1.4.
For part A:
Control is faster than registers, so the critical path
is I-Mem, Regs, Mux, ALU, D-Mem, Mux.
For part B:
The two long paths are equal, so both are critical.
-
4.1.6:
This instruction has two kinds of long paths, those that determine the
branch condition and those that compute the new PC. To determine the branch
condition, we read the instruction, read registers or use the Control unit, then use
the ALUMux and then the ALU to compare the two values, then use the Zero output
of the ALU to control the Mux that selects the new PC.
To compute the PC, one path is to increment it by 4 (Add), add the offset (Add),
and select that value as the new PC (Mux). The other path for computing the PC is
to read the instruction (to get the offset), use the branch Add unit and Mux. Both
of the compute-PC paths are shorter than the critical path that determines the
branch condition, because I-Mem is slower than the PC + 4 Add unit, and because
ALU is slower than the branch Add.
For part A:
Control is faster than registers, so the first path (through Regs) is longer
and is the critical path.
For part B:
The two long paths are equal, so both are critical.
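
The path comparisons in 4.1.4 through 4.1.6 can be sketched in Python as follows. This is only an illustration: the unit latencies below are hypothetical placeholders (the latency table from the exercise is not reproduced in these answers), chosen so that Control is faster than Regs in set A and equal to Regs in set B. The names LATENCY_A, LATENCY_B, PATHS, and critical_paths are made up for this sketch.

```python
# Hypothetical unit latencies in picoseconds. These are NOT the values from
# the exercise table; they only reproduce "Control faster than Regs" (set A)
# and "Control equal to Regs" (set B).
LATENCY_A = {"I-Mem": 400, "Add": 100, "Mux": 30, "ALU": 120,
             "Regs": 200, "D-Mem": 350, "Control": 100}
LATENCY_B = dict(LATENCY_A, Control=200)

# Candidate paths named in the answers above, one list of units per path.
PATHS = {
    "and": [["I-Mem", "Regs", "Mux", "ALU", "Mux"],
            ["I-Mem", "Control", "Mux", "ALU", "Mux"]],
    "lw":  [["I-Mem", "Regs", "Mux", "ALU", "D-Mem", "Mux"],
            ["I-Mem", "Control", "Mux", "ALU", "D-Mem", "Mux"]],
    "beq": [["I-Mem", "Regs", "Mux", "ALU", "Mux"],     # branch condition
            ["I-Mem", "Control", "Mux", "ALU", "Mux"],  # branch condition
            ["Add", "Add", "Mux"],                      # PC + 4, then + offset
            ["I-Mem", "Add", "Mux"]],                   # offset, branch Add
}


def path_delay(path, latency):
    """Total delay of a path: the sum of the latencies of its units."""
    return sum(latency[unit] for unit in path)


def critical_paths(instr, latency):
    """Return (worst delay, list of every path that reaches that delay)."""
    delays = [(path_delay(p, latency), p) for p in PATHS[instr]]
    worst = max(d for d, _ in delays)
    return worst, [p for d, p in delays if d == worst]


for name, latency in (("A", LATENCY_A), ("B", LATENCY_B)):
    for instr in PATHS:
        print(name, instr, critical_paths(instr, latency))
```

With set A each instruction has a single critical path, the one through Regs. With set B the Regs and Control paths tie, so both are reported as critical.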
-
4.12.1a: For the non-pipelined (single cycle) processor
you need to add up the times for each unit, which is 1250ps.
For the pipelined processor the cycle time is the time of
the slowest unit (stage), in this case 350ps.
-
4.12.1b: The pipelined cycle time is 220ps and the non-pipelined cycle time is 950ps.
-
4.12.2a: The total latency for the pipelined processor is $5(350) = 1750$ps
and for the non-pipelined processor it is 1250ps. The pipelined latency is longer because
each of the five stages takes 350ps. Recall that each stage must have a delay that
is the maximum of the delays of the stages.
-
4.12.2b: Pipelined is $5(220) = 1100$ps and
the non-pipelined is 950ps.
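
The arithmetic in 4.12.1 and 4.12.2 can be checked with a short Python sketch. The per-stage latencies below are an assumption: the answers only quote the sums (1250ps, 950ps) and the maxima (350ps, 220ps), so the individual IF, ID, EX, MEM, and WB values were picked to be consistent with those totals. The function names are made up for this sketch.

```python
# Assumed stage latencies in ps, in the order IF, ID, EX, MEM, WB.
# Only their sum and maximum are confirmed by the answers above.
STAGES_A = [250, 350, 150, 300, 200]
STAGES_B = [200, 170, 220, 210, 150]


def cycle_time(stages, pipelined):
    """Clock period: slowest stage if pipelined, sum of all stages otherwise."""
    return max(stages) if pipelined else sum(stages)


def instruction_latency(stages, pipelined):
    """Time for one instruction to pass through the whole processor."""
    if pipelined:
        # Every stage is stretched to the period of the slowest stage.
        return len(stages) * max(stages)
    return sum(stages)


for name, stages in (("a", STAGES_A), ("b", STAGES_B)):
    print(name,
          cycle_time(stages, pipelined=True),
          cycle_time(stages, pipelined=False),
          instruction_latency(stages, pipelined=True),
          instruction_latency(stages, pipelined=False))
```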
-
4.13.1a: There is a dependency on $4 from the second to the third instruction.
-
4.13.1b: There is a dependency on $1 from the first to the second instruction and a
dependency on $2 from the second to the third instruction.
There is also a dependency on $1 between the first and third instruction.
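
As a rough check of 4.13.1, the read-after-write dependencies can be found mechanically. The sketch below is an illustration only: the two instruction sequences are taken from the listings in the 4.13.2 answers (registers written as $4, $1, and so on), each instruction is hand-encoded as a destination register plus source registers, and the name raw_dependencies is made up here.

```python
# Each entry: (assembly text, destination register or None, source registers).
# sw writes no register, so its destination is None.
SEQ_A = [
    ("sw $16,-100($6)", None, ["$16", "$6"]),
    ("lw $4,8($16)", "$4", ["$16"]),
    ("add $5,$4,$4", "$5", ["$4", "$4"]),
]
SEQ_B = [
    ("or $1,$2,$3", "$1", ["$2", "$3"]),
    ("or $2,$1,$4", "$2", ["$1", "$4"]),
    ("or $1,$1,$2", "$1", ["$1", "$2"]),
]


def raw_dependencies(seq):
    """Return (producer, consumer, register) triples, numbered from 1.

    A source register depends on the most recent earlier instruction
    in the sequence that wrote it.
    """
    last_writer = {}
    deps = []
    for i, (_, dest, srcs) in enumerate(seq, start=1):
        for reg in dict.fromkeys(srcs):  # drop repeated sources, keep order
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps


print(raw_dependencies(SEQ_A))
print(raw_dependencies(SEQ_B))
```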
-
4.13.2a:
Since there is no forwarding, the add instruction has
to wait until the end of the writeback stage in order
to get the correct value in $4.
sw $16,-100($6)  # sw   IF  ID  EX  MEM WB
lw $4,8($16)     # lw       IF  ID  EX  MEM WB
nop              # nop          IF  ID  EX  MEM WB
nop              # nop              IF  ID  EX  MEM WB
nop              # nop                  IF  ID  EX  MEM WB
add $5,$4,$4     # add                      IF  ID  EX  MEM WB
The writeback of the lw instruction needs to finish before the
decode stage of the add, which is why three nop instructions are needed.
-
4.13.2b:
Again, we need to make sure that the writeback stage of the first
instruction completes before the decode stage of the second. Same
for the second and third instructions.
or $1,$2,$3  # or   IF  ID  EX  MEM WB
nop          # nop      IF  ID  EX  MEM WB
nop          # nop          IF  ID  EX  MEM WB
nop          # nop              IF  ID  EX  MEM WB
or $2,$1,$4  # or                   IF  ID  EX  MEM WB
nop          # nop                      IF  ID  EX  MEM WB
nop          # nop                          IF  ID  EX  MEM WB
nop          # nop                              IF  ID  EX  MEM WB
or $1,$1,$2  # or                                   IF  ID  EX  MEM WB
-
4.13.3a:
With forwarding, we can forward the output of the MEM stage of the second
instruction (lw) to the input of the EX stage of the add, but we still need
one nop to eliminate the hazard. If we didn't have the nop, the pipeline would
still stall by itself to wait. You can view the nop instruction
as a programmer-controlled stall.
sw $16,-100($6)  # sw   IF  ID  EX  MEM WB
lw $4,8($16)     # lw       IF  ID  EX  MEM WB
nop              # nop          IF  ID  EX  MEM WB
add $5,$4,$4     # add              IF  ID  EX  MEM WB
-
4.13.3b:
In this example forwarding takes care of all the hazards, so we don't
need to add any nop instructions. The pipeline does not stall at all.
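
The nop counts in 4.13.2 and 4.13.3 can be reproduced with a small Python sketch. It is a simplified model, not a pipeline simulator, and it rests on the assumptions used in the answers above: without forwarding a consumer needs three slots between it and its producer (decode must come after writeback), and with forwarding only a load followed by a dependent instruction needs one slot. The name insert_nops and the hand-encoded programs are made up for this sketch.

```python
# Each entry: (assembly text, destination or None, sources, is_load).
PROGRAM_A = [
    ("sw $16,-100($6)", None, ["$16", "$6"], False),
    ("lw $4,8($16)", "$4", ["$16"], True),
    ("add $5,$4,$4", "$5", ["$4", "$4"], False),
]
PROGRAM_B = [
    ("or $1,$2,$3", "$1", ["$2", "$3"], False),
    ("or $2,$1,$4", "$2", ["$1", "$4"], False),
    ("or $1,$1,$2", "$1", ["$1", "$2"], False),
]


def insert_nops(program, forwarding):
    """Return the instruction stream with the nops needed to avoid hazards."""
    out = []
    ready_at = {}  # register -> first slot in which a reader may be placed
    for text, dest, srcs, is_load in program:
        earliest = max((ready_at.get(r, 0) for r in srcs), default=0)
        while len(out) < earliest:
            out.append("nop")
        slot = len(out)
        out.append(text)
        if dest is not None:
            if not forwarding:
                gap = 3  # reader's ID must come after the writer's WB
            elif is_load:
                gap = 1  # loaded value exists only after the MEM stage
            else:
                gap = 0  # ALU result is forwarded to the next EX
            ready_at[dest] = slot + 1 + gap
    return out


for name, prog in (("a", PROGRAM_A), ("b", PROGRAM_B)):
    for fwd in (False, True):
        print(name, "forwarding" if fwd else "no forwarding",
              insert_nops(prog, fwd))
```

Under these assumptions the output matches the listings: three nops per dependent pair without forwarding, one nop after the lw with forwarding, and none for the or sequence with forwarding.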