1. MIPS simulator

QtMips simulator

Download, unzip, and run the desktop version. Or, you can use the web version, but be aware that the graphics may be glitchy.

2. Data hazard with lw

lw $x, 0xAAAA($y)
sw $x, 0xBBBB($z)  ; <-- data hazard: $x is updated then used immediately

2.1. Single-cycle datapath

In QtMips, set up the most basic single-cycle datapath, the one we began with:

  • File -> New Simulation

  • Select "No pipeline no cache"

  • Start empty

[Figure: QtMips dialog, single-cycle]

Load the built-in example:

  • File -> Examples -> simple-lw-sw-ia.S

[Figure: QtMips menu, example]

Study this example. What does it do? How does it do it?

⋯ (stop scrolling and take a look!)⋯

The assembler syntax is a subset of gas, the GNU Assembler:

  • .globl _start declares _start as a global label (visible across all input files)

  • .set noreorder prevents the assembler from reordering instructions, adding nop, or otherwise optimizing the code

  • .text marks what follows as machine code

  • loop: is a symbolic label for a memory address (otherwise beq $0, $0, loop would need to be hard-coded with the correct offset)

  • .data marks what follows as NOT machine code, just variables and constants

  • .org sets the memory address at which what follows is placed (here 0x2000)

Notice that the variable at 0x2000 is set to 0x12345678 and the second variable is set to zero. This second variable (at 0x2004 since the previous entity was a 4-byte MIPS word) is what you want to watch when stepping through the code.
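To tie the directives together, here is a minimal skeleton in the same style. It is a sketch, not the verbatim contents of simple-lw-sw-ia.S: the label src_val, the // comment style, the nop after the branch, and the exact layout are assumptions, and the #pragma lines are left out. Register $2, the labels loop and dst_val, and the values 0x2000 and 0x12345678 are the ones mentioned on this page.

```asm
// Sketch only: the real example file may differ in detail.
.globl _start
.set noreorder          // assembler must not reorder or pad with nop

.text                   // machine code follows
_start:
loop:
    lw   $2, 0x2000($0) // load the word stored at 0x2000 into $2
    sw   $2, 0x2004($0) // store $2 to 0x2004 (the word to watch)
    beq  $0, $0, loop   // always taken: the while(1) loop
    nop                 // branch delay slot (assumed)

.data                   // variables and constants follow
.org 0x2000             // place them starting at address 0x2000
src_val:                // hypothetical label name
    .word 0x12345678
dst_val:                // lands at 0x2004, one 4-byte word later
    .word 0
```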

Assemble this code and load it, using any one of:

  • the compile button

  • Machine -> Compile Source

  • Ctrl-E

The #pragma lines ensure that at least the Registers and Memory windows are visible. The final #pragma also conveniently scrolls the Memory window to the relevant location.

  • Single-step through the program with the single-step button or Ctrl-T.

  • Count how many steps until memory 0x2004 gets updated (you should get 3).

Switch from the simple-lw-sw-ia.S source code tab to the Core tab.

  • Compile and load again (Ctrl-E)

  • Watch the datapath image update values of various registers and wires.

  • Do this cycle for a bit to get a handle on how the simulator displays the datapath and updates at each step. Your eye-brain system is good at finding changes in a scene.

A large screen is super helpful for watching the datapath! Be sure to notice the colored current instructions at the top and the information at the bottom (Cycles …​ Stalls).

2.2. Pipeline with no features

Set up a new datapath:

  • File -> New Simulation

  • Custom

  • Core:

    • Pipelined

    • (no) Hazard

  • Program cache

    • Not enabled

  • Data cache

    • Not enabled


Compile the same example and single-step through it. How many cycles does it take for 0x2004 to get updated? You should count 9 cycles.

But aren’t pipelines supposed to increase performance?

  • Insert at least 4 nop instructions between the sw and the beq (the while(1) branch), then compile and run again
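One way the padded loop might look, as a sketch. It assumes the loop consists of the lw, the sw, and the beq on register $2 and addresses 0x2000 / 0x2004 as described on this page; the trailing nop for the branch delay slot and the // comment style are assumptions.

```asm
loop:
    lw   $2, 0x2000($0) // $2 is written back only several cycles later
    sw   $2, 0x2004($0) // reads $2 right behind the lw
    nop                 // four nops keep the branch away, so the first
    nop                 // sw completes well before the loop repeats
    nop
    nop
    beq  $0, $0, loop   // the while(1) branch
    nop                 // branch delay slot (assumed)
```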

The dst_val location does not get updated until the second time through the loop. This is a programming BUG; there is no reason that this lw / sw combination needs to be executed twice.

Welcome to a pipeline data hazard.

How do you ensure that register $2 has the correct value by the time that the sw instruction reads the $2 value?

  • The destination register of a load instruction must not be used as a source by either of the next 2 instructions (that is, for at least 2 cycles).
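Applied to the example, that rule means keeping two instructions between the lw and the sw. A minimal sketch with nop as the filler, under the same assumptions about the example's registers and addresses (the delay-slot nop and // comments are assumptions); step through it to confirm that 0x2004 is now correct on the first pass.

```asm
loop:
    lw   $2, 0x2000($0) // loaded value reaches $2 only at write-back
    nop                 // filler 1: does not touch $2
    nop                 // filler 2: does not touch $2
    sw   $2, 0x2004($0) // per the rule above, $2 is now safe to read
    nop
    nop
    nop
    nop
    beq  $0, $0, loop
    nop                 // branch delay slot (assumed)
```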

2.3. Pipeline with hazard detection and stall

Add the hazard detection unit:

  • File -> New Simulation

  • Custom

  • Core:

    • Pipelined

    • (YES) Hazard

    • Stall when hazard is detected

  • Program cache

    • Not enabled

  • Data cache

    • Not enabled


Run the modified example code:

  • no nops between lw / sw

  • at least 4 nops before beq

How many cycles? (should be 7)

  • Notice how the datapath inserts exactly two nops (bubbles) into the pipeline when the control unit (in the second stage ID) sees the use of $2 right after the lw.

  • If you haven’t already seen how the MUXs change to route signals …​ now you know. It’s interesting to watch the sequencing.
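The stall can be pictured with an idealized five-stage timeline (IF, ID, EX, MEM, WB), written here as comments. This is a textbook-style sketch, not QtMips output: it assumes cycle 1 is the fetch of the lw, that the store reaches memory in its MEM stage, and that a register written in WB can be read in ID during the same cycle. Under those assumptions the count of 7 falls out directly.

```asm
// cycle:               1    2    3    4    5    6    7
// lw $2, 0x2000($0)    IF   ID   EX   MEM  WB
// sw $2, 0x2004($0)         IF   ID   ID   ID   EX   MEM
//
// sw is held in ID for two extra cycles (4 and 5) while two
// bubbles travel down the pipeline. It moves on once lw has
// written $2 in WB (cycle 5) and stores in MEM (cycle 7).
    lw   $2, 0x2000($0)
    sw   $2, 0x2004($0)
```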

Having a hazard detector + stall built into the datapath removes the programmer’s responsibility for this bug. It is A-OK if the programmer does add two nops, but the two-cycle delay happens either way, so there is no advantage.

What is better is for the programmer to know this behavior and place two unrelated but useful instructions after the lw: no stall and higher performance.
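As a sketch of that idea, the two slots after the lw can hold instructions that do real work but neither read nor write $2. The two addi instructions and the registers $3 and $4 below are hypothetical, chosen only for illustration; in a real program they would be whatever independent work is available nearby.

```asm
    lw   $2, 0x2000($0) // the load needs time to reach $2
    addi $3, $0, 1      // hypothetical useful work, independent of $2
    addi $4, $0, 2      // hypothetical useful work, independent of $2
    sw   $2, 0x2004($0) // two instructions later: no stall expected
```

The sw completes no later than it would after a two-cycle stall, but two extra instructions have finished in the meantime.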

2.4. Pipeline with forwarding

Now add forwarding: a shortcut that hands data destined for a register to the instruction that needs it one or two cycles early, before it has been written to the register.

  • File -> New Simulation

  • Custom

  • Core:

    • Pipelined

    • (YES) Hazard

    • Stall or forward when hazard is detected

  • Program cache

    • Not enabled

  • Data cache

    • Not enabled


Run the modified example code:

  • no nops between lw / sw

  • at least 4 nops before beq

How many cycles? (should now be 6)

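Notice that there is still one stall. Even with forwarding, one cycle of pause remains: the loaded word only comes out of data memory at the end of the lw’s MEM stage, which is too late for the instruction immediately behind it.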

An assembly programmer who understands this will automatically look for an instruction that does not use the loaded register and place it right after the lw. There is still opportunity for the programmer to optimize performance.
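With forwarding, a single independent instruction is enough to fill the gap. The sketch below uses a hypothetical addi on $3 and adds an idealized timeline as comments. As before, this is a textbook five-stage picture with cycle 1 taken as the fetch of the lw, not simulator output.

```asm
// cycle:               1    2    3    4    5    6
// lw   $2, 0x2000($0)  IF   ID   EX   MEM  WB
// addi $3, $0, 1            IF   ID   EX   MEM  WB
// sw   $2, 0x2004($0)            IF   ID   EX   MEM
//
// The loaded word leaves data memory at the end of cycle 4 and
// can be forwarded to sw in cycle 5, so no bubble is needed.
    lw   $2, 0x2000($0)
    addi $3, $0, 1      // hypothetical: any instruction not using $2
    sw   $2, 0x2004($0)
```

Under these assumptions the store still reaches memory in cycle 6, the same as with the one-cycle stall, but the cycle that would have been a bubble now does useful work.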

The single-cycle datapath has no such limitation, so why do we pipeline, again?

(that would be a great test question!)