

#### Flexible Timing Simulation of RISC-V Processors with Sniper

Neethu Bal Mallya<sup>1</sup>, Cecilia Gonzalez-Alvarez<sup>2</sup>, Trevor E. Carlson<sup>1</sup>

<sup>1</sup>National University of Singapore, Singapore <sup>2</sup>Ghent University, Belgium





#### Outline



- Need for Simulation
- Sniper Simulator Overview
- Our enhancements to Sniper
- Initial Processor Performance Analysis
- Conclusion



# Why do we need Simulation?





- Performance analysis of next-generation systems
- Architecture design space exploration
- Pre-silicon software optimizations



#### Trade-offs in Simulation











# Sniper Simulator – An Overview



- Parallel simulator based on Interval Simulation
- Models multi-/many-cores running multithreaded<sup>1</sup> and multi-program workloads
- Hardware validated for x86
- Flexible simulation options
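
The interval-simulation idea can be illustrated with a small sketch: execution time is estimated as smooth instruction dispatch plus a penalty for each miss event (branch misprediction, long-latency cache miss). This is a simplified illustration, not Sniper's actual core model; the `MissEvent` type, the function name, and the penalty values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MissEvent:
    """A miss event that ends an interval (hypothetical record type)."""
    kind: str            # e.g. "branch_mispredict" or "llc_miss"
    penalty_cycles: int  # illustrative penalty, not a measured value


def estimate_cycles(num_instructions: int, dispatch_width: int,
                    miss_events: List[MissEvent]) -> int:
    """Estimate cycles as smooth dispatch time plus miss-event penalties."""
    # Between miss events, instructions are assumed to dispatch at full width.
    base_cycles = -(-num_instructions // dispatch_width)  # ceiling division
    # Each miss event adds its penalty on top of the base time.
    penalty_cycles = sum(event.penalty_cycles for event in miss_events)
    return base_cycles + penalty_cycles


# Assumed example: 1000 instructions on a 2-wide core with two miss events.
events = [MissEvent("branch_mispredict", 15), MissEvent("llc_miss", 200)]
total = estimate_cycles(num_instructions=1000, dispatch_width=2,
                        miss_events=events)
print(f"estimated cycles: {total}, IPC: {1000 / total:.2f}")
```

Because timing is derived per interval instead of per cycle, a model of this kind trades some accuracy for simulation speed.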



<sup>1</sup>Currently not supported for RISC-V



# Sniper – Beyond Traditional Simulation



- Strong adoption in industry and academia
  - 550+ citations
  - 800+ researcher downloads
  - 64+ countries

- Actively used since 2011
  - Belgium-based team
  - Supports next generation Xeon Phi (KNL++)
  - HiPEAC TechTransfer Award







# Sniper – Key Differentiators



- Fast development time
- Enables Limit Studies
  - Branch Prediction
  - Memory Dependence Prediction
  - Shared Multi-level Cache Hierarchy
- High Performance and Scalability





# Sniper - Interacting with the Simulator



- Python interfaces
- SimAPI
  - Magic Instructions
  - SimRoiStart() SimRoiEnd()
- SimAPI header: \$SNIPER\_HOME/include/sim\_api.h
  - SimSetInstrumentMode options
    - #define SIM\_OPT\_INSTRUMENT\_DETAILED 0
    - #define SIM\_OPT\_INSTRUMENT\_WARMUP 1
    - #define SIM\_OPT\_INSTRUMENT\_FASTFORWARD 2
  - SimAPI commands
    - SimRoiStart(), SimRoiEnd()
    - SimGetProcId(), SimGetThreadId()
    - SimSetThreadName(name)
    - SimGetNumProcs(), SimGetNumThreads()
    - SimSetFreqMHz(proc, mhz), SimSetOwnFreqMHz(mhz)
    - SimGetFreqMHz(proc), SimGetOwnFreqMHz()
    - SimMarker(arg0, arg1), SimNamedMarker(arg0, str)
    - SimUser(cmd, arg)
    - SimSetInstrumentMode(opt)
    - SimInSimulator()





# Sniper - Interacting with the Simulator

- Energy Stats





# Sniper - Interacting with the Simulator

- Loop Tracer

| Config entry        | Comment              |
|---------------------|----------------------|
| [general]           |                      |
| syntax=att          | Optional             |
| [loop_tracer]       |                      |
| enabled=true        |                      |
| base_address=4090be | Loop start address   |
| iter_start=9000     | Wait before starting |
| iter_count=20       | Number to view       |
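
As a sanity check on such settings, the loop tracer entries can be read back with Python's standard `configparser`. This is only a sketch: it assumes the config is INI-like with `#` inline comments and that `base_address` is hexadecimal (it contains the digits `be`); it is not how Sniper itself parses its configuration.

```python
import configparser

# Loop tracer settings from the slide, written as INI-style text.
CONFIG_TEXT = """
[general]
syntax=att            # Optional
[loop_tracer]
enabled=true
base_address=4090be   # Loop start address
iter_start=9000       # Wait before starting
iter_count=20         # Number to view
"""

# Treat '#' preceded by whitespace as the start of an inline comment.
parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
parser.read_string(CONFIG_TEXT)

tracer = parser["loop_tracer"]
settings = {
    "enabled": tracer.getboolean("enabled"),
    # Assumption: the address is given in hexadecimal without a 0x prefix.
    "base_address": int(tracer["base_address"], 16),
    "iter_start": tracer.getint("iter_start"),
    "iter_count": tracer.getint("iter_count"),
}

# Iterations that would be shown: [iter_start, iter_start + iter_count).
last_iter = settings["iter_start"] + settings["iter_count"] - 1
print(hex(settings["base_address"]), settings["iter_start"], last_iter)
```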






# Sniper + RISC-V ecosystem



- RISC-V
  - Open, extensible ISA
  - Collection of related software tools



- Existing Architecture-level Software implementations
  - Functional simulators



- Many additional things



# Comparison with existing solutions



|                         | Sniper + RISC-V                                  | gem5 (RISC-V)  | FireSim / Chisel / Verilog                        |
|-------------------------|--------------------------------------------------|----------------|---------------------------------------------------|
| Development methodology | C++ based (SW)                                   | C++ based (HW) | RTL based (HW)                                    |
| Dev-time                | +++                                              | ++             | +                                                 |
| Sim-time                | +++                                              | ++             | ++++/+/+                                          |
| Simulation model        | Cycle-level + cycle-approximate                  | Cycle-level    | Cycle-exact + cycle-approximate                   |
| Flexibility             | Ease of use / modification                       |                | Requires RTL / abstract models                    |
| Fidelity                | Sophisticated models require hardware validation |                | Cycle-exact models derived from synthesizable RTL |



## Simulation Flow







#### **Sniper Architecture**







## How did we enhance Sniper?









- Frontend: the RISC-V functional simulators rv8 and Spike were updated to support SIFT generation
- Decoder Library: architecture-agnostic methods were added to implement the decoding phase of the processor
- Core Model: parameters such as the description of ports and functional units, latencies, etc. were updated
- Config Files: configuration files were set up to resemble a BOOM processor



#### Sniper Instruction Trace File Format (SIFT)



- Dynamic instruction stream generated by the Frontend
  - Instruction execution order
  - Memory addresses for loads and stores
  - Branch directions (taken/not taken)
  - Executed/masked info for predicated instructions
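
The kind of per-instruction information listed above can be pictured as a simple record. The sketch below is illustrative only: `DynamicInstruction`, its field names, and the sample addresses are hypothetical and do not reflect the actual SIFT encoding.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DynamicInstruction:
    """One entry of a dynamic instruction stream (hypothetical layout)."""
    pc: int                                    # address of the instruction
    mem_addresses: List[int] = field(default_factory=list)  # loads/stores
    is_branch: bool = False
    branch_taken: Optional[bool] = None        # only meaningful for branches
    executed: bool = True                      # False if predicated off


# The list order is the instruction execution order.
trace = [
    DynamicInstruction(pc=0x1000, mem_addresses=[0x8000]),             # load
    DynamicInstruction(pc=0x1004, is_branch=True, branch_taken=True),  # branch
    DynamicInstruction(pc=0x2000, mem_addresses=[0x8008]),             # store
]


def summarize(instructions: List[DynamicInstruction]) -> dict:
    """Count the events a timing backend would consume from the stream."""
    return {
        "instructions": len(instructions),
        "memory_accesses": sum(len(i.mem_addresses) for i in instructions),
        "taken_branches": sum(
            1 for i in instructions if i.is_branch and i.branch_taken
        ),
    }


print(summarize(trace))
```

Keeping only this dynamic information in the trace lets the timing backend stay independent of how the frontend executed the program.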




#### How to add new Frontend?





- Control
  - Sift::Writer::Magic()
- Instruction Instrumentation
  - Sift::Writer::InstructionCount()
  - Sift::Writer::CacheOnly()
  - Sift::Writer::Instruction() (addresses, branch direction, etc.)




## How to update Backend?



- Decoder Library (\$SNIPER\_HOME/decoder\_lib)
  - 2 classes
    - Decoder
    - InstructionDecoded
- Core Model (\$SNIPER\_HOME/common/performance\_model)
- Config Files (\$SNIPER\_HOME/config)



# How to run Sniper?



#### ./run-sniper --frontend=[pin | dr | spike | rv8 | legacy] --config

    [SNIPER] Start
    [SNIPER] -----
    [SNIPER] Sniper using SIFT/trace-driven frontend
    [SNIPER] Running full application in DETAILED mode
    [SNIPER] -----
    [SNIPER] Enabling performance models
    [SNIPER] Setting instrumentation mode to DETAILED
    Trace Monitor Started
    [TRACE:0] -- DONE --
    [SNIPER] Disabling performance models
    [SNIPER] Leaving ROI after 18.26 seconds
    OUT: RUN: TraceThread
    [SNIPER] Simulated 5.0M instructions, 11.2M cycles, 0.45 IPC
    [SNIPER] Simulation speed 273.4 KIPS (273.4 KIPS / target core - 3657.1ns/instr)
    [SNIPER] Setting instrumentation mode to FAST_FORWARD
    [SNIPER] End
    [SNIPER] Elapsed time: 18.41 seconds
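
The summary figures in this output can be cross-checked from the raw counts. The sketch below parses three of the log lines with regular expressions and recomputes IPC and simulation speed; the function name and dictionary keys are illustrative, and small differences from the reported values are expected because the log rounds the instruction and cycle counts.

```python
import re

# Summary lines copied from the sample run above.
LOG = """\
[SNIPER] Leaving ROI after 18.26 seconds
[SNIPER] Simulated 5.0M instructions, 11.2M cycles, 0.45 IPC
[SNIPER] Simulation speed 273.4 KIPS (273.4 KIPS / target core - 3657.1ns/instr)
"""


def parse_run_summary(log: str) -> dict:
    """Extract ROI time, instruction/cycle counts and reported metrics."""
    roi = re.search(r"Leaving ROI after ([\d.]+) seconds", log)
    sim = re.search(
        r"Simulated ([\d.]+)M instructions, ([\d.]+)M cycles, ([\d.]+) IPC", log
    )
    speed = re.search(r"Simulation speed ([\d.]+) KIPS", log)
    return {
        "roi_seconds": float(roi.group(1)),
        "instructions": float(sim.group(1)) * 1e6,
        "cycles": float(sim.group(2)) * 1e6,
        "reported_ipc": float(sim.group(3)),
        "reported_kips": float(speed.group(1)),
    }


summary = parse_run_summary(LOG)
# IPC = instructions per simulated cycle.
derived_ipc = summary["instructions"] / summary["cycles"]
# KIPS = thousands of simulated instructions per wall-clock second in the ROI.
derived_kips = summary["instructions"] / summary["roi_seconds"] / 1e3
print(f"IPC {derived_ipc:.2f}, speed {derived_kips:.1f} KIPS")
```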



# **Experimental Setup**



- Sniper multi-core simulator
  - Similar to BOOM v1 DefaultConfig
    - Dispatch width: 2, issue width: 3, ROB: 80
  - 32 KB L1s, 1 MB L2
  - 2.0 GHz
- SPEC CPU2006 benchmarks
  - First 5M instructions
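
To put these parameters in perspective, the arithmetic below combines the 2.0 GHz clock and dispatch width of 2 with the sample run shown earlier (11.2M cycles, 18.26 s in the ROI, 0.45 IPC). It assumes that sample run used this configuration, which the slides do not state explicitly; the variable names are illustrative.

```python
# Values from the experimental setup.
CLOCK_HZ = 2.0e9          # 2.0 GHz core clock
DISPATCH_WIDTH = 2        # instructions dispatched per cycle at most

# Values from the sample run output (assumed to use this setup).
SIMULATED_CYCLES = 11.2e6
ROI_WALL_SECONDS = 18.26
REPORTED_IPC = 0.45

# Time the simulated program would take on the modeled 2.0 GHz core.
simulated_seconds = SIMULATED_CYCLES / CLOCK_HZ
# How much slower the simulation runs than the modeled hardware.
slowdown = ROI_WALL_SECONDS / simulated_seconds
# Fraction of the peak dispatch bandwidth that the workload achieved.
dispatch_utilization = REPORTED_IPC / DISPATCH_WIDTH

print(f"simulated time: {simulated_seconds * 1e3:.1f} ms")
print(f"slowdown: {slowdown:.0f}x")
print(f"dispatch utilization: {dispatch_utilization:.1%}")
```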





## **Initial Processor Performance Analysis**



Source: Tuan Ta et al., "Simulating Multi-Core RISC-V Systems in gem5", CARRV 2018



## Conclusion



- An infrastructure extension of Sniper
- Sniper + RISC-V is now available
  - Alpha version
- Next steps
  - Improve the simulator features to allow for a detailed comparison with cycle-level processor implementations





- Thank you
- Download Today!
  - http://snipersim.org/w/Download
- Questions?
  - http://groups.google.com/group/snipersim




