Routine Inlining
Description
While trace linking reduces the intervention of R-Visor during execution, its advantage can be negated when C/C++ instrumentation routines are present, as one of the dispatcher's tasks is to execute these routines at predetermined points, often involving context switches. To maintain the performance benefits of trace linking, R-Visor's API allows for the insertion of instrumentation routines directly into the basic block in the code cache during its allocation phase. We dub this feature routine inlining.
Our version of routine inlining allows users to write short snippets in RISC-V assembly with the aid of R-Visor's built-in encoder. This gives users low-level control over the instrumentation routines being executed. We recommend routine inlining for short, performance-critical instrumentation routines and for users proficient in RISC-V assembly.
Writing Inline Routines
Collecting Metrics from Default Routine (Baseline)
To demonstrate the benefits of routine inlining, we will first implement a standard basic block counting routine (similar to a C/C++ callback) and collect performance metrics. We'll then compare these to an inlined version.
Assume you have a
rvisor_bb.c
file in your
routines/
folder designed for this baseline
measurement.
-
Enable Metrics Collection in CMake: Modify your
CMakeLists.txtto include the-DMETRICScompilation flag. This flag is typically used to enable sections of code (guarded by#ifdef METRICS) that collect and report performance data. -
Register the
rvisor_bbRoutine: Update yourCMakeLists.txtto build thervisor_bb.cexecutable. -
Build the Routine: Navigate to your build directory and compile:
-
Execute and Observe Metrics: Run the newly built routine with a test binary (e.g.,
benchmarks/aha-mont64):After execution, performance metrics (if the
METRICSsection is implemented to do so) might be stored in a log file, for instance,rvisor_bb_logs.txt. Example output is shown below:
This provides a baseline for comparison.
Writing the Counting Logic in RISC-V
Inline routines are composed of individual RISC-V instructions. For basic block counting, our goal is to increment a counter variable each time a basic block is executed.
Here's the plan for the RISC-V assembly sequence:
-
Get the memory address of our counter variable (
num_basic_blocks). We'll use thet6register to hold this address. -
Load the current value of the counter, increment it. We'll use the
t5register for the counter's value. -
Store the new value back to the counter variable.
The routine so far:
However, the registers
t5 and
t6
are
general-purpose temporary registers that the instrumented binary
might also be using. To avoid corrupting the binary's state (a
key principle of transparent instrumentation), we must save
their original values before using them and restore them after
our routine completes. We can use the stack for this:
# Save original values of t6 and t5
ADDI sp, sp, -16 # Adjust stack pointer to make space for 2 registers (8 bytes each)
SD t6, 0(sp) # Store t6 onto the stack
SD t5, 8(sp) # Store t5 onto the stack (at a different offset)
# Core counting logic
LI t6, &num_basic_blocks # Target address of num_basic_blocks into t6
LD t5, 0(t6) # Load value of num_basic_blocks into t5
ADDI t5, t5, 1 # Increment t5
SD t5, 0(t6) # Store incremented value back to num_basic_blocks
# Restore original values of t6 and t5
LD t6, 0(sp) # Restore t6 from the stack
LD t5, 8(sp) # Restore t5 from the stack
ADDI sp, sp, 16 # Adjust stack pointer back
Adding the Inline Routine in R-Visor (C API)
Now that we have planned our RISC-V sequence, we can use
R-Visor's C API to inject these instructions. We'll create a new
C file
tlrvisor_bb.c.
R-Visor's C API provides functions like
rvisor_add_inline_inst_bb()
to insert a single
instruction (as a
uint32_t
encoding) and
rvisor_add_inline_li_bb()
as a helper for loading
immediate values (like addresses). We use R-Visor's C encoder
functions (e.g.,
encode_ADDI
,
encode_SD,
encode_LD
) to convert our
planned assembly into the required
uint32_t
instruction encodings.
Here's how the C code would look for adding the inlined instructions:
// ADDI sp, sp, -16
rvisor_add_inline_inst_bb(encode_ADDI(SP, SP, -16), POST);
// SD t6, 0(sp)
rvisor_add_inline_inst_bb(encode_SD(0, SP, T6), POST);
// SD t5, 8(sp)
rvisor_add_inline_inst_bb(encode_SD(8, SP, T5), POST);
// LI t6, &num_basic_blocks (R-Visor helper for loading address)
rvisor_add_inline_li_bb((uint64_t)(&num_basic_blocks), T6, POST);
// LD t5, 0(t6)
rvisor_add_inline_inst_bb(encode_LD(T5, 0, T6), POST);
// ADDI t5, t5, 1
rvisor_add_inline_inst_bb(encode_ADDI(T5, T5, 1), POST);
// SD t5, 0(t6)
rvisor_add_inline_inst_bb(encode_SD(0, T6, T5), POST);
// LD t6, 0(sp)
rvisor_add_inline_inst_bb(encode_LD(T6, 0, SP), POST);
// LD t5, 8(sp)
rvisor_add_inline_inst_bb(encode_LD(T5, 8, SP), POST);
// ADDI sp, sp, 16
rvisor_add_inline_inst_bb(encode_ADDI(SP, SP, 16), POST);
Full C++ Code with Inlining
Here's the complete
tlrvisor_bb.c
modified for
inline basic block counting:
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>
#include <string.h> // For strrchr, snprintf if used for dynamic log names
// R-Visor C Headers (adjust paths as needed)
#include "../helpers/uthash.h" // If used by other parts not shown
#include "../src/code_cache.h" // For code cache related definitions
#include "../src/decode.h" // For encoder functions (encode_ADDI, etc.) and register defs (SP, T5, T6)
#include "../src/elf_reader.h" // For ELF reading functionalities
#include "../src/logger.h" // For rvisor_logger, set_logging_file
#include "../src/rail.h" // For core R-Visor API (rvisor_init, etc.)
// If register definitions like SP, T5, T6 are not in decode.h, define/include them appropriately
// For example, from a hypothetical regs.h or directly:
// #define SP 2
// #define T5 30
// #define T6 31
// These are illustrative; use actual values from your R-Visor's ISA definition.
static int num_basic_blocks; // Our counter variable
#ifdef METRICS
uint64_t start_cycle, end_cycle;
uint64_t start_instr, end_instr;
// This routine is called once when the target program exits if METRICS is defined.
void exit_routine_metrics(uint64_t *regfile){ // regfile might be unused
asm volatile ("csrr %0, instret" : "=r"(end_instr)); // Read instruction retired counter
asm volatile ("rdcycle %0" : "=r"(end_cycle)); // Read cycle counter
fprintf(rvisor_logger, "Num Basic Blocks: %d\n", num_basic_blocks);
fprintf(rvisor_logger, "Total elapsed cycles: %lu\n", (unsigned long)(end_cycle - start_cycle));
fprintf(rvisor_logger, "Total instructions executed: %lu\n", (unsigned long)(end_instr - start_instr));
fclose(rvisor_logger); // Ensure logger is closed
}
#else
// Default exit routine if METRICS is not defined
void exit_routine_default(uint64_t *regfile) {
fprintf(rvisor_logger, "Num Basic Blocks: %d\n", num_basic_blocks);
fclose(rvisor_logger);
}
#endif
int main(int argc, char** argv, char** envp) {
if(argc < 2){
printf("Please provide a target binary\n");
exit(1);
}
num_basic_blocks = 0; // Initialize counter
// Set log file (can be dynamic as in your other examples)
set_logging_file("bb_counting_inline_logs.txt", "w");
#ifdef METRICS
// Record start cycles and instructions retired.
asm volatile ("rdcycle %0" : "=r"(start_cycle));
asm volatile ("csrr %0, instret" : "=r"(start_instr));
#endif
rvisor_trace_linking_enabled = 1; // Enable trace linking for better performance.
rvisor_init(argv[1]); // Initialize R-Visor with the target binary.
// Register arguments for the target program.
rvisor_register_args(argc, argv, envp);
// --- Injecting the Inlined Routine ---
// Assumes POST means these instructions are added to the code cache to be executed
// in sequence with the basic block, effectively acting as instrumentation.
// ADDI sp, sp, -16
rvisor_add_inline_inst_bb(encode_ADDI(SP, SP, -16), POST);
// SD t6, 0(sp)
rvisor_add_inline_inst_bb(encode_SD(0, SP, T6), POST);
// SD t5, 8(sp)
rvisor_add_inline_inst_bb(encode_SD(8, SP, T5), POST);
// LI t6, &num_basic_blocks
rvisor_add_inline_li_bb((uint64_t)(&num_basic_blocks), T6, POST);
// LD t5, 0(t6)
rvisor_add_inline_inst_bb(encode_LD(T5, 0, T6), POST);
// ADDI t5, t5, 1
rvisor_add_inline_inst_bb(encode_ADDI(T5, T5, 1), POST);
// SD t5, 0(t6)
rvisor_add_inline_inst_bb(encode_SD(0, T6, T5), POST);
// LD t6, 0(sp)
rvisor_add_inline_inst_bb(encode_LD(T6, 0, SP), POST);
// LD t5, 8(sp)
rvisor_add_inline_inst_bb(encode_LD(T5, 8, SP), POST);
// ADDI sp, sp, 16
rvisor_add_inline_inst_bb(encode_ADDI(SP, SP, 16), POST);
// Register the appropriate exit routine.
#ifdef METRICS
rvisor_register_exit_routine(exit_routine_metrics);
#else
rvisor_register_exit_routine(exit_routine_default);
#endif
rvisor_run(); // Start R-Visor and the target program.
return 0;
}
Comparing Metrics
After building this inlined version (ensure
CMakeLists.txt
now points to
tlrvisor_bb.c):
Check the output file (e.g.,
rvisor_bb_inline_logs.txt
as set by
setLoggingFile):
Comparing these example results:
-
Default Routine (Callback-based):
- Total elapsed cycles:
13621698 - Total instructions executed:
13621692
- Total elapsed cycles:
-
Inlined Routine:
- Total elapsed cycles:
8566194 - Total instructions executed:
8566188
- Total elapsed cycles:
We can observe a significant reduction in both total elapsed cycles and instructions executed. This demonstrates that using R-Visor's routine inlining feature, especially in conjunction with trace linking, can effectively reduce the computational overhead of instrumentation. This makes it a valuable technique for performance-sensitive analysis tasks.