PANDA and Rust: The Last 10 Months

What is PANDA/panda-rs

PANDA is a project for whole-system emulation and analysis born from a collaboration between MIT Lincoln Laboratory, NYU, and Northeastern University. It enables users to write plugins (in C, C++, Rust, or Python) to modify and analyze a system emulated under QEMU, with the primary use case being security research.

As mentioned above, PANDA provides first-class support for Rust. This is done via the panda-rs crate, which provides safe, high-level interfaces for analysis.

Crate Improvements

The last PANDA blog post, announcing the original release of the Rust bindings, was roughly 10 months and 15 crate versions ago. Since then, a lot has been added as Rust has seen increasing use throughout PANDA.

A list of some of the highlights:

Adding Closure Support

One of the toughest features to add to panda-rs was closure support, and since there are limited resources on the topic, I wanted to write a bit about how I went about supporting closures for callbacks across an FFI boundary.

In PANDA, callbacks work by registering a C function pointer, which PANDA then calls with any information relevant to the callback location. In Rust, the registration function's signature looks something like this (simplified):

extern "C" {
    fn panda_register_callback(
        callback: extern "C" fn(cpu: &mut CPUState, pc: u64)
    );
}

However, consider the callback type, extern "C" fn(...). In Rust this type represents something quite specific: a pointer to a free function that uses the C calling convention. This means our callback will necessarily look like the following:

extern "C" fn my_callback(cpu: &mut CPUState, pc: u64) {
    // ...
}

and since no arguments we control are passed in, maintaining any state between callbacks (or even between multiple calls to a single callback) means resorting to global state. This leads to less-than-idiomatic Rust, as we start needing lots of lazy_static, mutexes, and other synchronization primitives sprinkled everywhere.
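As a rough illustration of that global-state workaround, here is a sketch where a plain extern "C" callback keeps its state in a static Mutex. The opaque CPUState definition and the State struct are placeholders for this example, not PANDA's real types. Since Mutex::new is a const fn on Rust 1.63+, this particular case doesn't need lazy_static, but every access still has to go through a lock:

```rust
use std::sync::Mutex;

/// Placeholder for PANDA's opaque CPUState (not the real definition).
#[repr(C)]
pub struct CPUState {
    _private: [u8; 0],
}

/// All callback state has to live in a global, since the callback has no context argument.
pub struct State {
    pub blocks_seen: u64,
    pub last_pc: u64,
}

// `Mutex::new` is a const fn (Rust 1.63+), so the static can be initialized directly;
// every access still has to take the lock.
pub static STATE: Mutex<State> = Mutex::new(State { blocks_seen: 0, last_pc: 0 });

/// A plain C-ABI callback: the only way it can remember anything is through `STATE`.
pub extern "C" fn my_callback(_cpu: &mut CPUState, pc: u64) {
    let mut state = STATE.lock().unwrap();
    state.blocks_seen += 1;
    state.last_pc = pc;
}
```

Every new piece of per-callback state means another global, and all plugins' callbacks share the same statics.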

Ideally we'd have some way to keep track of state between callbacks without resorting to unidiomatic, hard-to-maintain global variables. Typically this would be done either by making callbacks methods (which have access to self for state) or closures (which can capture variables for state).

However, there's a problem with both of these approaches: when my_callback is hit, how would it know which struct or closure to call? Even if the closures are stored globally, there's no way to tell which callback invocation belongs to which closure.

Using Just-in-Time Compilation

One possibility is to compile a new extern "C" fn for each closure instance, but this requires JIT compilation, which isn't much fun, especially when debugging across the JIT-compiled function.

While there are existing options for this, namely libffi-rs (which I personally find leaves a bit to be desired), JIT-compiling functions that jump to our closures (trampolines) is a heavier solution than I would prefer.

Using Callback Userdata

So the other option is callback userdata, a pattern in which a callback consists of two things: a function pointer and an untyped pointer to user-provided data. This lets you pass some data when registering your callback, which the callback system then passes to every invocation of your callback. If you want to share data across calls to your callback, you allocate it on the heap, erase its type by casting to void*, then pass that alongside your callback function when registering it.
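A minimal sketch of the userdata pattern, assuming a hypothetical CallbackTable standing in for the emulator's list of registered callbacks (PANDA's real registration functions differ). The counter lives on the heap, its type is erased to *mut c_void at registration, and the callback casts it back:

```rust
use std::ffi::c_void;

/// Placeholder for PANDA's opaque CPUState.
#[repr(C)]
pub struct CPUState {
    _private: [u8; 0],
}

/// Callback type with userdata: an untyped context pointer comes first.
pub type CallbackWithContext = unsafe extern "C" fn(*mut c_void, &mut CPUState, u64);

/// Hypothetical stand-in for the emulator's list of registered callbacks.
pub struct CallbackTable {
    entries: Vec<(CallbackWithContext, *mut c_void)>,
}

impl CallbackTable {
    pub fn new() -> Self {
        CallbackTable { entries: Vec::new() }
    }

    /// Remember the function pointer together with the caller's context pointer.
    pub fn register_with_context(&mut self, cb: CallbackWithContext, context: *mut c_void) {
        self.entries.push((cb, context));
    }

    /// Invoke each callback with exactly the context it was registered with.
    pub fn fire(&self, cpu: &mut CPUState, pc: u64) {
        for &(cb, context) in &self.entries {
            unsafe { cb(context, cpu, pc) };
        }
    }
}

/// Per-callback state that is heap-allocated and passed around as `void*`.
pub struct BlockCounter {
    pub hits: u64,
}

pub unsafe extern "C" fn count_blocks(context: *mut c_void, _cpu: &mut CPUState, _pc: u64) {
    // Undo the type erasure performed at registration time.
    let counter = unsafe { &mut *(context as *mut BlockCounter) };
    counter.hits += 1;
}
```

Registering the same function twice with two different counters would give two independent sets of state, which is exactly what the plain function pointer could not do.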

This requires a slight change to our callback function's signature:

extern "C" fn my_callback(context: *mut c_void, cpu: &mut CPUState, pc: u64) {
    // ...
}

But that breaks existing plugins, which isn't great: PANDA is used for research by more than just the PANDA team, so backwards compatibility is important. To accomplish this, we can write a trampoline at compile time for each callback type, which takes the old-style callback as its context and calls it. We end up with something like this (the actual code is written as the equivalent C):

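use std::ffi::c_void;
use std::mem::transmute;

unsafe extern "C" fn my_callback_trampoline(context: *mut c_void, cpu: &mut CPUState, pc: u64) {
    // The context is the old-style callback itself, so turn it back into a function pointer
    let callback: extern "C" fn(&mut CPUState, u64) = transmute(context);
    callback(cpu, pc)
}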

Then we can provide two registration APIs: one identical to the original, which internally passes the callback as the context and the trampoline as the "with context" callback, and another that accepts a context pointer but requires the new callback signature.
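Putting the trampoline and the two registration APIs together might look roughly like the following. Registry, register, and register_with_context are hypothetical names used for illustration, not PANDA's API; the point is that the old API is implemented entirely in terms of the new one:

```rust
use std::ffi::c_void;
use std::mem::transmute;

/// Placeholder for PANDA's opaque CPUState.
#[repr(C)]
pub struct CPUState {
    _private: [u8; 0],
}

/// Original callback signature, kept for backwards compatibility.
pub type Callback = extern "C" fn(&mut CPUState, u64);
/// New signature with a userdata/context pointer.
pub type CallbackWithContext = unsafe extern "C" fn(*mut c_void, &mut CPUState, u64);

/// Compile-time trampoline: the context *is* an old-style callback, so cast it back and call it.
unsafe extern "C" fn legacy_trampoline(context: *mut c_void, cpu: &mut CPUState, pc: u64) {
    let callback: Callback = unsafe { transmute(context) };
    callback(cpu, pc)
}

/// Hypothetical registry; internally every entry uses the "with context" form.
#[derive(Default)]
pub struct Registry {
    entries: Vec<(CallbackWithContext, *mut c_void)>,
}

impl Registry {
    /// New API: the caller supplies a context pointer and a context-taking callback.
    pub fn register_with_context(&mut self, cb: CallbackWithContext, context: *mut c_void) {
        self.entries.push((cb, context));
    }

    /// Old API, unchanged for existing plugins: the callback itself becomes the context.
    pub fn register(&mut self, cb: Callback) {
        self.register_with_context(legacy_trampoline, cb as *mut c_void);
    }

    /// Simulate PANDA reaching the callback location.
    pub fn fire(&self, cpu: &mut CPUState, pc: u64) {
        for &(cb, context) in &self.entries {
            unsafe { cb(context, cpu, pc) };
        }
    }
}
```

A raw pointer can't be cast back to a function pointer with `as`, which is why the trampoline uses transmute for that direction.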

Using this to support closures works similarly: we generate a small trampoline for each callback type and pass the closure as userdata. However, to do this we need to convert our closure to a Box<Box<dyn FnMut(&mut CPUState, u64)>>, because a Box<dyn FnMut(...)> is a wide pointer, meaning it stores metadata alongside the data pointer (in this case a vtable pointer). In order to fit into a void* (a thin pointer) we need an additional level of indirection, since a pointer to a wide pointer is itself a thin pointer.
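To see why the extra box is needed, this small sketch compares pointer sizes (CPUState is a placeholder). A Box<dyn FnMut(...)> is two words, a data pointer plus a vtable pointer, while a box around it is back to one word, the same size as a void*:

```rust
use std::ffi::c_void;
use std::mem::size_of;

/// Placeholder for PANDA's CPUState; only its appearance in the signature matters here.
pub struct CPUState;

/// The closure type we want to smuggle through a `void*`.
pub type BoxedCallback = Box<dyn FnMut(&mut CPUState, u64)>;

/// Returns (size of void*, size of Box<dyn FnMut>, size of Box<Box<dyn FnMut>>).
pub fn pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<*mut c_void>(),        // thin pointer: just an address
        size_of::<BoxedCallback>(),      // wide pointer: data pointer + vtable pointer
        size_of::<Box<BoxedCallback>>(), // thin again: it points at the wide pointer
    )
}
```

Note that casting a *mut dyn FnMut(...) straight to *mut c_void compiles, but it silently discards the vtable pointer, so the closure could never be called again; the extra box avoids that.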

In the end we'll get something like this for our trampoline:

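use std::ffi::c_void;
use std::mem::transmute;

unsafe extern "C" fn my_callback_trampoline(context: *mut c_void, cpu: &mut CPUState, pc: u64) {
    // The context points at the boxed wide pointer to the closure
    let callback: &mut &mut dyn FnMut(&mut CPUState, u64) = transmute(context);
    callback(cpu, pc)
}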

And, since &mut T is guaranteed to have the same layout as Box<T> this is something approximating sound.
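An end-to-end sketch of passing a closure through the context pointer, under the same assumptions (a placeholder CPUState; closure_into_context and drop_context are hypothetical helper names, not panda-rs functions). Instead of transmuting, it casts the context back to &mut Box<dyn FnMut(...)> directly, which sidesteps the layout question, and it shows the step that is easy to forget: reclaiming the allocation once the callback can no longer fire:

```rust
use std::ffi::c_void;

/// Placeholder for PANDA's opaque CPUState.
#[repr(C)]
pub struct CPUState {
    _private: [u8; 0],
}

type BoxedCallback = Box<dyn FnMut(&mut CPUState, u64)>;

/// One trampoline per callback type: recover the closure from the context and call it.
pub unsafe extern "C" fn closure_trampoline(context: *mut c_void, cpu: &mut CPUState, pc: u64) {
    // SAFETY (assumed): `context` came from `closure_into_context` and has not been freed.
    let callback = unsafe { &mut *(context as *mut BoxedCallback) };
    callback(cpu, pc)
}

/// Hypothetical helper: double-box the closure so a thin pointer can carry it.
pub fn closure_into_context<F>(f: F) -> *mut c_void
where
    F: FnMut(&mut CPUState, u64) + 'static,
{
    let boxed: BoxedCallback = Box::new(f);
    Box::into_raw(Box::new(boxed)) as *mut c_void
}

/// Hypothetical helper: reclaim the closure once its callback can no longer fire.
pub unsafe fn drop_context(context: *mut c_void) {
    drop(unsafe { Box::from_raw(context as *mut BoxedCallback) });
}
```

In a real plugin, closure_trampoline and the pointer returned by closure_into_context would be what gets handed to the "with context" registration function.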

Lastly, one of the use cases where closure callbacks excel is allowing a function's execution to 'resume' at a later point. One example is using the proc_start_linux plugin to wait for a process to start, getting the entrypoint from the auxiliary vector, then using the hooks plugin to set a callback on the entrypoint.

However, the process entrypoint callback is a one-off: it will only ever be hit once. Since leaving a callback enabled usually carries a small but noticeable performance penalty even when it will never be hit again, it's ideal for the callback to disable itself. Unlike free functions, though, closures don't have a name with which they can refer to themselves.

The API I ended up going with provides copyable, slot-based weak references to callbacks, which lets the closure capture its own weak reference and use it to disable itself:

let mut count = 0;

let bb_callback = Callback::new();
bb_callback.before_block_exec(move |_, _| {
    count += 1;
    println!("Basic block #{}", count);

    if count > 5 {
        bb_callback.disable();
    }
});

Since bb_callback is a Copy type (internally it's only a callback slot ID), it can be copied into the closure and used to disable the callback when needed.
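A simplified sketch of how a Copy, slot-based handle can let a closure disable itself. The thread-local tables, the single u64 argument, and run_before_block_exec are assumptions made to keep the example self-contained; panda-rs's real Callback type is wired into PANDA's callback system rather than tables like these. Keeping the enabled flags in a separate RefCell from the closures is what makes calling disable from inside a running closure safe here:

```rust
use std::cell::RefCell;

// Hypothetical stand-in for panda-rs's internal bookkeeping: one slot per
// `Callback`, holding its closure plus an "enabled" flag.
thread_local! {
    static CLOSURES: RefCell<Vec<Option<Box<dyn FnMut(u64)>>>> = RefCell::new(Vec::new());
    static ENABLED: RefCell<Vec<bool>> = RefCell::new(Vec::new());
}

/// A copyable weak reference to a callback: internally nothing but a slot ID.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Callback(usize);

impl Callback {
    /// Reserve a new, empty, disabled slot.
    pub fn new() -> Self {
        let id = CLOSURES.with(|c| {
            let mut c = c.borrow_mut();
            c.push(None);
            c.len() - 1
        });
        ENABLED.with(|e| e.borrow_mut().push(false));
        Callback(id)
    }

    /// Install a closure in this slot and enable it.
    pub fn before_block_exec(self, f: impl FnMut(u64) + 'static) {
        let boxed: Box<dyn FnMut(u64)> = Box::new(f);
        CLOSURES.with(|c| c.borrow_mut()[self.0] = Some(boxed));
        ENABLED.with(|e| e.borrow_mut()[self.0] = true);
    }

    /// Flip the slot's flag. Callable from inside the closure itself because
    /// the flags live in a different RefCell than the closures.
    pub fn disable(self) {
        ENABLED.with(|e| e.borrow_mut()[self.0] = false);
    }
}

/// Simulate the emulator reaching a basic block: run every enabled closure.
/// (Creating new callbacks from inside a callback would panic in this sketch.)
pub fn run_before_block_exec(pc: u64) {
    CLOSURES.with(|c| {
        for (id, slot) in c.borrow_mut().iter_mut().enumerate() {
            let enabled = ENABLED.with(|e| e.borrow()[id]);
            if let (true, Some(f)) = (enabled, slot.as_mut()) {
                f(pc);
            }
        }
    });
}
```

With the closure from the example above, the callback runs six times (count reaches 6, which is greater than 5) and is skipped on every block after that.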

And if the callback doesn't need to be disabled, the Callback doesn't even need to be stored and can just be used as a temporary:

Callback::new().before_block_exec(|_, _| {
    println!("A basic block is about to be executed");
});

For plugin-to-plugin (PPP) callbacks we extend this further by making the callback registration functions part of a trait implemented for the PppCallback type, allowing other plugins to take advantage of this regardless of whether panda-rs itself has added support for them.
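Rust's orphan rules allow a crate to implement its own trait for a type defined elsewhere, which is what makes this extension approach possible. A sketch with hypothetical names: the PPP_TABLE storage, fire, MyPluginCallbacks, and on_my_event are all illustrative, and the PppCallback here is a stand-in rather than the panda-rs type:

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for panda-rs's PppCallback: a Copy handle that is just a slot ID.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PppCallback(u64);

static NEXT_ID: AtomicU64 = AtomicU64::new(0);

// Hypothetical storage: (handle, event name, closure). In PANDA, the plugin that
// defines an event owns its registration functions; this table just simulates that.
thread_local! {
    static PPP_TABLE: RefCell<Vec<(PppCallback, &'static str, Box<dyn FnMut(u64)>)>> =
        RefCell::new(Vec::new());
}

impl PppCallback {
    pub fn new() -> Self {
        PppCallback(NEXT_ID.fetch_add(1, Ordering::Relaxed))
    }

    /// Generic, untyped registration that extension traits can build on.
    pub fn register(self, event: &'static str, f: impl FnMut(u64) + 'static) {
        let boxed: Box<dyn FnMut(u64)> = Box::new(f);
        PPP_TABLE.with(|t| t.borrow_mut().push((self, event, boxed)));
    }
}

/// Simulate the owning plugin firing `event`.
pub fn fire(event: &'static str, arg: u64) {
    PPP_TABLE.with(|t| {
        for (_, name, f) in t.borrow_mut().iter_mut() {
            if *name == event {
                f(arg);
            }
        }
    });
}

// In a separate plugin crate: a local trait implemented for the foreign
// PppCallback type adds a typed method without any change to panda-rs.
pub trait MyPluginCallbacks {
    fn on_my_event(self, f: impl FnMut(u64) + 'static);
}

impl MyPluginCallbacks for PppCallback {
    fn on_my_event(self, f: impl FnMut(u64) + 'static) {
        self.register("on_my_event", f);
    }
}
```

Users of that plugin then only need to bring MyPluginCallbacks into scope to write PppCallback::new().on_my_event(...).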

Where panda-rs Is Being Used

One of the biggest changes panda-rs has seen in the past 10 months is not technical, but rather social. Both the PANDA team and some of our external collaborators have begun to take a liking to using Rust with PANDA, and as a result more of our projects have been built with panda-rs at their core.

snake_hook

Ironically, one of the most popular Rust plugins is a shim layer that allows using Python scripts from PANDA when run from the command line. This plugin, snake_hook, loads a Python class which interacts with PANDA via its Python interface. However, unlike traditional PyPANDA scripts, these are launched from the command line by a PANDA plugin, instead of being Python scripts that control PANDA.

It works by turning each plugin into its own Python module, which provides a subclass of our PandaPlugin type. The plugin is then given a pre-constructed PyPANDA Panda object, allowing it to do anything a normal PyPANDA script could do, with the added ability to use self for persistent storage. This design allows easy migration of existing PyPANDA code while providing all the benefits of the plugin architecture.

This class-based plugin design, created for snake_hook, was powerful enough that it was integrated into PyPANDA itself under the name "PyPlugins". This means the same Python class can now be used either from PANDA's command line interface with snake_hook or from a PyPANDA script. Since this change was merged, a number of PyPlugins which were previously only usable from PyPANDA scripts can now be run from the command line thanks to snake_hook. One such example is Luke Craig's live process graph script, which we ported to this new system, allowing it to be combined with any PANDA execution rather than being incompatible with existing PyPANDA scripts (short of grafting the process list script into your own code).

Rust has been invaluable for snake_hook due to the amazing quality of pyo3 and inline-python: the ease of interacting with the Python interpreter is a breath of fresh air coming from the official CPython API.

If you're interested in learning more about PyPlugins, we plan on writing a follow-up blog post going into more detail about them; follow our Twitter to find out when it's posted.

panda-gdb

Another plugin of ours, panda-gdb, takes advantage of Rust's strong ecosystem and concurrency guarantees to create a hypervisor-level debugger with process-level semantics. It uses the gdbstub crate to provide a standard remote debugging interface, allowing it to support various CLIs, IDA, and GHIDRA without the pain of writing our own GDB server implementation. Compared to existing options for providing a GDB stub, this has been an extremely pleasant experience, and I highly recommend that anyone working on embedded or emulation tooling check it out.

One feature panda-gdb is able to provide over both the existing QEMU GDB stub (which has limited ability to take advantage of OS semantics) and in-guest GDB (which often requires too much per-target effort and suffers from cross-compilation troubles) is the ability to expose PANDA-specific features in a debugging context. This is done via GDB 'monitor commands', which allow extending GDB with custom commands.

For this, panda-gdb uses the peg crate, a very inspired approach to parser grammars. What really sold me on it is how well it integrates Rust into the grammar itself. Unlike parser generators, which require a hard separation between parsing and transformation, peg allows placing a Rust code block right beside the parsing logic to transform tokens into Rust types immediately, a property more typically associated with the composability of parser combinators.

// Either a hex number prefixed by 0x or a non-prefixed decimal number
rule number() -> u64
    = quiet!{
        "0x" hex:$(['0'..='9' | 'a'..='f' | 'A'..='F']+) {?
            u64::from_str_radix(hex, 16)
                .map_err(|_| "invalid hex number")
        }
        / decimal:$(['0'..='9']+) {?
            decimal.parse()
                .map_err(|_| "invalid decimal number")
        }
    }
    / expected!("a number")

Similarly, rather than using a grammar-specific syntax for describing patterns, it reuses Rust's pattern syntax (such as '0'..='9' | 'a'..='f' above) to reduce the friction introduced by its domain-specific language. The resulting design is more enjoyable to write than any other parser grammar I've worked with. While peg isn't perfect and is currently in a development lull, I feel it has some really great ideas for the larger parsing ecosystem to learn from.

If you'd like to see the monitor command parsing as an example, the parser can be found here. It's only about 60 lines, and I think it demonstrates how nice the integration of patterns and inline transformations actually is. While I usually prefer splitting things up into passes, I feel the coupling of parsing and transformation here, as well as the verbosity, is worth it. And as a bonus, it only took a few minutes to produce basic human-readable errors:

Error:
    taint &0x55555555 3 
          ^------ Invalid syntax, expected an address (example: *0x5555555) or a
                  register name
Error:
    gte_taint *0x55555555
    ^------ Invalid syntax, expected one of the following: "get_taint", "taint",
            "check_taint", "help"

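Producing errors in that format mostly comes down to knowing where parsing failed. A minimal renderer, assuming the parser reports a 0-based offset into an ASCII command string (render_parse_error is a hypothetical helper, and wrapping of long messages is left out):

```rust
/// Render a parse error in the style shown above: echo the command, then point a
/// caret at the 0-based `offset` where parsing failed. Long messages are not wrapped.
pub fn render_parse_error(input: &str, offset: usize, message: &str) -> String {
    let indent = "    ";
    let pad = " ".repeat(offset);
    format!("Error:\n{indent}{input}\n{indent}{pad}^------ {message}")
}
```

For the first example above, an offset of 6 places the caret under the '&' in "taint &0x55555555 3".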
It really has been a joy being able to take full advantage of the high-quality Rust ecosystem, and we've done our best to upstream improvements where possible (adding architectures to gdbstub, contributing to pyo3/inline-python, etc.) to ensure mutual benefit to the community.

We have some really exciting work in the pipeline that uses panda-rs at the core and we can't wait to talk more about it.