Whole-System Emulation and Analysis with Rust and PANDA

Introduction to PANDA

When reverse engineering a system, there are two methodologies for learning about it: Static Analysis, in which you never run the code and only examine the executable as it's stored on disk, and Dynamic Analysis, in which you run the code and observe it. Dynamic Analysis can be done on the system itself, but in many cases that's inaccessible or otherwise impractical. A single mistake can permanently damage a system, there may be no debugging facilities exposed on the device, or a device might not be easy to physically work with for one reason or another, not to mention that introspecting on physical devices simply doesn't scale.

Regardless of reasoning, emulation is a wonderful technique to have in your toolkit. It pairs well with static analysis and can enable superpower-like abilities. Run at the speed of light. Turn back time. See all the possible futures! Raise the dead!

PANDA, or Platform for Architecture-Neutral Dynamic Analysis, is an open-source QEMU-based whole-system emulator built in a collaboration between MIT Lincoln Laboratory, NYU, and Northeastern University. Some of its core features include: record/replay, lifting code to LLVM IR on-the-fly, a callback system, and a plugin-based architecture (including a lot of great plugins for common analyses like syscall callbacks and retrieval of OS state information).

Why Rust?

Over the years we've found PANDA to be rather lacking in accessibility. Writing analyses in C/C++ doesn't offer the quick iteration time we'd like, and interacting with QEMU's C API has a rather steep learning curve. To combat these issues, we at MIT Lincoln Laboratory introduced pypanda (now also on pip!), an easy-to-use Python interface for interacting with PANDA. Unlike the C/C++ interface, which allows the creation of plugins to be loaded by PANDA, the pypanda interface loads PANDA as a library called libpanda. So instead of PANDA driving user code, the user code drives PANDA.

Since pypanda drives PANDA, it replaces command line instantiation of PANDA (which is the QEMU command line interface with some additional arguments) with an idiomatic Python API. This makes PANDA a lot easier to use thanks to additions such as the generic images, a set of pre-configured disk images pypanda can automatically download as needed. pypanda also enables use cases such as hosting a web server that visualizes processes, which doesn't really make sense to run within a plugin, not to mention that C++ isn't a great choice for a web application.

The pypanda interface is great for quickly iterating on research ideas (and still easily the best in that regard), however it isn't a silver bullet. Since it uses the libpanda interface, it has limited reusability. It isn't possible, nor really desirable, to ship plugins (which take the form of shared libraries) written in Python, which reduces the composability of Python analyses. This means any analysis that others might want to depend on needs to be written in C or C++, and thus misses out on a lot of the good pypanda has to offer.

The goal of panda-rs is to bring some of the ease of pypanda to the reusability and speed of the C/C++ plugin interface. In addition, it allows for more robust plugins thanks to Rust's focus on robust API and language design. I have personally spent a lot of my plugin-writing time hunting down memory safety bugs; Rust's memory safety guarantees aren't just a bonus security feature, they're a much-needed productivity boost. Rust's expressive type system, lack of null pointers, and iterators, alongside other design choices, have noticeably prevented a lot of bugs. One instance that comes to mind: I found an efficient guest-to-host string copy difficult to get right in C++, but with Rust's iterator interface it was significantly easier to write in a manner that didn't cause issues down the road.
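
To make that concrete, here's a rough sketch of the kind of guest string read I mean, written against the mem_read API used in the examples later in this post (the chunk size and error handling are simplified, and this isn't the exact code from my plugin):

use panda::prelude::*;

// Sketch only: read a NUL-terminated string out of guest memory in fixed-size
// chunks, stopping at the terminator. Error handling is simplified.
fn read_guest_c_string(cpu: &mut CPUState, mut addr: target_ptr_t) -> String {
    const CHUNK: usize = 0x80;
    let mut bytes = Vec::new();

    loop {
        let chunk = cpu.mem_read(addr, CHUNK);

        // Copy everything up to (but not including) the first NUL byte
        bytes.extend(chunk.iter().copied().take_while(|&byte| byte != 0));

        // Stop once we've hit the terminator (or the read came up short)
        if chunk.contains(&0) || chunk.len() < CHUNK {
            break;
        }

        addr += CHUNK as target_ptr_t;
    }

    String::from_utf8_lossy(&bytes).into_owned()
}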

Another great benefit of Rust has been how much easier it makes multithreading: Rust's "fearless concurrency" enables a lot of plugin possibilities that I wouldn't personally be comfortable writing without it.

How it Works

Here's what a basic PANDA plugin looks like:

use panda::prelude::*; // include common PANDA types

#[panda::init] // run this function on boot
fn init(_: &mut PluginHandle) -> bool {
    println!("My first plugin startup");

    true
}

#[panda::on_sys_write_enter] // run this function every time the `write` syscall runs
fn on_sys_write(cpu: &mut CPUState, _: target_ulong, fd: target_ulong, buf: target_ptr_t, len: target_ulong) {
    let data = cpu.mem_read(buf, len as usize);
    let data = String::from_utf8_lossy(&data);
    println!("fd: {}, data: {}", fd, data);
}

It includes a (required) function marked with #[panda::init] as well as an additional callback, which is set to print out the file descriptor and data passed to the write syscall every time it's run by any process in the guest.

We can then build and install our plugin:

cargo build
cp target/debug/libmy_plugin.so $PANDA_PATH/x86_64-softmmu/panda/plugins/panda_my_plugin.so
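
As an aside, for the copy above to produce a shared object, the plugin crate needs to be built as a cdylib. A minimal Cargo.toml for this might look something like the following (the bindings crate name and versions here are assumptions; check the panda-rs repository for the current ones):

[package]
name = "my_plugin"
version = "0.1.0"
edition = "2018"

[lib]
# Build a shared object that PANDA can load as a plugin
crate-type = ["cdylib"]

[dependencies]
# Assumption: the panda-rs bindings crate; check crates.io / the repo for the exact name
panda-re = "*"
# Used later in this post for plugin arguments
lazy_static = "1"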

And to run PANDA with the plugin loaded we use the following command:

panda-system-x86_64 -panda my_plugin [...]

First up, we need to generate a recording to make this easier to test. PANDA allows recording everything going on in a system in order to produce easily reproducible results for your analysis. The best tool for taking recordings is pypanda. Here's a simple pypanda script for generating our recording:

from panda import blocking, Panda

panda = Panda(generic="x86_64")

@blocking
def run_cmd():
    # First revert to root snapshot, then type a command via serial
    panda.revert_sync("root")
    panda.record_cmd("echo test", recording_name="test")

    panda.end_analysis()

panda.queue_async(run_cmd)
panda.run()

This loads up a generic image, reverts to a snapshot taken immediately after login, then records the command "echo test", saving the recording under the name "test".

When we run our plugin on this replay we get the following:

My first plugin startup
...
fd: 2, data: echo test
...
fd: 2, data:
...
fd: 1, data: test
...
fd: 2, data: root@ubuntu:~#

Now, let's say we want to narrow these down to only look at a certain process, in this case the echo process that's running. We can use "OSI" (Operating System Introspection), a plugin that helps provide a higher-level understanding of the guest OS context you're running under.

use panda::plugins::osi::OSI;

#[panda::on_sys_write_enter] // run this function every time the `write` syscall runs
fn on_sys_write(cpu: &mut CPUState, _: target_ulong, fd: target_ulong, buf: target_ptr_t, len: target_ulong) {
    let proc = OSI.get_current_process(cpu);
    let data = cpu.mem_read(buf, len as usize);
    let data = String::from_utf8_lossy(&data);
    println!("name: {}, fd: {}, data: {}", proc.get_name(), fd, data);
}

And, since in panda-rs plugins are lazy-loaded via a combination of lazy_static and libloading, OSI (as well as any other plugin we use via the plugin-to-plugin interface) will only be loaded if we actually use it.
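
To give a feel for what that means (this is just an illustration of the general pattern, not the actual panda-rs internals), lazy loading with those two crates looks roughly like this; the plugin path and symbol name below are placeholders:

use lazy_static::lazy_static;
use libloading::{Library, Symbol};

lazy_static! {
    // The shared object isn't opened until the first time OSI_LIB is touched,
    // so a plugin we never call into never gets loaded
    static ref OSI_LIB: Library = unsafe {
        Library::new("panda/plugins/panda_osi.so").expect("failed to load OSI")
    };
}

// Placeholder lookup: resolve an exported function only when it's first needed
fn current_pid_illustrative() -> u32 {
    unsafe {
        let get_pid: Symbol<unsafe extern "C" fn() -> u32> =
            OSI_LIB.get(b"example_get_pid").expect("missing symbol");
        get_pid()
    }
}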

If we test out our process name printing we get the following results:

name: bash, fd: 2, data: echo test
...
name: bash, fd: 2, data:
...
name: bash, fd: 1, data: test
...
name: bash, fd: 2, data: root@ubuntu:~#

Now let's add an argument to our plugin that allows us to configure which process our plugin should output for. Plugin arguments in PANDA take the following form:

-panda plugin_name:arg1=val1,arg2=val2

The way we handle this in panda-rs is the PandaArgs derive macro. It marks a struct that should be parsed from the plugin's arguments and generates the code needed to handle that parsing. It is largely inspired by structopt/argh and significantly reduces the amount of argument-handling code compared to a C/C++ PANDA plugin:

use panda::prelude::*;

#[derive(PandaArgs)]
#[name = "my_plugin"] // plugin name
struct Args {
    #[arg(required)]
    proc: String
}

lazy_static::lazy_static! {
    static ref ARGS: Args = Args::from_panda_args();
}

#[panda::init]
fn init(_: &mut PluginHandle) -> bool {
    // read args on startup to ensure required args are provided
    lazy_static::initialize(&ARGS);

    true
}

And then we can use the provided args to filter the output:

#[panda::on_sys_write_enter]
fn on_sys_write(cpu: &mut CPUState, _: target_ulong, fd: target_ulong, buf: target_ptr_t, len: target_ulong) {
    let proc = OSI.get_current_process(cpu);
    if proc.get_name() == ARGS.proc {
        let data = cpu.mem_read(buf, len as usize);
        let data = String::from_utf8_lossy(&data);
        println!("fd: {}, data: {}", fd, data);
    }
}

So if we pass -panda my_plugin:proc=bash we'll get the same output as before, while with -panda my_plugin:proc=cat we won't see anything, since this replay never runs cat.
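
Putting it all together, a full invocation against the replay we recorded earlier might look something like this (the machine options, such as memory size, need to match whatever the recording was taken with, so treat them as placeholders):

panda-system-x86_64 -m 1G -replay test -panda my_plugin:proc=bash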

One other feature of panda-rs is that, like pypanda, it provides a libpanda mode, allowing you to write the same code and compile it to either a plugin or a standalone executable. To do this you enable the libpanda feature on panda-rs and use the Panda builder to create an instance of PANDA. A builder, for those who aren't familiar with this particular idiom, allows you to chain methods to define optional properties of the struct being built. One example is Rust's Command builder, which lets you configure a process in a number of ways by chaining methods before spawning it. In this case, the Panda builder is a specialized form of safely building arguments to pass to libpanda.
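
In code, that Command example looks like this (nothing PANDA-specific, just the chaining pattern):

use std::process::Command;

fn main() {
    // Chain optional settings on the builder, then consume it by spawning
    Command::new("echo")
        .arg("test")
        .current_dir("/tmp")
        .spawn()
        .expect("failed to spawn process")
        .wait()
        .expect("process failed to run");
}

The Panda builder below works the same way: chain the options you want, then call run() to hand control over to libpanda.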

use panda::prelude::*;
use panda::plugins::osi::OSI;

fn main() {
    Panda::new()
        .generic("x86_64")
        .replay("test")
        .run()
}

#[panda::on_sys_write_enter]
fn on_sys_write(cpu: &mut CPUState, _: target_ulong, fd: target_ulong, buf: target_ptr_t, len: target_ulong) {
    let proc = OSI.get_current_process(cpu);
    if cfg!(feature = "bin") || proc.get_name() == ARGS.proc {
    /* ^^^^^^^^^^^^^^^^^^^^^ If building in libpanda mode, we don't want to check the args */
        let data = cpu.mem_read(buf, len as usize);
        let data = String::from_utf8_lossy(&data);
        println!("fd: {}, data: {}", fd, data);
    }
}

You can even use the PandaArgs trait in order to load a different plugin, for example:

#[derive(PandaArgs)]
#[name = "stringsearch"]
struct StringSearch {
    str: String
}

fn main() {
    Panda::new()
        .generic("x86_64")
        .replay("test")
        .plugin_args(&StringSearch {
            str: "apple".into()
        })
        .run();
}

This loads the stringsearch plugin (built into PANDA) and uses it to search for the string "apple".

If you're interested in learning more about programmatically spinning up virtual machines for introspection, you can read more about PANDA on GitHub or take a look at the PANDA manual.