Dynamic Instrumentation API (proposed)
Revision 0.1

Jeffrey K. Hollingsworth
Computer Science Department
University of Maryland
College Park, MD 20742
hollings@cs.umd.edu

Barton P. Miller
Computer Sciences Department
University of Wisconsin
Madison, WI 53706-1685
bart@cs.wisc.edu

Note that this is a draft document. There are comments about issues that have not yet been settled and, in a few places, interfaces that are not completely defined. We are releasing this draft version to encourage comments and suggestions.

1. Introduction

The normal cycle of developing a program is to edit source code, compile it, and then execute the resulting binary. However, sometimes this cycle can be too restrictive. We may wish to change the program while it is executing, and not have to re-compile, re-link, or even re-execute the program to change the binary. At first this may seem like a bizarre goal, but there are several practical reasons to want such a system. For example, if we are measuring the performance of a program and discover a performance problem, it might be necessary to insert additional instrumentation into the program to understand the problem. Another application is performance steering; for large simulations, computational scientists often find it advantageous to be able to make modifications to the code and data while the simulation is executing.

This document describes an Application Program Interface (API) to permit the insertion of code into a running program. Runtime code changes are useful for a variety of applications, including debugging, performance monitoring, and composing applications out of existing packages. The goal of this API is to provide a machine-independent interface to permit the creation of tools and applications that use runtime code patching. This API is based on the idea of Dynamic Instrumentation described in [2].

The unique feature of this interface is that it makes it possible to insert and change instrumentation in a running program. This differs from other post-linker instrumentation tools [3] that permit code to be inserted into a binary before it starts to execute.

The goal of this API is to keep the interface small and easy to understand. At the same time it needs to be sufficiently expressive to be useful for a variety of applications. We have done this by providing a simple set of abstractions and a simple way to specify the code to insert into the application. To generate more complex code, extra (initially un-called) subroutines can be linked into the application program, and calls to these subroutines can be inserted at runtime.

We are in the process of producing a code-release that conforms to this API. The Dynamic Instrumentation in the Paradyn system will be the basis for this, so the API will support AIX, SunOS, Solaris (SPARC and x86), and HP-UX. As Paradyn is ported to other platforms, the Dyninst API will also support those platforms.

2. Abstractions

The API is based on abstractions of a program and its state while in execution. The two primary abstractions are points and snippets. A point is a location in a program where instrumentation can be inserted. A snippet is a representation of a bit of executable code to be inserted into a program at a point. For example, if we wished to record the number of times a procedure was invoked, the point would be the first instruction in the procedure, and the snippet would be a statement to increment a counter. Snippets can include conditionals, function calls, and loops.
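
The relationship between points and snippets can be sketched with a toy model that runs entirely inside one process. Nothing in it is part of the proposed API: ToySnippet, ToyPoint, ToyProcedure, and countInvocations are hypothetical names, and a snippet is modeled as an ordinary callable rather than as generated machine code.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a snippet: any callable with no arguments.
using ToySnippet = std::function<void()>;

// Hypothetical stand-in for a point: a location that holds the snippets
// attached to it, in insertion order.
struct ToyPoint {
    std::vector<ToySnippet> snippets;

    void attach(const ToySnippet &s) { snippets.push_back(s); }

    // Run every attached snippet, as if control had reached this location.
    void reach() const {
        for (const ToySnippet &s : snippets) s();
    }
};

// A "procedure" with a single instrumentation point at its entry.
struct ToyProcedure {
    ToyPoint entry;

    int body(int x) {
        entry.reach();   // entry point: snippets run before the body
        return x * 2;    // the procedure's original work
    }
};

// Attach a counting snippet to the entry point of a fresh procedure, call
// the procedure n times, and return the value of the counter.
inline int countInvocations(int n) {
    ToyProcedure proc;
    int counter = 0;
    proc.entry.attach([&counter]() { counter = counter + 1; });
    for (int i = 0; i < n; ++i) proc.body(i);
    return counter;
}
```

In this model countInvocations(5) attaches one snippet to one point and the snippet runs once per call; the procedure body itself is never edited.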

The API is designed so that a single instrumentation process can insert snippets into multiple processes executing on a single machine. To support multiple processes, two additional abstractions, threads and images, are included in the API. A thread refers to a thread of execution. Depending on the programming model, a thread can correspond to either a normal process or a lightweight thread. An image refers to the static representation of a program on disk.

The relationship between these four abstractions is shown in Figure 1. Images contain points where their code can be modified. Each thread is associated with exactly one image.

3. Simple Example

To illustrate the ideas of the API, we present several short examples that demonstrate how the API can be used. The full details of the interface are presented in the next section. To prevent confusion, we refer to the process we are modifying as the application, and the program that uses the API to modify the application as the mutator. A mutator is a separate process that modifies an application process.

The first thing a mutator needs to do is identify the application process to be modified. If the process is already in execution, this can be done by specifying the process id of the application as an argument to create an instance of a thread object:

    appThread = new BPatch_thread(processId);

This creates a new instance of the BPatch_thread class that refers to the existing process. It has no effect on the state of the process (i.e., running or stopped). If the process has not been started, the mutator specifies the command line used to execute the process:

    appThread = new BPatch_thread(argc, argv);

Once the application thread has been created, the mutator defines the snippet of code to be inserted and the points where it should be inserted. For example, if we wanted to count the number of times a procedure called InterestingProcedure executes, the mutator might look like this:

    BPatch_image appImage;
    BPatch_Vector points;

    // Open the program image associated with the thread and return a
    // handle to it.
    appImage = appThread->getImage();

    // find and return the entry point to the "InterestingProcedure".
    points = appImage.findProcedurePoint("InterestingProcedure",
                                         BPatch_entry);

    // Create a counter variable (but first get a handle to the correct
    // type).
    BPatch_variableExpr &intCounter =
                        appThread->malloc(appImage.findType("int"));

    // Create a code block to increment the integer by one.
    //      intCounter = intCounter + 1
    //
    BPatch_arithExpr addOne(BPatch_assign,
                            intCounter,
                            BPatch_arithExpr(BPatch_plus,
                                             intCounter,
                                             BPatch_constExpr(1)));

    // insert the snippet of code into the application.
    appThread->insertSnippet(addOne, points);

4. Interface

This section describes functions in the API. The API is organized as a collection of C++ classes. The primary classes are BPatch_thread, BPatch_image, BPatch_point, and BPatch_snippet. The API also uses a template class called BPatch_Vector. This class is modelled after the Standard Template Library (STL) vector class.

4.1 class BPatch_thread

BPatch_thread is the primary class to operate on (and to create) code in execution.

BPatch_thread(int pid) BPatch_thread(int pid, int tid) BPatch_thread(int argc, char *argv[]) BPatch_thread(BPatch_Vector threads): Each of these constructors creates a new instance of the BPatch_thread class. The first constructor associates a BPatch_thread with an existing process. The second constructor associates a new BPatch_thread with an existing thread within a process. The meaning of thread and process is implementation specific. The ability to use the first two constructors to create a BPatch_thread object for an existing process depends on support from the underlying operating system and may not be implemented on all platforms. The running state of the process is not affected by these two constructors. The third constructor creates a new process and a new BPatch_thread for it. The process is created, but it is put into a stopped state before executing any code. The fourth constructor creates a new "virtual" thread from a list of threads. This permits operations to be performed on several threads as a group, which can (potentially) increase the efficiency of the requests because they can be processed in parallel.
const BPatch_image &getImage(): Open the executable file associated with this BPatch_thread object and return a handle to it. Depending on the implementation this might also parse the application's symbol table.
void stopExecution() void continueExecution(): These two functions change the running state of the thread. stopExecution puts the thread into a stopped state. Depending on the operating system, stopping one thread may stop all threads associated with a process. continueExecution continues execution of the thread (or group of threads if they have to be stopped atomically).
bool isStopped() int stopSignal() bool isTerminated(): These three functions query the status of a thread. isStopped returns true if the thread is currently stopped. If the process is stopped (as indicated by isStopped), then stopSignal can be called to find out what signal caused the process to stop. isTerminated returns true if the thread has exited. Any of these functions may be called multiple times and calling them will not affect the state of the thread.
void catchSignal(int signum); void ignoreSignal(int signum);: These two functions control whether the process is stopped when it receives the named signal. catchSignal requests that the process be stopped; ignoreSignal requests that it not be stopped.
int dumpCore(const String &file, const bool terminate): This function causes the thread to dump its state to the passed file argument. If the terminate flag is true, the thread is also terminated. The ability to use this function depends on support from the underlying operating system and may not be implemented on all platforms.
BPatch_variableExpr malloc(int n) BPatch_variableExpr malloc(const BPatch_type&): These two functions allocate memory. Memory allocation is from a heap. The heap is not (necessarily) the same heap used by the application. The available space in the heap may be limited depending on the implementation. The first function, malloc(int n), allocates n bytes of memory from the heap. The second function, malloc(const BPatch_type&t), allocates enough memory to hold an object of the specified type. Using the second version is strongly encouraged because it provides additional information to permit better type checking of the passed code. The returned memory is from a global heap, and may be used in different snippets.
void free(const BPatch_variableExpr &ptr): Free the memory in the passed ptr. The programmer is responsible for verifying that all code that could reference this memory will not execute again (either by removing all snippets that refer to it, or by analysis of the program).
InferiorPC(const BPatch_snippet &expr): Cause the snippet to be executed once. This interface has several applications, including causing initialization functions to be called in the application. The application process must be stopped when this is called. This call will use the application stack for saving local state.
insertSnippet(const BPatch_snippet &expr, const BPatch_point&) insertSnippet(const BPatch_snippet &expr, const BPatch_Vector&): Insert a snippet of code at the specified point. If a list of points is supplied, insert the code snippet at each point in the list. What about wild cards for all threads in a process?
setTypeChecking(bool state): Turn type-checking of snippets on or off. By default type-checking is turned on, and an attempt to create a snippet that contains type conflicts will fail. Any snippet expressions created with type-checking off have the type of their left operand. Turning type-checking off, creating a snippet, and then turning type-checking back on is similar to a type cast operation in the C programming language.
setMutationsActive(bool): Enable or disable the execution of snippets for the thread. This provides a way to temporarily disable all of the dynamic code patches that have been inserted without having to delete them one by one. All allocated memory will remain unchanged while the patches are disabled. When the mutations are not active, the process control functions (i.e., stopExecution and continueExecution) can still be used. Requests to insert snippets (including oneShots) may not be made while mutations are disabled.
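
The process-control members described above behave like a small state machine, which can be sketched as follows. ToyThread is a hypothetical model that reuses the member names from this section; it controls no real process, and deliver and exitNow are helpers invented here to simulate a signal arriving and the application exiting.

```cpp
#include <cassert>
#include <set>

// Toy model only: it mirrors the member names described above but does not
// control a real process.
class ToyThread {
private:
    enum State { Running, Stopped, Terminated };
    State state_;
    int lastSignal_;
    std::set<int> caught_;

public:
    // A newly created process starts out stopped, as with the third
    // constructor described above.
    ToyThread() : state_(Stopped), lastSignal_(0) {}

    void stopExecution() {
        if (state_ == Running) { state_ = Stopped; lastSignal_ = 0; }
    }
    void continueExecution() {
        if (state_ == Stopped) state_ = Running;
    }

    bool isStopped() const    { return state_ == Stopped; }
    bool isTerminated() const { return state_ == Terminated; }
    // Only meaningful while isStopped() is true.
    int stopSignal() const    { return lastSignal_; }

    void catchSignal(int signum)  { caught_.insert(signum); }
    void ignoreSignal(int signum) { caught_.erase(signum); }

    // Hypothetical helper: simulate the application receiving a signal.
    void deliver(int signum) {
        if (state_ == Running && caught_.count(signum) != 0) {
            state_ = Stopped;
            lastSignal_ = signum;
        }
    }

    // Hypothetical helper: simulate the application exiting.
    void exitNow() { state_ = Terminated; }
};
```

The query functions are const in this sketch, which matches the statement that calling them does not affect the state of the thread.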

One additional convenience (non-member) function is provided to test if the status of any of the threads managed by the mutator has changed.

bool pollForStatusChange();: This is useful for a mutator that needs to periodically check on the status of its managed threads and does not want to have to check each process individually.
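
One way a mutator could keep the bookkeeping behind such a poll is sketched here. ToyMutator, manage, and noteStatusChange are hypothetical names, and the choice to clear the change flags on each poll is an assumption of this sketch; the draft does not say whether a change is reported once or repeatedly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy bookkeeping only; no real processes are involved.
class ToyMutator {
public:
    // Register a managed thread and return its index.
    int manage() {
        changed_.push_back(false);
        return static_cast<int>(changed_.size()) - 1;
    }

    // Hypothetical hook: called when a managed thread stops or exits.
    void noteStatusChange(int index) {
        changed_[static_cast<std::size_t>(index)] = true;
    }

    // Return true if any managed thread changed status since the last poll.
    // Clearing the flags here is an assumption of this sketch.
    bool pollForStatusChange() {
        bool any = false;
        for (std::size_t i = 0; i < changed_.size(); ++i) {
            if (changed_[i]) {
                any = true;
                changed_[i] = false;
            }
        }
        return any;
    }

private:
    std::vector<bool> changed_;
};
```

A mutator built this way would call pollForStatusChange in its main loop and inspect individual threads only when the poll returns true.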

4.2 class BPatch_image

This class defines a program image (the executable associated with a thread).

const BPatch_Vector<&BPatch_point> &getProcedures(): Return a table of the procedures in the image.
const BPatch_Vector<&BPatch_point> &findProcedurePoint(const String &name, const BPatch_procedureLocation&): Return the BPatch_points associated with the requested procedure. The BPatch_procedureLocation argument is one of BPatch_entry, BPatch_exit, BPatch_subroutine, BPatch_longJump, or BPatch_allLocations. It is used to select which type of points associated with the procedure will be returned. BPatch_entry and BPatch_exit request the entry and exit points of a subroutine, respectively. BPatch_subroutine returns the list of points where the procedure calls other procedures. BPatch_longJump returns any long jump statements made by the procedure. If the lookup fails to locate any points of the requested type, a list with zero elements is returned. The lookup can fail either because the procedure does not exist or because it has no such points.
const BPatch_point &findLinePoint(const String &fileName, int line): Return the handle to the instrumentation point nearest to the requested fileName and line number. The nearest point to a requested line is the last executable instruction before the line. Note that this function can interact strangely with optimized code.
const BPatch_variableExpr &findVariable(const String &name): Look up the passed variable name as a global variable. The lookup is done in the scope of global variables defined in the original (un-instrumented) application program. The returned BPatch_variableExpr can be used to create references to the variable in subsequent snippets. If the image was not compiled with debugging symbols, this function will fail even if the global variable is defined.
const BPatch_variableExpr &findVariable(const BPatch_point &scope, const String &name): Look up and return a handle to the named variable using the passed BPatch_point as the scope of the variable. The returned BPatch_variableExpr can be used to create references to (uses of) the variable in subsequent snippets. The scoping rules used will be those of the source language. If the image was not compiled with debugging symbols, this function will fail even if the variable is defined in the passed scope.
const BPatch_type &findType(const String &name): Look up and return a handle to the named type. The handle can be used as an argument to malloc to create new variables of the corresponding type.
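
The lookup rule for findProcedurePoint (filter by location kind, return an empty list on any failure) can be sketched with a toy symbol table. ToyImage, ToyPointRec, ToyLocation, and addPoint are hypothetical names, and the addresses are made-up values; a real implementation would obtain its points by parsing the executable.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical mirror of the location kinds listed above.
enum ToyLocation { Toy_entry, Toy_exit, Toy_subroutine, Toy_longJump, Toy_allLocations };

struct ToyPointRec {
    ToyLocation kind;        // what sort of point this is
    unsigned long address;   // where it lives in the image (made-up values)
};

class ToyImage {
public:
    // Hypothetical helper used to populate the table by hand.
    void addPoint(const std::string &proc, ToyLocation kind, unsigned long addr) {
        ToyPointRec p = { kind, addr };
        procs_[proc].push_back(p);
    }

    // Return the points of the requested kind. The result is empty both when
    // the procedure is unknown and when it has no points of that kind.
    std::vector<ToyPointRec> findProcedurePoint(const std::string &name,
                                                ToyLocation loc) const {
        std::vector<ToyPointRec> result;
        std::map<std::string, std::vector<ToyPointRec> >::const_iterator it =
            procs_.find(name);
        if (it == procs_.end()) return result;
        for (std::size_t i = 0; i < it->second.size(); ++i) {
            if (loc == Toy_allLocations || it->second[i].kind == loc)
                result.push_back(it->second[i]);
        }
        return result;
    }

private:
    std::map<std::string, std::vector<ToyPointRec> > procs_;
};
```

Because both failure cases produce an empty list, a caller that needs to tell them apart would have to make a second query, for example with Toy_allLocations.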

4.3 class BPatch_snippet

A snippet is an abstract representation of code to insert into a program. Snippets are defined by creating a new instance of the correct subclass of a snippet. For example, to create a snippet to call a function, you create a new instance of the class BPatch_funcCallExpr. Creating a snippet does not result in code being inserted into an application. Code is generated when a request is made to insert a snippet at a specific point in a program. Sub-snippets may be shared by different snippets (i.e. a handle to a snippet may be passed as an argument to create two different snippets), but whether the generated code is shared (or replicated) between two snippets is implementation dependent.
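
The separation between building a snippet and generating code for it can be sketched with a small expression tree. All names here (ToySnippet, ToyConst, ToyVar, ToyBinary, insertAt) are hypothetical, and "code generation" is reduced to producing a string; this sketch takes the replicate-the-shared-code option, which the text above leaves to the implementation.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Abstract snippet: constructing one generates nothing.
struct ToySnippet {
    virtual ~ToySnippet() {}
    virtual std::string generate() const = 0;
};
typedef std::shared_ptr<const ToySnippet> ToySnippetPtr;

struct ToyConst : ToySnippet {
    int value;
    explicit ToyConst(int v) : value(v) {}
    std::string generate() const { return std::to_string(value); }
};

struct ToyVar : ToySnippet {
    std::string name;
    explicit ToyVar(const std::string &n) : name(n) {}
    std::string generate() const { return name; }
};

// Binary node holding handles to its operands, so operands can be shared.
struct ToyBinary : ToySnippet {
    char op;
    ToySnippetPtr lhs, rhs;
    ToyBinary(char o, ToySnippetPtr l, ToySnippetPtr r) : op(o), lhs(l), rhs(r) {}
    std::string generate() const {
        return "(" + lhs->generate() + " " + op + " " + rhs->generate() + ")";
    }
};

// Code is produced only here, when a snippet is inserted at a named point.
inline std::string insertAt(const std::string &pointName, const ToySnippetPtr &s) {
    return pointName + ": " + s->generate();
}
```

A variable node created once can be handed to two parent snippets; each parent then emits its own copy of the variable reference when it is inserted.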

const BPatch_type &getType(): Return the type of the snippet.
float getCost(): Return the estimated cost of the snippet in seconds. The problems with accurately estimating the cost of code are numerous and out of the scope of this document [1]. But, it is important to realize that the returned cost value is (at best) an estimate.

The rest of the classes are derived classes of the class BPatch_snippet.

BPatch_sequence(const BPatch_Vector &items)

Define a sequence of snippets. The passed snippets will be executed in the order in which they appear in the list.

BPatch_funcCallExpr(const BPatch_function& func, const BPatch_Vector &args)

Define a call to a function. The passed function must be valid for the current code region. Args is a list of arguments to pass to the function. If type checking is enabled, the types of the passed arguments are checked against the function to be called. (Availability of type checking depends on the source language of the application and on the program being compiled for debugging.)
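
The argument check can be pictured as a comparison of the argument types against the parameter list of the callee. The sketch below is a toy: ToyType, ToyFunction, and callIsAccepted are hypothetical names, types are reduced to a three-value enum, and the typeChecking flag stands in for the setTypeChecking switch of section 4.1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum ToyType { Toy_int, Toy_float, Toy_string };

// Hypothetical description of a callee: its name and parameter types.
struct ToyFunction {
    std::string name;
    std::vector<ToyType> params;
};

// Decide whether a call snippet with the given argument types would be
// accepted. With type checking off, every call is accepted.
inline bool callIsAccepted(const ToyFunction &func,
                           const std::vector<ToyType> &args,
                           bool typeChecking) {
    if (!typeChecking) return true;
    if (args.size() != func.params.size()) return false;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] != func.params[i]) return false;
    }
    return true;
}
```

For a callee shaped like open(string, int, int), passing an int as the first argument is rejected while checking is on and accepted once it is turned off.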

BPatch_boolExpr(BPatch_relOp op, const BPatch_snippet &lOperand, const BPatch_snippet &rOperand)

Define a relational snippet. The available operators are:

Operator	Description
BPatch_lt	Return `lOperand < rOperand`
BPatch_eq	Return `lOperand == rOperand`
BPatch_gt	Return `lOperand > rOperand`
BPatch_le	Return `lOperand <= rOperand`
BPatch_ne	Return `lOperand != rOperand`
BPatch_ge	Return `lOperand >= rOperand`
BPatch_and	Return `lOperand && rOperand` (Boolean `and`)
BPatch_or	Return `lOperand || rOperand` (Boolean `or`)

The type of the returned snippet is boolean, and the operands are type checked.
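
The operator table can be read as the following evaluator over integer operands. ToyRelOp and evalRel are hypothetical names used only for illustration; treating any nonzero operand as true for the Boolean operators follows C and is an assumption of this sketch.

```cpp
#include <cassert>

// Hypothetical mirror of the relational operators in the table above.
enum ToyRelOp { Toy_lt, Toy_eq, Toy_gt, Toy_le, Toy_ne, Toy_ge, Toy_and, Toy_or };

// Evaluate one relational snippet over two already-evaluated operands.
inline bool evalRel(ToyRelOp op, int l, int r) {
    switch (op) {
    case Toy_lt:  return l < r;
    case Toy_eq:  return l == r;
    case Toy_gt:  return l > r;
    case Toy_le:  return l <= r;
    case Toy_ne:  return l != r;
    case Toy_ge:  return l >= r;
    case Toy_and: return (l != 0) && (r != 0);
    case Toy_or:  return (l != 0) || (r != 0);
    }
    return false;   // not reached for valid operators
}
```

Range tests compose in the obvious way: evalRel(Toy_and, evalRel(Toy_ge, x, 0), evalRel(Toy_lt, x, 10)) is true exactly when x lies in [0, 10).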

class BPatch_ifExpr(const BPatch_boolExpr &conditional, const BPatch_snippet &tClause, const BPatch_snippet &fClause) class BPatch_ifExpr(const BPatch_boolExpr &conditional, const BPatch_snippet &tClause)

This constructor creates an if statement. The first argument, conditional, should be a Boolean expression that will be evaluated to decide which clause should be executed. The second argument, tClause, is the snippet to execute if the conditional evaluates to true. The third argument, fClause, is the snippet to execute if the conditional evaluates to false. This third argument is optional. Else-if statements can be constructed by making the fClause of an if statement another if statement.
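
The else-if construction can be sketched with a toy statement tree. ToyStmt, ToyEmit, ToyIf, and classify are hypothetical names; conditions are modeled as ordinary callables and "executing" a clause appends text to a log, so the sketch shows only the nesting, not code generation.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

struct ToyStmt {
    virtual ~ToyStmt() {}
    virtual void run(std::string &log) const = 0;
};
typedef std::shared_ptr<const ToyStmt> ToyStmtPtr;

// Leaf statement: record that it executed.
struct ToyEmit : ToyStmt {
    std::string text;
    explicit ToyEmit(const std::string &t) : text(t) {}
    void run(std::string &log) const { log += text; }
};

// If statement with an optional false clause.
struct ToyIf : ToyStmt {
    std::function<bool()> cond;
    ToyStmtPtr tClause;
    ToyStmtPtr fClause;   // null when the else branch is omitted
    ToyIf(std::function<bool()> c, ToyStmtPtr t, ToyStmtPtr f = ToyStmtPtr())
        : cond(c), tClause(t), fClause(f) {}
    void run(std::string &log) const {
        if (cond()) tClause->run(log);
        else if (fClause) fClause->run(log);
    }
};

// Build: if (x < 0) "negative" else if (x == 0) "zero" else "positive".
inline std::string classify(int x) {
    ToyStmtPtr neg = std::make_shared<ToyEmit>("negative");
    ToyStmtPtr zero = std::make_shared<ToyEmit>("zero");
    ToyStmtPtr pos = std::make_shared<ToyEmit>("positive");
    // The inner if becomes the false clause of the outer if.
    ToyStmtPtr inner = std::make_shared<ToyIf>([x]() { return x == 0; }, zero, pos);
    ToyIf outer([x]() { return x < 0; }, neg, inner);
    std::string log;
    outer.run(log);
    return log;
}
```

Longer chains follow the same pattern: each additional else-if is one more ToyIf placed in the false clause of the previous one.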

BPatch_constExpr(int value) BPatch_constExpr(float value) BPatch_constExpr(const String &value)

Define a constant snippet of the appropriate type.

BPatch_arithExpr(BPatch_binOp op, const BPatch_snippet &lOperand, const BPatch_snippet &rOperand)

Perform the required binary operation. The available binary operators are:

Operator	Description
BPatch_assign	Assign the value of `rOperand` to `lOperand`
BPatch_plus	Add `lOperand` and `rOperand`
BPatch_minus	Subtract `rOperand` from `lOperand`
BPatch_divide	Divide `lOperand` by `rOperand`
BPatch_times	Multiply `rOperand` by `lOperand`
BPatch_mod	Compute the remainder of dividing `rOperand` into `lOperand`
BPatch_ref	Array reference of the form `lOperand[rOperand]`
BPatch_seq	Define a sequence of two expressions (similar to comma in C)

(Should we add min, max, and mean? Ruth Aydt suggested this. jkh 8/10/95)
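
The operand order of the binary operators can be made concrete with a toy evaluator. The names ToyEnv, ToyExpr, ToyConst, ToyVar, and ToyArith are hypothetical; variables live in a map instead of application memory, and only a subset of the operators (assign, plus, minus, times, mod, seq) is modeled.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

typedef std::map<std::string, int> ToyEnv;   // variable name -> value

enum ToyBinOp { Toy_assign, Toy_plus, Toy_minus, Toy_times, Toy_mod, Toy_seq };

struct ToyExpr {
    virtual ~ToyExpr() {}
    virtual int eval(ToyEnv &env) const = 0;
};
typedef std::shared_ptr<const ToyExpr> ToyExprPtr;

struct ToyConst : ToyExpr {
    int value;
    explicit ToyConst(int v) : value(v) {}
    int eval(ToyEnv &) const { return value; }
};

struct ToyVar : ToyExpr {
    std::string name;
    explicit ToyVar(const std::string &n) : name(n) {}
    int eval(ToyEnv &env) const { return env[name]; }
};

struct ToyArith : ToyExpr {
    ToyBinOp op;
    ToyExprPtr l, r;
    ToyArith(ToyBinOp o, ToyExprPtr lhs, ToyExprPtr rhs) : op(o), l(lhs), r(rhs) {}
    int eval(ToyEnv &env) const {
        if (op == Toy_assign) {
            // The left operand of an assignment must name a variable.
            const ToyVar *target = dynamic_cast<const ToyVar *>(l.get());
            assert(target != 0);
            int value = r->eval(env);
            env[target->name] = value;
            return value;
        }
        int a = l->eval(env);   // left operand is evaluated first
        int b = r->eval(env);
        switch (op) {
        case Toy_plus:  return a + b;
        case Toy_minus: return a - b;   // subtract rOperand from lOperand
        case Toy_times: return a * b;
        case Toy_mod:   assert(b != 0); return a % b;   // lOperand modulo rOperand
        case Toy_seq:   return b;       // like the C comma operator
        default:        return 0;       // Toy_assign was handled above
        }
    }
};
```

The counter increment from section 3 corresponds to ToyArith(Toy_assign, counter, sum), where sum is a Toy_plus node over the counter variable and the constant 1.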

BPatch_arithExpr(BPatch_unOp, const BPatch_snippet &operand)

Define a snippet consisting of a unary operator. The available unary operators are BPatch_negate and BPatch_address. BPatch_negate takes an integer snippet and returns the negation of the snippet. BPatch_address takes a variable reference snippet and returns a pointer to it. This is equivalent to the C operator (&) and is useful for call-by-reference parameters.

BPatch_gotoExpr(const BPatch_gotoExpr &target)

Branch to the passed snippet.

nullExpr()

Define a null snippet. This snippet contains no executable statements; however it is a useful place holder for the destination of a goto.

4.4 class BPatch_Vector

BPatch_Vector is the primary container class used by the API. It is styled after the Standard Template Library (STL) Vector container class. At the time of the writing of this document, STL has been adopted as part of the ANSI C++ standardization, but implementations were not widely available. As a result, the initial version of the API uses its own compatible subset of the Vector class. Any implementation of the API will have (at least) the following member functions available.

BPatch_Vector();: Create a new empty vector.
int size();: Return the number of elements in the container instance.
void push_back(const T& x);: Add x to the end of the Vector.
const T& operator[](int n) const;: Return the nth element of the Vector.
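
A minimal container offering just these four members might look like the sketch below. ToyVector is a hypothetical name, and building it on std::vector is a shortcut taken here; an implementation written when STL was not widely available would have had to manage its own storage. The sketch also declares size() const, which the list above does not require.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy container exposing only the four members listed above.
template <class T>
class ToyVector {
public:
    // Create a new empty vector.
    ToyVector() {}

    // Return the number of elements in the container.
    int size() const { return static_cast<int>(items_.size()); }

    // Add x to the end of the vector.
    void push_back(const T &x) { items_.push_back(x); }

    // Return the nth element; n must be a valid index.
    const T &operator[](int n) const {
        assert(n >= 0 && n < size());
        return items_[static_cast<std::size_t>(n)];
    }

private:
    std::vector<T> items_;
};
```

Code written against this subset, such as a loop from 0 to size() that reads each element with operator[], would also compile against the STL vector class.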

5. Other Examples

In this section we show a complete program to demonstrate the use of the API. The example is a program called "re-pipe". It takes two arguments, a process id and a file name, and changes the output file descriptor for the specified process to be the named file. The motivation for the example is that you run a program, it starts to print copious lines of output to your screen, and you wish to direct that output to a file without having to re-run the program.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include "BPatch.h"

int main(int argc, char *argv[])
{
    int pid;
    BPatch_thread *tr;
    BPatch_image appImage;
    BPatch_Vector<BPatch_snippet&> args, dupArgs;
    BPatch_Vector<BPatch_point&> openFunc;
    BPatch_Vector<BPatch_point&> dup2Func;
    BPatch_Vector<BPatch_snippet&> codeBlock;

    // Check for the correct arguments: the program name, a pid, and a file name.
    if (argc != 3) {
        printf("usage: repipe <pid> <file name>\n");
        exit(-1);
    }

    // Get the process ID from the command line and create a BPatch_thread
    // instance for that process.
    pid = atoi(argv[1]);
    tr = new BPatch_thread(pid);
    if (!tr) exit(-1);

    // Open the application image (binary) and return a handle to it.
    appImage = tr->getImage();

    // The next part of the program generates the following code snippet:
    //
    //     {
    //         int tempFd;
    //
    //         tempFd = open(argv[2], O_WRONLY | O_CREAT, 0644);
    //         if (tempFd >= 0) {
    //             (void) dup2(tempFd, 1);
    //         }
    //     }
    //
    //     NOTE: argv[2] refers to the second argument to the mutator, not
    //           the application.

    // Create the code to open the new file.
    //         open(argv[2], O_WRONLY | O_CREAT, 0644)
    openFunc = appImage.findProcedurePoint("open", BPatch_entry);
    if (openFunc.size() != 1) {
        fprintf(stderr, "unable to find function open\n");
        exit(-1);
    }
    args.push_back(BPatch_constExpr(argv[2]));
    args.push_back(BPatch_constExpr(O_WRONLY | O_CREAT));
    args.push_back(BPatch_constExpr(0644));   // permissions for a newly created file
    BPatch_funcCallExpr openCall(openFunc[0], args);

    // Generate the assignment to a temporary variable:
    //         tempFd = open(...)
    BPatch_variableExpr &tempFd = tr->malloc(appImage.findType("int"));
    BPatch_arithExpr assgn(BPatch_assign, tempFd, openCall);

    // Make file descriptor 1 (standard output) refer to the new file:
    //         dup2(tempFd, 1)
    dup2Func = appImage.findProcedurePoint("dup2", BPatch_entry);
    if (dup2Func.size() != 1) {
        fprintf(stderr, "unable to find procedure dup2\n");
        exit(-1);
    }
    dupArgs.push_back(tempFd);
    dupArgs.push_back(BPatch_constExpr(1));
    BPatch_funcCallExpr dup2Call(dup2Func[0], dupArgs);

    // Generate the if statement that tests the result of open:
    //         if (tempFd >= 0) dup2(...)
    BPatch_boolExpr compareExpr(BPatch_ge, tempFd, BPatch_constExpr(0));
    BPatch_ifExpr ifExpr(compareExpr, dup2Call);

    // Build the statement list: the assignment, then the if.
    codeBlock.push_back(assgn);
    codeBlock.push_back(ifExpr);
    BPatch_sequence block(codeBlock);

    // now arrange for the code to be executed.
    tr->stopExecution();
    tr->oneShotCode(block);
    tr->continueExecution();

    // Code to clean up omitted.
}
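
For comparison, the effect that re-pipe aims for can be written as ordinary code running inside a single POSIX process. The sketch below uses only the standard open, dup2, and close calls; redirectFd is a hypothetical helper name, and the 0644 permission value is an arbitrary choice for a newly created file. Note that dup2(oldfd, newfd) makes newfd refer to the same open file as oldfd.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Make descriptor targetFd refer to the file at path, creating the file if
// needed. Returns 0 on success and -1 on failure. redirectFd is a
// hypothetical helper name, not part of the proposed API.
inline int redirectFd(int targetFd, const char *path) {
    assert(path != 0);
    int tempFd = open(path, O_WRONLY | O_CREAT, 0644);
    if (tempFd < 0) return -1;
    // dup2(oldfd, newfd): newfd becomes a copy of oldfd.
    int rc = dup2(tempFd, targetFd);
    // The temporary descriptor is no longer needed once it has been copied.
    if (tempFd != targetFd) close(tempFd);
    return rc < 0 ? -1 : 0;
}
```

Calling redirectFd(1, "out.log") would send everything later written to standard output into out.log; the mutator in this section arranges for equivalent calls to execute inside the application process instead.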

6. References

[1] Jeffrey K. Hollingsworth and Barton P. Miller, "An Adaptive Cost System for Parallel Programs", Euro-Par '96, Lyon, France, August 1996.
[2] Jeffrey K. Hollingsworth, Barton P. Miller, and Jon Cargille, "Dynamic Program Instrumentation for Scalable Performance Tools", 1994 Scalable High-Performance Computing Conf., Knoxville, Tenn., 1994.
[3] James R. Larus and Eric Schnarr, "EEL: Machine-Independent Executable Editing", SIGPLAN Conference on Programming Language Design and Implementation, June 1995.
[4] Barton P. Miller, Mark D. Callaghan, Jonathan M. Cargille, Jeffrey K. Hollingsworth, R. Bruce Irvin, Karen L. Karavanic, Krishna Kunchithapadam, and Tia Newhall, "The Paradyn Parallel Performance Measurement Tools", IEEE Computer 28(11), November 1995.
