1 .HTML "Acid: A Debugger Built From A Language
3 Acid: A Debugger Built From A Language
6 philw@plan9.bell-labs.com
11 Proc. of the Winter 1994 USENIX Conf.,
16 Acid is an unusual source-level symbolic debugger for Plan 9. It is implemented
17 as a language interpreter with specialized primitives that provide
18 debugger support. Programs written in the language manipulate
19 one or more target processes; variables in the language represent the
20 symbols, state, and resources of those processes.
21 This structure allows complex
22 interaction between the debugger and the target program and
23 provides a convenient method of parameterizing differences between
24 machine architectures.
25 Although some effort is required to learn
26 the debugging language, the richness and flexibility of the
27 debugging environment encourages new ways of reasoning about the way
28 programs run and the conditions under which they fail.
33 The size and complexity
34 of programs have increased in proportion to processor speed and memory but
35 the interface between debugger and programmer has changed little.
36 Graphical user interfaces have eased some of the tedious
37 aspects of the interaction. A graphical interface is a convenient
38 means for navigating through source and data structures but provides
39 little benefit for process control.
40 The introduction of a new concurrent language, Alef [Win93], emphasized the
41 inadequacies of the existing Plan 9 [Pike90] debugger
45 and made it clear that a new debugger was required.
47 Current debuggers like
52 are limited to answering only the questions their authors
53 envisage. As a result, they supply a plethora
54 of specialized commands, each attempting to anticipate
55 a specific question a user may ask.
56 When a debugging situation arises that is beyond the scope
57 of the command set, the tool is useless.
59 it is often tedious or impossible to reproduce an anomalous state
60 of the program, especially when
61 the state is embedded in the program's data structures.
63 Acid applies some ideas found in CAD software used for
64 hardware test and simulation.
65 It is based on the notion that the state and resources of a program
66 are best represented and manipulated by a language. The state and resources,
67 such as memory, registers, variables, type information and source code
68 are represented by variables in the language.
69 Expressions provide a computation mechanism and control
70 statements allow repetitive or selective interpretation based
71 on the result of expression evaluation.
72 The heart of the Acid debugger is an interpreter for a small typeless
73 language whose operators mirror the operations
74 of C and Alef, which in turn correspond well to the basic operations of
75 the machine. The interpreter itself knows nothing of the underlying
76 hardware; it deals with the program state and resources
78 Fundamental routines to control
79 processes, read files, and interface to the system are implemented
80 as builtin functions available to the interpreter.
81 The actual debugger functionality is coded
82 in Acid; commands are implemented as Acid functions.
84 This language-based approach has several advantages.
85 Most importantly, programs written in Acid, including most of the
86 debugger itself, are inherently portable.
87 Furthermore, Acid avoids the limitations other debuggers impose when
88 debugging parallel programs. Instead of embedding a fixed
89 process model in the debugger, Acid allows the
90 programmer to adapt the debugger to handle an
91 arbitrary process partitioning or program structure.
93 interact dynamically with an executing process provides clear advantages
94 over debuggers constrained to probe a static image.
95 Finally, the Acid language is a powerful vehicle for expressing
96 assertions about logic, process state, and the contents of data structures.
97 When combined with dynamic interaction it allows a
98 limited form of automated program verification without requiring
99 modification or recompilation of the source code.
100 The language is also an
101 excellent vehicle for preserving a test suite for later regression testing.
103 The debugger may be customized by its users; standard
104 functions may be modified or extended to suit a particular application
106 For example, the kernel developers in our group require a
107 command set supporting assembler-level debugging while the application
108 programmers prefer source-level functionality.
109 Although the default library is biased toward assembler-level debugging,
110 it is easily modified to provide a convenient source-level interface.
111 The debugger itself does not change; the user combines primitives
112 and existing Acid functions in different ways to
113 implement the desired interface.
117 DUEL [Gol93], an extension to
119 [Stal91], proposes using a high level expression evaluator to solve
120 some of these problems. The evaluator provides iterators to loop over data
121 structures and conditionals to control evaluation of expressions.
122 The author shows that complex state queries can be formulated
123 by combining concise expressions but this only addresses part of the problem.
124 A program is a dynamic entity; questions asked when the program is in
125 a static state are meaningful only after the program has been `caught' in
126 that state. The framework for manipulating the program is still as
127 primitive as the underlying debugger. While DUEL provides a means to
128 probe data structures it entirely neglects the most beneficial aspect
129 of debugging languages: the ability to control processes. Acid is structured
130 around a thread of control that passes between the interpreter and the
133 The NeD debugger [May92] is a set of extensions to TCL [Ous90] that provide
134 debugging primitives. The resulting language, NeDtcl, is used to implement
135 a portable interface between a conventional debugger, pdb [May90], and
136 a server that executes NeDtcl programs operating on the target program.
137 Execution of the NeDtcl programs implements the debugging primitives
139 NeD is targeted at multi-process debugging across a network,
140 and proves the flexibility of a language as a means of
141 communication between debugging tools. Whereas NeD provides an interface
142 between a conventional debugger and the process it debugs, Acid is the
143 debugger itself. While NeD has some of the ideas
144 found in Acid it is targeted toward a different purpose. Acid seeks to
145 integrate the manipulation of a program's resources into the debugger
146 while NeD provides a flexible interconnect between components of
147 the debugging environment. The choice of TCL is appropriate for its use
148 in NeD but is not suitable for Acid. Acid relies on the coupling of the type
149 system with expression evaluation, which are the root of its design,
150 to provide the debugging primitives.
152 Dalek [Ols90] is an event based language extension to gdb. State transitions
153 in the target program cause events to be queued for processing by the
156 Acid has many of the advantages of same process or
159 debuggers, like Parasight [Aral], without the need for dynamic linking or
161 Acid improves on the ideas of these other systems by completely integrating
162 all aspects of the debugging process into the language environment. Of
163 particular importance is the relationship between Acid variables,
164 program symbols, source code, registers and type information. This
165 integration is made possible by the design of the Acid language.
167 Interpreted languages such as Lisp and Smalltalk are able to provide
168 richer debugging environments through more complete information than
169 their compiled counterparts. Acid is a means to gather and represent
170 similar information about compiled programs through cooperation
171 with the compilation tools and library implementers.
175 Acid is a small interpreted language targeted to its debugging task.
176 It focuses on representing program state and addressing data rather than
177 expressing complex computations. Program state is
179 from an Acid program.
180 In addition to parsing and executing expressions and providing
181 an architecture-independent interface to the target process,
182 the interpreter supplies a mark-and-scan garbage collector
185 Every Acid session begins with the loading of the Acid libraries.
186 These libraries contain functions, written in Acid, that provide
187 a standard debugging environment including breakpoint management,
188 stepping by instruction or statement, stack tracing, and
189 access to variables, memory, and registers.
190 The library contains 600 lines of Acid code and provides
191 functionality similar to
193 Following the loading of the system library, Acid loads
194 user-specified libraries; this load sequence allows the
195 user to augment or override the standard commands
196 to customize the debugging environment. When all libraries
197 are loaded, Acid issues an interactive prompt and begins
198 evaluating expressions entered by the user. The Acid `commands'
199 are actually invocations of builtin primitives or previously defined
200 Acid functions. Acid evaluates each expression as it is entered and
205 Acid variables are of four basic types:
211 The type of a variable is inferred by the type of the right-hand side of
212 an assignment expression.
213 Many of the operators can be applied to more than
214 one type; for these operators the action of the operator is determined
215 by the type of its operands.
223 operands, and concatenates
228 Lists are the only complex type in Acid; there are no arrays, structures
229 or pointers. Operators provide
236 Lists can also be indexed like arrays.
238 Acid has two levels of scope: global and local.
239 Function parameters and variables declared in a function body
242 keyword are created at entry to the function and
243 exist for the lifetime of a function.
244 Global variables are created by assignment and need not be declared.
245 All variables and functions in the program
246 being debugged are entered in the Acid symbol table as global
247 variables during Acid initialization.
248 Conflicting variable names are resolved by prefixing enough `$' characters
250 Syntactically, Acid variables and target program
251 symbols are referenced identically.
252 However, the variables are managed differently in the Acid
253 symbol table and the user must be aware of this distinction.
254 The value of an Acid variable is stored in the symbol
255 table; a reference returns the value.
256 The symbol table entry for a variable or function in the target
257 program contains the address of that symbol in the image
258 of the program. Thus, the value of a program variable is
259 accessed by indirect reference through the Acid
260 variable that has the same name; the value of an Acid variable is the
261 address of the corresponding program variable.
269 statements implement looping.
271 is similar to the same statement in C.
272 The latter evaluates starting and ending expressions yielding
273 integers and iterates while an incrementing loop index
274 is within the bounds of those expressions.
276 acid: i = 0; loop 1,5 do print(i=i+1)
286 statement implements conditional execution.
290 Two indirection operators allow Acid to access values in
291 the program being debugged.
294 operator fetches a value from the memory image of an
298 operator fetches a value from the text file of the process.
299 When either operator appears on the left side of an assignment, the value
300 is written rather than read.
302 The indirection operator must know the size of the object
303 referenced by a variable.
304 The Plan 9 compilers neglect to include this
305 information in the program symbol table, so Acid cannot
306 derive this information implicitly.
307 Instead Acid variables have formats.
309 letter specifying the printing style and the effect of some of the
310 operators on that variable.
311 The indirection operators look at the format code to determine the
312 number of bytes to read or write.
313 The format codes are derived from the format letters used by
315 By default, symbol table variables and numeric constants
316 are assigned the format code
318 which specifies 32-bit hexadecimal.
319 Printing such a variable yields output of the form
321 An indirect reference through the variable fetches 32 bits
322 of data at the address indicated by the variable.
323 Other formats specify various data types, for example
327 a signed 32 bit decimal,
329 a null-terminated string.
333 allows the user to change the format code of a variable
334 to control the printing format and
335 operator side effects.
336 This function evaluates the expression supplied as the first
337 argument, attaches the format code supplied as the second
338 argument to the result and returns that value.
339 If the result is assigned to a variable,
340 the new format code applies to
341 that variable. For convenience, Acid provides the
343 operator as a shorthand infix form of
348 acid: x // print x in hex
350 acid: x = fmt(x, 'D') // make x type decimal
351 acid: print(x, fmt(x, 'X'), x\eX) // print x in decimal & hex
352 10 0x0000000a 0x0000000a
353 acid: x // print x in decimal
355 acid: x\eo // print x in octal
362 operators increment or decrement a variable by an amount
363 determined by its format code. Some formats imply a non-fixed size.
366 format code disassembles an instruction into a string.
367 On a 68020, which has variable length instructions:
369 acid: p=main\ei // p=addr(main), type INST
370 acid: loop 1,5 do print(p\eX, @p++) // disassemble 5 instr's
371 0x0000222e LEA 0xffffe948(A7),A7
372 0x00002232 MOVL s+0x4(A7),A2
373 0x00002236 PEA 0x2f($0)
374 0x0000223a MOVL A2,-(A7)
375 0x0000223c BSR utfrrune
380 is the address of the function of the same name in the program under test.
381 The loop retrieves the five instructions beginning at that address and
382 then prints the address and the assembly language representation of each.
383 Notice that the stride of the increment operator varies with the size of
388 is a two byte instruction while all others are four bytes long.
390 Registers are treated as normal program variables referenced
391 by their symbolic assembler language names.
393 process stops, the register set is saved by the kernel
394 at a known virtual address in the process memory map.
395 The Acid variables associated with the registers point
396 to the saved values and the
398 indirection operator can then be used to read and write the register set.
399 Since the registers are accessed via Acid variables they may
400 be used in arbitrary expressions.
402 acid: PC // addr of saved PC
405 0x0000623c // contents of PC
408 acid: *R1=10 // modify R1
409 acid: asm(*PC+4) // disassemble @ PC+4
410 main+0x4 0x00006240 MOVW R31,0x0(R29)
411 main+0x8 0x00006244 MOVW $setR30(SB),R30
412 main+0x10 0x0000624c MOVW R1,_clock(SB)
418 its current content is
422 format code converts this value to a string specifying
423 the address as an offset beyond the nearest symbol.
424 After setting the value of register
428 command to disassemble a short section of code beginning
429 at four bytes beyond the current value of the
434 A program executing under Acid is monitored through the
436 file system interface provided by Plan 9.
437 Textual messages written to the
439 file control the execution of the process.
442 to the control file causes the write to block until the target
443 process enters the kernel and is stopped. When the process is stopped
444 the write completes. The
446 message starts the target process and then does a
449 Synchronization between the debugger and the target process is determined
450 by the actions of the various messages. Some operate asynchronously to the
451 target process and always complete immediately, others block until the
452 action completes. The asynchronous messages allow Acid to control
453 several processes simultaneously.
455 The interpreter has builtin functions named after each of the control
456 messages. The functions take a process id as argument.
457 Any time a control message causes the program to execute instructions
458 the interpreter performs two actions when the control operation has completed.
459 The Acid variables pointing at the register set are fixed up to point
460 at the saved registers, and then
461 the user defined function
466 function may print the current address,
467 line of source or instruction and return to interactive mode. Alternatively
468 it may traverse a complex data structure, gather statistics and then set
469 the program running again.
471 Several Acid variables are maintained by the debugger rather than the
473 These variables allow generic Acid code to deal with the current process,
474 architecture specifics or the symbol table.
477 is the process id of the current process Acid is debugging.
480 contains a list of lists where each sublist contains the symbol
481 name, its type and the value of the symbol.
484 contains a list of the machine-specific register names. Global symbols in the target program
485 can be referenced directly by name from Acid. Local variables
486 are referenced using the colon operator as \f(CWfunction:variable\fP.
488 Source Level Debugging
490 Acid provides several builtin functions to manipulate source code.
493 function reads a text file, inserting each line into a list.
498 functions each take an address as an argument.
500 returns a string containing the name of the source file
501 and the second returns an integer containing the line number
502 of the source line containing the instruction at the address.
504 acid: pcfile(main) // file containing main
506 acid: pcline(main) // line # of main in source
508 acid: file(pcfile(main))[pcline(main)] // print that line
509 main(int argc, char *argv[])
510 acid: src(*PC) // print statements nearby
513 >11 main(int argc, char *argv[])
517 In this example, the three primitives are combined in an expression to print
518 a line of source code associated with an address.
521 function prints a few lines of source
522 around the address supplied as its argument. A companion routine,
524 communicates with the external editor
526 Given an address, it loads the corresponding source file into the editor
527 and highlights the line containing the address. This simple interface
528 is easily extended to more complex functions.
531 function can select the current file and line in the editor
532 each time the target program stops, giving the user a visual
533 trace of the execution path of the program. A more complete interface
534 allowing two way communication between Acid and the
536 user interface [Pike93] is under construction. A filter between the debugger
537 and the user interface provides interpretation of results from both
538 sides of the interface. This allows the programming environment to
539 interact with the debugger and vice-versa, a capability missing from the
546 functions are both written in Acid code using the file and line primitives.
547 Acid provides library functions to step through source level
548 statements and functions. Furthermore, addresses in Acid expressions can be
549 specified by source file and line.
550 Source code is manipulated in the Acid
556 The following examples define some useful commands and
557 illustrate the interaction of the debugger and the interpreter.
559 defn bpset(addr) // set breakpoint
561 if match(addr, bplist) >= 0 then
562 print("bkpoint already set:", addr\ea, "\en");
564 *fmt(addr, bpfmt) = bpinst; // plant it
565 bplist = append bplist, addr; // add to list
571 function plants a break point in memory. The function starts by
575 search the breakpoint list to determine if a breakpoint is already
577 The indirection operator, controlled by the format code returned
580 primitive, is used to plant the breakpoint in memory.
585 are Acid global variables containing the format code specifying
586 the size of the breakpoint instruction and the breakpoint instruction
589 variables are set by architecture-dependent library code
590 when the debugger first attaches to the executing image.
591 Finally the address of the breakpoint is
592 appended to the breakpoint list,
595 defn step() // single step
597 local lst, lpl, addr, bput;
599 bput = 0; // sitting on bkpoint
600 if match(*PC, bplist) >= 0 then {
601 bput = fmt(*PC, bpfmt); // save current addr
602 *bput = @bput; // replace it
605 lst = follow(*PC); // get follow set
608 while lpl do { // place breakpoints
609 *(head lpl) = bpinst;
613 startstop(pid); // do the step
615 while lst do { // remove breakpoints
616 addr = fmt(head lst, bpfmt);
617 *addr = @addr; // replace instr.
621 *bput = bpinst; // restore breakpoint
626 function executes a single assembler instruction.
630 on a breakpoint, the address and size of
631 the breakpoint are saved.
632 The breakpoint instruction
633 is then removed using the
637 bytes from the text file and to place it into the memory
638 of the executing process using the
644 builtin which returns a follow-set: a list of instruction addresses which
645 could be executed next.
646 If the instruction stored at the
648 is a branch instruction, the
649 list contains the addresses of the next instruction and
650 the branch destination; otherwise, it contains only the
651 address of the next instruction.
652 The follow-set is then used to replace each possible following
653 instruction with a breakpoint instruction. The original
654 instructions need not be saved; they remain
655 in their unaltered state in the text file.
658 builtin writes the `startstop' message to the
660 control file for the process named
662 The target process executes until some condition causes it to
663 enter the kernel, in this case, the execution of a breakpoint.
664 When the process blocks, the debugger regains control and invokes the
665 Acid library function
667 which reports the address and cause of the blockage.
670 function completes and returns to the
673 the follow-set is used to replace the breakpoints placed earlier.
674 Finally, if the address of the original
676 contained a breakpoint, it is replaced.
678 Notice that this approach to process control is inherently portable;
679 the Acid code is shared by the debuggers for all architectures.
680 Acid variables and builtin functions provide a transparent interface
681 to architecture-dependent values and functions. Here the breakpoint
682 value and format are referenced through Acid variables and the
684 primitive masks the differences in the underlying instruction set.
688 function, similar to the
690 command of the same name,
691 is a simpler example.
692 This function steps through
693 a single source statement but steps over function calls.
699 sp = *SP; // save starting SP
700 bound = fnbound(*PC); // begin & end of fn.
701 stmnt(); // step 1 statement
703 if pc >= bound[0] && pc < bound[1] then
706 while (pc<bound[0] || pc>bound[1]) && sp>=*SP do {
716 starts by saving the current stack pointer in a local variable.
717 It then uses the Acid library function
719 to return the addresses of the first and last instructions in
720 the current function in a list.
723 function executes a single source statement and then uses
725 to print a few lines of source around the new
727 If the new value of the
729 remains in the current function,
732 When the executed statement is a function call or a return
733 from a function, the new value of the
735 is outside the bounds calculated by
740 If the statement was a return, the new value of the stack pointer
741 is greater than the original value and the loop completes without
743 Otherwise, the loop is entered and instructions are continually
744 executed until the value of the
746 is between the bounds calculated earlier. At that point, execution
747 ceases and a few lines of source in the vicinity of the
751 Acid provides concise and elegant expression for control and
752 manipulation of target programs. These examples demonstrate how a
753 few well-chosen primitives can be combined to create a rich debugging environment.
755 Dealing With Multiple Architectures
757 A single binary of Acid may be used to debug a program running on any
758 of the five processor architectures supported by Plan 9. For example,
759 Plan 9 allows a user on a MIPS to import the
761 file system from an i486-based PC and remotely debug a program executing
764 Two levels of abstraction provide this architecture independence.
765 On the lowest level, a Plan 9 library supplies functions to
766 decode the file header of the program being debugged and
767 select a table of system parameters
768 and a jump vector of architecture-dependent
769 functions based on the magic number.
770 Among these functions are byte-order-independent
771 access to memory and text files, stack manipulation, disassembly,
772 and floating point number interpretation.
773 The second level of abstraction is supplied by Acid.
774 It consists of primitives and approximately 200 lines
775 of architecture-dependent Acid library code that interface the
776 interpreter to the architecture-dependent library.
777 This layer performs functions such as mapping register names to
778 memory locations, supplying breakpoint values and sizes,
779 and converting processor specific data to Acid data types.
780 An example of the latter is the stack trace function
782 which uses the stack traversal functions in the
783 architecture-dependent library to construct a list of lists describing
784 the context of a process. The first level of list selects
785 each function in the trace; subordinate lists contain the
786 names and values of parameters and local variables of
787 the functions. Acid commands and library functions that
788 manipulate and display process state information operate
789 on the list representation and are independent of the
790 underlying architecture.
794 Alef is a concurrent programming language,
795 designed specifically for systems programming, which supports both
796 shared variable and message passing paradigms.
797 Alef borrows the C expression syntax but implements
798 a substantially different type system.
799 The language provides a rich set of
800 exception handling, process management, and synchronization
801 primitives, which rely on a runtime system.
802 Alef program bugs are often deadlocks, synchronization failures,
803 or non-termination caused by locks being held incorrectly.
804 In such cases, a process stalls deep
805 in the runtime code and it is clearly
806 unreasonable to expect a programmer using the language
807 to understand the detailed
808 internal semantics of the runtime support functions.
810 Instead, there is an Alef support library, coded in Acid, that
811 allows the programmer to interpret the program state in terms of
812 Alef operations. Consider the example of a multi-process program
813 stalling because of improper synchronization. A stack trace of
814 the program indicates that it is waiting for an event in some
816 synchronization function.
817 The function itself is irrelevant to the
818 programmer; of greater importance is the identity of the
820 Commands in the Alef support library decode
821 the runtime data structures and program state to report the cause
822 of the blockage in terms of the high-level operations available to
824 Here, the Acid language acts
825 as a communications medium between Alef implementer and Alef user.
829 The central issue in parallel debugging is how the debugger is
830 multiplexed between the processes comprising
832 Acid has no intrinsic model of process partitioning; it
833 only assumes that parallel programs share a symbol table,
834 though they need not share memory.
837 primitive attaches the debugger to a running process
838 associated with the process ID supplied as its argument
839 and assigns that value to the global variable
841 thereby allowing simple rotation among a group of processes.
842 Further, the stack trace primitive is driven by parameters
843 specifying a unique process context, so it is possible to
844 examine the state of cooperating processes without switching
845 the debugger focus from the process of interest.
846 Since Acid is inherently extensible and capable of
847 dynamic interaction with subordinate processes, the
848 programmer can define Acid commands to detect and control
849 complex interactions between processes.
850 In short, the programmer is free to specify how the debugger reacts
851 to events generated in specific threads of the program.
853 The support for parallel debugging in Acid depends on a crucial kernel
854 modification: when the text segment of a program is written (usually to
855 place a breakpoint), the segment is cloned to prevent other threads
856 from encountering the breakpoint. Although this incurs a slight performance
857 penalty, it is of little importance while debugging.
859 Communication Between Tools
861 The Plan 9 Alef and C compilers do not
862 embed detailed type information in the symbol table of an
864 However, they do accept a command line option causing them to
865 emit descriptions of complex data types
866 (e.g., aggregates and abstract data types)
867 to an auxiliary file.
868 The vehicle for expressing this information is Acid source code.
869 When an Acid debugging session is
870 subsequently started, that file is loaded with the other Acid libraries.
872 For each complex object in the program the compiler generates
873 three pieces of Acid code.
874 The first is a table describing the size and offset of each
875 member of the complex data type. Following is an Acid function,
876 named the same as the object, that formats and prints each member.
877 Finally, Acid declarations associate the
878 Alef or C program variables of a type with the functions
880 The three forms of declaration are shown in the following example:
894 print("Rectangle r {\en");
897 print("Rectangle clipr {\en");
898 Rectangle(addr.clipr);
900 print(" ldepth ", addr.ldepth, "\en");
901 print(" id ", addr.id, "\en");
902 print(" cache ", addr.cache, "\en");
905 complex Bitmap darkgrey;
906 complex Bitmap Window_settag:b;
910 declaration specifies decoding instructions for the complex type named
912 Although the syntax is superficially similar to a C structure declaration,
913 the semantics differ markedly: the C declaration specifies a layout, while
914 the Acid declaration tells how to decode it.
915 The declaration specifies a type, an offset, and name for each
916 member of the complex object. The type is either the name of another
917 complex declaration, for example,
920 The offset is the number of bytes from the start
921 of the object to the member
922 and the name is the member's name in the Alef or C declaration.
923 This type description is a close match for C and Alef, but is simple enough
924 to be language independent.
928 function expects the address of a
930 as its only argument.
931 It uses the decoding information contained in the
933 structure declaration to extract, format, and print the
934 value of each member of the complex object pointed to by
936 The Alef compiler emits code to call other Acid functions
937 where a member is another complex type; here,
941 to print its contents.
945 declarations associate Alef variables with complex types.
948 is the name of a global variable of type
950 in the program being debugged.
953 is evaluated by Acid, it automatically calls the
955 function with the address of
960 declaration associates a local variable or parameter named
968 Acid borrows the C operators
972 to access the decoding parameters of a member of a complex type.
973 Although this representation is sufficiently general for describing
974 the decoding of both C and Alef complex data types, it may
975 prove too restrictive for target languages with more complicated
977 Further, the assumption that the compiler can select the proper
978 Acid format code for each basic type in the language is somewhat
979 naive. For example, when a member of a complex type is a pointer,
980 it is assigned a hexadecimal type code; integer members are always
981 assigned a decimal type code.
982 This heuristic proves inaccurate when an integer field is a
983 bit mask or set of bit flags which are more appropriately displayed
984 in hexadecimal or octal.
988 Acid's ability to interact dynamically with
989 an executing program allows passive test and
990 verification of the target program. For example,
991 a common concern is leak detection in programs using
993 Of interest are two items: finding memory that was allocated
994 but never freed and detecting bad pointers passed to
996 An auxiliary Acid library contains Acid functions to
997 monitor the execution of a program and detect these
998 faults, either as they happen or in the automated
999 post-mortem analysis of the memory arena.
1000 In the following example, the
1002 command is run under the control of the
1003 Acid memory leak library.
1005 helix% acid -l malloc /bin/sort
1006 /bin/sort: mips plan 9 executable
1020 27680 : breakpoint _exits+0x4 MOVW $0x8,R1
1025 command creates a process and plants
1026 breakpoints at the entry to
1030 The program is then started and continues until it
1031 exits or stops. If the reason for stopping is anything
1032 other than the breakpoints in
1036 Acid prints the usual status information and returns to the
1039 When the process stops on entering
1041 the debugger must capture and save the address that
1044 After saving a stack
1045 trace so the calling routine can be identified, it places
1046 a breakpoint at the return address and restarts the program.
1049 returns, the breakpoint stops the program,
1050 allowing the debugger
1051 to grab the address of the new memory block from the return register.
1052 The address and stack trace are added to the list of outstanding
1053 memory blocks, the breakpoint is removed from the return point, and
1054 the process is restarted.
1056 When the process stops at the beginning of
1058 the memory address supplied as the argument is compared to the list
1059 of outstanding memory blocks. If it is not found an error message
1060 and a stack trace of the call is reported; otherwise, the
1061 address is deleted from the list.
1063 When the program exits, the list of outstanding memory blocks contains
1064 the addresses of all blocks that were allocated but never freed.
1067 library function traverses the list producing a report describing
1068 the allocated blocks.
1071 Lost a total of 524288 bytes from:
1072 malloc() malloc.c:32 called from dofile+0xe8 sort.c:217
1073 dofile() sort.c:190 called from main+0xac sort.c:161
1074 main() sort.c:128 called from _main+0x20 main9.s:10
1075 Lost a total of 64 bytes from:
1076 malloc() malloc.c:32 called from newline+0xfc sort.c:280
1077 newline() sort.c:248 called from dofile+0x110 sort.c:222
1078 dofile() sort.c:190 called from main+0xac sort.c:161
1079 main() sort.c:128 called from _main+0x20 main9.s:10
1080 Lost a total of 64 bytes from:
1081 malloc() malloc.c:32 called from realloc+0x14 malloc.c:129
1082 realloc() malloc.c:123 called from bldkey+0x358 sort.c:1388
1083 buildkey() sort.c:1345 called from newline+0x150 sort.c:285
1084 newline() sort.c:248 called from dofile+0x110 sort.c:222
1085 dofile() sort.c:190 called from main+0xac sort.c:161
1086 main() sort.c:128 called from _main+0x20 main9.s:10
1088 data...bss...stack...
1092 The presence of a block in the allocation list does not imply
1093 it is there because of a leak; for instance, it may have been
1094 in use when the program terminated.
1097 library function scans the
1102 segments of the process looking for pointers
1103 into the allocated blocks. When one is found, the block is deleted from
1104 the outstanding block list.
1107 function is used again to report the
1108 blocks remaining allocated and unreferenced.
1109 This strategy proves effective in detecting
1110 disconnected (but non-circular) data structures.
1112 The leak detection process is entirely passive.
1114 specially compiled and the source code is not required.
1115 As with the Acid support functions for the Alef runtime environment,
1116 the author of the library routines has encapsulated the
1117 functionality of the library interface
1119 Any programmer may then check a program's use of the
1120 library routines without knowledge of either implementation.
1121 The performance impact of running leak detection is great
1122 (about 10 times slower),
1123 but it has not prevented interactive programs like
1127 window system from being tested.
1131 Another common component of software test uses
1134 The purpose of the test is to determine which paths through the code have
1135 not been executed while running the test suite.
1137 performed by a combination of compiler support and a reporting tool run
1138 on the output generated by statements compiled into the program.
1139 The compiler emits code that
1140 logs the progress of the program as it executes basic blocks and writes the
1141 results to a file. The file is then processed by the reporting tool
1142 to determine which basic blocks have not been executed.
1144 Acid can perform the same function in a language independent manner without
1145 modifying the source, object or binary of the program. The following example
1148 being run under the control of the Acid coverage library.
1150 philw-helix% acid -l coverage /bin/ls
1151 /bin/ls: mips plan 9 executable
1161 2: (error) msg: pid=11419 startstop: process exited
1166 104: if(db[0].qid.path&CHDIR && dflag==0){
1169 122: memmove(dirbuf+ndir, db, sizeof(Dir));
1170 123: dirbuf[ndir].prefix = 0;
1171 124: p = utfrrune(s, '/');
1173 126: dirbuf[ndir].prefix = s;
1177 function begins by looping through the text segment placing
1178 breakpoints at the entry to each basic block. The start of each basic
1179 block is found using the Acid builtin function
1181 If the list generated by
1183 contains more than one
1184 element, then the addresses mark the start of basic blocks. A breakpoint
1185 is placed at each address to detect entry into the block. If the result
1188 is a single address then no action is taken, and the next address is
1189 considered. Acid maintains a list of
1190 breakpoints already in place and avoids placing duplicates (an address may be
1191 the destination of several branches).
1193 After placing the breakpoints the program is set running.
1194 Each time a breakpoint is encountered
1195 Acid deletes the address from the breakpoint list, removes the breakpoint
1196 from memory and then restarts the program.
1197 At any instant the breakpoint list contains the addresses of basic blocks
1198 which have not been executed.
1201 function reports the lines of source code bounded by basic blocks
1202 whose addresses are have not been deleted from the breakpoint list.
1203 These are the basic blocks which have not been executed.
1204 Program performance is almost unaffected since each breakpoint is executed
1205 only once and then removed.
1207 The library contains a total of 128 lines of Acid code.
1208 An obvious extension of this algorithm could be used to provide basic block
1213 Acid has two areas of weakness. As with
1214 other language-based tools like
1216 a programmer must learn yet another language to step beyond the normal
1217 debugging functions and use the full power of the debugger.
1218 Second, the command line interface supplied by the
1220 parser is inordinately clumsy.
1221 Part of the problem relates directly to the use of
1223 and could be circumvented with a custom parser.
1224 However, structural problems would remain: Acid often requires
1225 too much typing to execute a simple
1227 A debugger should prostitute itself to its users, doing whatever
1228 is wanted with a minimum of encouragement; commands should be
1229 concise and obvious. The language interface is more consistent than
1230 an ad hoc command interface but is clumsy to use.
1231 Most of these problems are addressed by an Acme interface
1232 which is under construction. This should provide the best of
1233 both worlds: graphical debugging and access to the underlying acid
1234 language when required.
1236 The name space clash between Acid variables, keywords, program variables,
1237 and functions is unavoidable.
1238 Although it rarely affects a debugging session, it is annoying
1239 when it happens and is sometimes difficult to circumvent.
1240 The current renaming scheme
1241 is too crude; the new names are too hard to remember.
1243 Acid has proved to be a powerful tool whose applications
1244 have exceeded expectations.
1245 Of its strengths, portability, extensibility and parallel debugging support
1246 were by design and provide the expected utility.
1248 its use as a tool for code test and verification and as
1249 a medium for communicating type information and encapsulating
1250 interfaces has provided unanticipated benefits and altered our
1251 view of the debugging process.
1255 Bob Flandrena was the first user and helped prepare the paper.
1256 Rob Pike endured three buggy Alef compilers and a new debugger
1257 in a single sitting.
1261 [Pike90] R. Pike, D. Presotto, K. Thompson, H. Trickey,
1262 ``Plan 9 from Bell Labs'',
1264 UKUUG Proc. of the Summer 1990 Conf.,
1268 reprinted, in a different form, in this volume.
1270 [Gol93] M. Golan, D. Hanson,
1271 ``DUEL -- A Very High-Level Debugging Language'',
1273 USENIX Proc. of the Winter 1993 Conf.,
1278 [Lin90] M. A. Linton,
1279 ``The Evolution of DBX'',
1281 USENIX Proc. of the Summer 1990 Conf.,
1286 [Stal91] R. M. Stallman, R. H. Pesch,
1287 ``Using GDB: A guide to the GNU source level debugger'',
1288 Technical Report, Free Software Foundation,
1292 [Win93] P. Winterbottom,
1293 ``Alef reference Manual'',
1297 ``Acme: A User Interface for Programmers'',
1299 USENIX Proc. of the Winter 1994 Conf.,
1302 reprinted in this volume.
1304 [Ols90] Ronald A. Olsson, Richard H. Crawford, and W. Wilson Ho,
1305 ``Dalek: A GNU, improved programmable debugger'',
1307 USENIX Proc. of the Summer 1990 Conf.,
1311 [May92] Paul Maybee,
1312 ``NeD: The Network Extensible Debugger''
1314 USENIX Proc. of the Summer 1992 Conf.,
1318 [Aral] Ziya Aral, Ilya Gertner, and Greg Schaffer,
1319 ``Efficient debugging primitives for multiprocessors'',
1321 Proceedings of the Third International Conference on Architectural
1322 Support for Programming Languages and Operating Systems,
1324 SIGPLAN notices Nr. 22, May 1989.