.HTML "Adding Application Support for a New Architecture in Plan 9
Adding Application Support for a New Architecture in Plan 9
bobf@plan9.bell-labs.com
Plan 9 has five classes of architecture-dependent software:
headers, kernels, compilers and loaders, the
system library, and a few application programs. In general,
architecture-dependent programs
consist of a portable part shared by all architectures and a
processor-specific portion for each supported architecture.
The portable code is often compiled and stored in a library
for each architecture. A program is built by
compiling the architecture-specific code and loading it with the
library. Support for a new architecture is provided
by building a compiler for the architecture, using it to
compile the portable code into libraries,
writing the architecture-specific code, and
then loading that code with
the libraries.
This document describes the organization of the architecture-dependent
code and headers on Plan 9.
The first section briefly discusses the layout of
the headers and the source code for the kernels, compilers, loaders, and the
system library.
The second section provides a detailed
discussion of the structure of
.CW libmach ,
a library containing almost
all architecture-dependent code
used by application programs.
The final section describes the steps required to add
application program support for a new architecture.
Architecture-dependent information for the new processor
is stored in the directory tree rooted at \f(CW/\fP\fIm\fP,
where
.I m
is the name of the new architecture.
The new directory should be initialized with several important
subdirectories.
The directory tree of an existing architecture
serves as a good model for the new tree.
The architecture-dependent
.CW mkfile
must be stored in the newly created root directory
for the architecture. It is easiest to copy the
mkfile for an existing architecture and modify
it for the new architecture. When the mkfile
is correct, change
.CW /sys/src/mkfile.proto
to reflect the addition of the new architecture.
Architecture-dependent headers are stored in
the directory tree for the architecture.
Two header files are required.
The first defines fundamental data types,
bit settings for the floating point
status and control registers, and
variable argument list
processing which depends on the stack
model for the architecture. This file
is best built by copying and modifying the
corresponding
file from an architecture
with a similar stack model.
The second
contains a structure describing the layout
of the saved register set for
the architecture; it is defined by the kernel.
Header file
.CW /sys/include/a.out.h
contains the definitions of the magic
numbers used to identify executables for
each architecture. When support for a new
architecture is added, the magic number
for the architecture must be added to this file.
The header format of a bootable executable is defined by
each manufacturer. Header file
.CW /sys/include/bootexec.h
contains structures describing the headers currently
supported. If the new architecture uses a common header
format,
the header format is probably already defined,
but if the bootable header format is non-standard,
a structure defining the format must be added to this file.
Although the kernel depends critically on the properties of the underlying
hardware, most of the
higher-level kernel functions, including process
management, paging, pseudo-devices, and some
networking code, are independent of processor
architecture. The portable kernel code
is divided into two parts: that implementing kernel
functions and that devoted to the boot process.
The portable boot code is stored in
.CW /sys/src/9/boot ;
code in the first class is stored in a separate directory.
Architecture-dependent kernel code is stored in a
subdirectory
named for each architecture.
The relationship between the kernel code and the boot code
is convoluted and subtle. The portable boot code
is compiled into a library for each architecture. An architecture-specific
main program is loaded with the appropriate library and the resulting
executable is compiled into the kernel, where it is executed as
a user process during the final stages of kernel initialization. The boot process
performs authentication and attaches the name space root to the appropriate
file system.
The organization of the portable kernel source code differs from that
of most other architecture-specific code.
Instead of storing the portable code in a library
and loading it with the architecture-specific
code, the portable code is compiled directly into
the directory containing the architecture-specific code
and linked with the object files built from the source in that directory.
Compilers and Loaders
The compiler source code conforms to the usual
organization: portable code is compiled into a library
for each architecture
and the architecture-dependent code is loaded with
the library.
The common compiler code is stored in
.CW /sys/src/cmd/cc .
The
.CW mkfile
in this directory compiles the portable source and
archives the objects in a library for each architecture.
The architecture-specific compiler source
is stored in a subdirectory of
.CW /sys/src/cmd
with the same name as the compiler (e.g.,
.CW /sys/src/cmd/vc ).
There is no portable code shared by the loaders.
Each directory of loader source
code is self-contained, except for
a header file and an instruction name table
included from the
directory of the associated
compiler.
Most C library modules are
portable; the source code is stored in
.CW /sys/src/libc/port
and
.CW /sys/src/libc/9sys .
Architecture-dependent library code
is stored in the subdirectory of
.CW /sys/src/libc
named the same as the target processor.
Non-portable functions not only
implement architecture-dependent operations
but also supply assembly language implementations
of functions where speed is critical.
Directory
.CW /sys/src/libc/9syscall
is unusual because it
contains architecture-dependent information
for all architectures.
It holds only a header file defining
the names and numbers of system calls
and a
script that parses the header file, constructs
assembler language functions implementing the system
call for each architecture, assembles the code,
and archives the object files in
the library.
The assembler language syntax and the system interface
differ for each architecture, so the script
must be modified to support a new architecture.
Application programs process two forms of architecture-dependent
information: executable images and intermediate object files.
Almost all processing is on executable files.
The application library
provides functions that convert
architecture-specific data
to a portable format so application programs
can process this data independent of its
underlying representation.
Further, when a new architecture is implemented,
almost all code changes
are confined to the library;
most affected application programs need only be reloaded.
The source code for the library is stored in
.CW /sys/src/libmach .
An application program running on one type of
processor must be able to interpret
architecture-dependent information for all
supported processors.
For example, a debugger must be able to debug
programs for
all architectures, not just the
architecture on which it is executing, since
its target
may be imported from a different machine.
A small part of the application library
provides functions to
extract symbol references from object files.
The remainder provides the following processing
of executable files or memory images:
.IP \(bu
Header interpretation.
.IP \(bu
Symbol table interpretation.
.IP \(bu
Execution context interpretation, such as stack traces
and stack frame location.
.IP \(bu
Instruction interpretation including disassembly and
instruction size and follow-set calculations.
.IP \(bu
Exception and floating point number interpretation.
.IP \(bu
Architecture-independent read and write access through a
relocation map.
Header file
.CW /sys/include/mach.h
defines the interfaces to the
application library. Manual pages
describe the details of the
library functions.
Two data structures
contain architecture-dependent parameters and
a jump table of functions.
Global pointers refer to the
data structures associated with the target architecture.
An application determines the target architecture of
a file or executable image, sets the global pointers
to the data structures associated with that architecture,
and subsequently performs all references indirectly through the
pointers.
As a result, direct references to the tables for each
architecture are avoided and the application code intrinsically
supports all architectures (though only one at a time).
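The indirection described above can be sketched in a few lines of C.
This is a minimal illustration, not the library's actual declarations:
the structure names, their fields, the two sample architectures, and
the functions
.CW select_arch
and
.CW next_pc
are all hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the two per-architecture structures. */
typedef struct ArchParams {
    const char *name;   /* architecture name */
    int pgsize;         /* page size in bytes */
} ArchParams;

typedef struct ArchFuncs {
    int bpsize;                     /* size of a breakpoint instruction */
    int (*instsize)(uint32_t pc);   /* size in bytes of the instruction at pc */
} ArchFuncs;

/* Two invented architectures with fixed instruction sizes. */
static int fixed4(uint32_t pc) { (void)pc; return 4; }
static int fixed2(uint32_t pc) { (void)pc; return 2; }

static ArchParams params_a = { "arch-a", 4096 };
static ArchFuncs  funcs_a  = { 4, fixed4 };
static ArchParams params_b = { "arch-b", 8192 };
static ArchFuncs  funcs_b  = { 2, fixed2 };

/* Global pointers: the only names the application code refers to. */
ArchParams *cur_params = &params_a;
ArchFuncs  *cur_funcs  = &funcs_a;

/* Select the tables for a target; returns 0 on success, -1 if unknown. */
int select_arch(int which)
{
    switch (which) {
    case 0: cur_params = &params_a; cur_funcs = &funcs_a; return 0;
    case 1: cur_params = &params_b; cur_funcs = &funcs_b; return 0;
    }
    return -1;
}

/* Application code: architecture-neutral, every reference is indirect. */
uint32_t next_pc(uint32_t pc)
{
    return pc + (uint32_t)cur_funcs->instsize(pc);
}
```

Because
.CW next_pc
never names a particular table, adding a third architecture in this sketch
means adding one more pair of tables and one more case in
.CW select_arch .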
Object file processing is handled similarly: architecture-dependent
functions identify and
decode the intermediate files for the processor.
The application indirectly
invokes a classification function to identify
the architecture of the object code and to select the
appropriate decoding function. Subsequent calls
then use that function to decode each record. Again,
the layer of indirection allows the application code
to support all architectures without modification.
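A small sketch may make the classify-then-decode pattern concrete.
Everything here is invented for illustration: the two object formats
(identified by a leading byte of 0x01 or 0x02, with fixed records of
4 and 6 bytes), the
.CW ObjHandler
structure, and the function names. Real object formats are considerably
more involved.

```c
#include <stddef.h>

/* One entry per architecture: a predicate and a record decoder. */
typedef struct ObjHandler {
    const char *name;
    int (*is_mine)(const unsigned char *buf, size_t n);  /* nonzero if recognized */
    int (*decode)(const unsigned char *buf, size_t n);   /* record length, or -1 */
} ObjHandler;

/* Invented formats: first byte identifies the architecture. */
static int is_x(const unsigned char *b, size_t n) { return n >= 1 && b[0] == 0x01; }
static int is_y(const unsigned char *b, size_t n) { return n >= 1 && b[0] == 0x02; }
static int dec_x(const unsigned char *b, size_t n) { (void)b; return n >= 4 ? 4 : -1; }
static int dec_y(const unsigned char *b, size_t n) { (void)b; return n >= 6 ? 6 : -1; }

static const ObjHandler handlers[] = {
    { "arch-x", is_x, dec_x },
    { "arch-y", is_y, dec_y },
};

/* Classification: try each predicate in turn; NULL if none matches. */
const ObjHandler *classify(const unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < sizeof handlers / sizeof handlers[0]; i++)
        if (handlers[i].is_mine(buf, n))
            return &handlers[i];
    return NULL;
}

/* Application code: classify once, then decode every record indirectly. */
int count_records(const unsigned char *buf, size_t n)
{
    const ObjHandler *h = classify(buf, n);
    int count = 0;

    if (h == NULL)
        return -1;
    while (n > 0) {
        int len = h->decode(buf, n);
        if (len <= 0)
            return -1;      /* truncated or malformed record */
        buf += len;
        n -= (size_t)len;
        count++;
    }
    return count;
}
```

Supporting another format in this sketch requires only a new predicate,
a new decoder, and one more row in the table;
.CW count_records
is unchanged.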
Splitting the architecture-dependent information
between two data structures
allows applications to choose
an appropriate level of service. Even though an application
does not directly reference the architecture-specific data structures,
it must load the
architecture-dependent tables and code
for all architectures it supports. The size of this data
can be substantial and many applications do not require
the full range of architecture-dependent functionality.
For example, a
command does not require the disassemblers for every architecture
if it only needs to decode the header.
The
.CW Mach
data structure contains a few architecture-specific parameters
and a description of the processor register set.
The size of the structure
varies with the size of the register
set but is generally small.
The other
data structure contains
a jump table of architecture-dependent functions;
the amount of code and data referenced by this table
can be substantial.
Libmach Source Code Organization
The library provides four classes of functionality:
.IP "Header and Symbol Table Decoding\ -\ "
These files
contain code to interpret the header and
symbol table of
an executable file or executing image.
The header is decoded and reformatted into a
data structure, and a global pointer is set to the
data structure of the target architecture.
The symbol table processing
uses the header
structure to decode the symbol table.
A variety of symbol table access functions then support
queries on the reformatted table.
.IP "Debugger Support\ -\ "
Files \fIm\fP\f(CW.c\fP, where
.I m
is the code letter assigned to the architecture,
contain the initialized
.CW Mach
data structure and the definition of the register
set for each architecture.
Architecture-specific debugger support functions and
the initialized jump table
structure are stored in
files \fIm\fP\f(CWdb.c\fP.
Other files
contain debugger support functions shared
by multiple architectures.
.IP "Architecture-Independent Access\ -\ "
These files
provide accesses through a relocation map
to data in an executable file or executing image.
Byte-swapping is performed as needed. Global variables
point to the
data structures of the target architecture.
.IP "Object File Interpretation\ -\ "
These files contain functions to identify the
target architecture of an
intermediate object file
and extract references to symbols. File
.CW obj.c
contains code common to all architectures;
file \fIm\fP\f(CWobj.c\fP
contains the architecture-specific source code
for the machine with code character
.I m .
The jump table
data structure is primarily a
table of architecture-dependent debugger support
functions. Functions select the
structure for a target architecture based
on the type code of the executable
or the name of the architecture.
The jump table provides functions to swap bytes, interpret
machine instructions,
perform stack
traces, find stack frames, format floating point
numbers, and decode machine exceptions. Some functions, such as
machine exception decoding, are idiosyncratic and must be
supplied for each architecture. Others depend
on the compiler run-time model and several
architectures may share code common to a model. For
example, many architectures share the code to
process the fixed-frame stack model implemented by
several of the compilers.
Still other
functions, such as byte-swapping, provide a general capability and
the jump table need only select an implementation appropriate
to the architecture.
Adding Application Support for a New Architecture
This section describes the
steps required to add application-level
support for a new architecture.
It assumes that
the kernel, compilers, loaders and system libraries
for the new architecture are already in place. This
implies that a code-character has been assigned and
that the architecture-specific headers have been
installed.
With the exception of two programs,
application-level changes are confined to header
files and the source code in
.CW /sys/src/libmach .
.IP 1.
Begin by updating the application library
header file
.CW /sys/include/mach.h .
Add the following symbolic codes to the
definitions near the beginning of the file:
.IP \(bu
The processor type code.
.IP \(bu
The type of the executable. There are usually
two codes needed: one for a bootable
executable (i.e., a kernel) and one for an
application executable.
.IP \(bu
The disassembler type code. Add one entry for
each supported disassembler for the architecture.
.IP \(bu
A symbolic code for the object file.
.IP 2.
In file
.CW /sys/src/libmach/\fIm\fP.c
(where
.I m
is the identifier character assigned to the architecture),
initialize the
data structures with values defining
the register set and various system parameters.
The source file for a similar architecture
can serve as a template.
Most of the fields of the
.CW Mach
data structure are obvious
but a few require further explanation.
.IP "\f(CWkbase\fP\ -\ "
This field
contains a kernel address.
The first entry of the kernel
data found there is assumed to locate the
structure for a kernel thread.
.IP "\f(CWktmask\fP\ -\ "
This field
is a bit mask used to calculate the kernel text address from
.CW kbase .
The first page of the
kernel text segment is calculated by
ANDing
the negation of this mask with
.CW kbase .
.IP "\f(CWkspoff\fP\ -\ "
This field
contains the byte offset in the
kernel thread
data structure to the saved kernel
stack pointer for a suspended kernel thread.
.IP "\f(CWkpcoff\fP\ -\ "
This field contains the byte offset into the
same data structure of
the program counter of a suspended kernel thread.
.IP "\f(CWkspdelta\fP and \f(CWkpcdelta\fP\ -\ "
These fields
contain corrections to be added to
the stack pointer and program counter, respectively,
to properly locate the stack and next
instruction of a kernel thread. These
values bias the saved registers retrieved
at the offsets given by
.CW kspoff
and
.CW kpcoff .
Most architectures require no bias
and these fields contain zeros.
.IP "\f(CWscalloff\fP\ -\ "
This field
contains the byte offset of the
system call number field in the
data structure associated with a process.
That
field contains the number of the
last system call executed by the process.
The location of the field varies depending on
the size of the floating point register set
which precedes it in the
structure.
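The arithmetic implied by
.CW ktmask ,
.CW kspdelta ,
and
.CW kpcdelta
is small enough to show directly.
The sketch below assumes 32-bit addresses and uses a hypothetical
.CW KernParams
structure holding only the fields it needs; the function names and the
sample values in the usage note are illustrative and do not describe any
particular architecture.

```c
#include <stdint.h>

/* Hypothetical subset of the per-architecture parameters. */
typedef struct KernParams {
    uint32_t kbase;     /* kernel base address */
    uint32_t ktmask;    /* kernel text mask */
    int32_t  kspdelta;  /* correction added to a saved stack pointer */
    int32_t  kpcdelta;  /* correction added to a saved program counter */
} KernParams;

/* First page of kernel text: AND the negation of the mask with kbase. */
uint32_t kernel_text_base(const KernParams *p)
{
    return p->kbase & ~p->ktmask;
}

/* Apply the stack pointer bias; unsigned addition wraps modulo 2^32. */
uint32_t biased_sp(const KernParams *p, uint32_t saved_sp)
{
    return saved_sp + (uint32_t)p->kspdelta;
}

/* Apply the program counter bias in the same way. */
uint32_t biased_pc(const KernParams *p, uint32_t saved_pc)
{
    return saved_pc + (uint32_t)p->kpcdelta;
}
```

With
.CW kbase
set to 0xC0000000 and
.CW ktmask
set to 0x40000000,
.CW kernel_text_base
yields 0x80000000.
When both deltas are zero, as the text says is typical, the saved
registers are used unchanged.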
.IP 3.
Add an entry to the initialization of the
table
data structure at the beginning of file
.CW /sys/src/libmach/executable.c .
Most architectures
require two entries: one for
a normal executable and
one for a bootable
image. Each table entry contains:
.IP \(bu
The big-endian magic number assigned to the architecture in
.CW /sys/include/a.out.h .
.IP \(bu
A string describing the executable.
.IP \(bu
Executable type code\ \-\ The executable code assigned in
.CW /sys/include/mach.h .
.IP \(bu
\f(CWMach\fP pointer\ \-\ The address of the initialized
.CW Mach
data structure constructed in Step 2.
You must also add the name of this structure to the
list of
definitions immediately preceding the
table initialization.
.IP \(bu
The number of bytes in the executable file header.
The size of a normal executable header is always
the same.
The size of a bootable header is
determined by the size of the structure
for the architecture defined in
.CW /sys/include/bootexec.h .
.IP \(bu
Byte-swapping function\ \-\ The address of the
byte-swapping function
for big-endian or little-endian
architectures, as appropriate.
.IP \(bu
The address of a function to decode the header.
A common function
decodes the common header shared by all normal
(i.e., non-bootable) executable files.
The header format of bootable
executable files is defined by the manufacturer and
a custom function is almost always
required to decode it.
Header file
.CW /sys/include/bootexec.h
contains data structures defining the bootable
headers for all architectures. If the new architecture
uses an existing format, the appropriate
decoding function should already be in
.CW /sys/src/libmach/executable.c .
If the header format is unique, then
a new function must be added to this file.
Usually the decoding function for an existing
architecture can be adopted with minor modifications.
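To illustrate how a table of this kind is consulted, the sketch below
reads a big-endian magic number from the start of a header and searches
for a matching entry.
It is an assumption-laden miniature: the
.CW ExecEntry
structure carries only some of the fields listed above, and the magic
numbers, type codes, and header sizes are placeholders rather than
values from
.CW /sys/include/a.out.h
or
.CW /sys/include/mach.h .

```c
#include <stdint.h>
#include <stddef.h>

enum { FT_NONE, FT_APP, FT_BOOT };  /* hypothetical executable type codes */

/* Hypothetical, reduced table entry. */
typedef struct ExecEntry {
    uint32_t magic;     /* big-endian magic number */
    const char *name;   /* string describing the executable */
    int type;           /* executable type code */
    size_t hsize;       /* number of bytes in the file header */
} ExecEntry;

/* Placeholder entries: one normal executable, one bootable image. */
static const ExecEntry exec_table[] = {
    { 0x0000A001u, "example executable", FT_APP,  32 },
    { 0x0000A002u, "example boot image", FT_BOOT, 64 },
};

/* Assemble a 32-bit value from four bytes stored most significant first. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Find the entry whose magic matches and whose header fits in n bytes. */
const ExecEntry *find_exec(const unsigned char *hdr, size_t n)
{
    size_t i;
    uint32_t magic;

    if (n < 4)
        return NULL;
    magic = be32(hdr);
    for (i = 0; i < sizeof exec_table / sizeof exec_table[0]; i++)
        if (exec_table[i].magic == magic && n >= exec_table[i].hsize)
            return &exec_table[i];
    return NULL;
}
```

Reading the magic number byte by byte keeps the lookup independent of
the byte order of the machine running the application, which matters
because the file may belong to any supported architecture.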
.IP 4.
Write an object file parser and
store it in file
.CW /sys/src/libmach/\fIm\fPobj.c ,
where
.I m
is the identifier character assigned to the architecture.
Two functions are required: a predicate to identify an
object file for the architecture and a function to extract
symbol references from the object code.
The object code format is obscure but
it is often possible to adopt the
code of an existing architecture
with minor modifications.
Once these
functions are in hand, insert their addresses
in the jump table at the beginning of file
.CW /sys/src/libmach/obj.c .
.IP 5.
Implement the required debugger support functions and
initialize the parameters and jump table of the
corresponding
data structure for the architecture.
This code is conventionally stored in
file
.CW /sys/src/libmach/\fIm\fPdb.c ,
where
.I m
is the identifier character assigned to the architecture.
The fields of the structure are described below.
.IP "\f(CWbpinst\fP and \f(CWbpsize\fP\ -\ "
These fields
contain the breakpoint instruction and the size
of the instruction, respectively.
.IP "\f(CWswab\fP\ -\ "
This field
contains the address of a function to
byte-swap a 16-bit value. Choose
the little-endian or big-endian implementation
to match the byte order of the architecture.
.IP "\f(CWswal\fP\ -\ "
This field
contains the address of a function to
byte-swap a 32-bit value. Choose
the little-endian or big-endian implementation
to match the byte order of the architecture.
.IP "\f(CWctrace\fP\ -\ "
This field
contains the address of a function to perform a
C-language stack trace. Two general trace functions
traverse fixed-frame and relative-frame stacks,
respectively. If the compiler for the
new architecture conforms to one of
these models, select the appropriate function. If the
stack model is unique,
supply a custom stack trace function.
.IP "\f(CWfindframe\fP\ -\ "
This field
contains the address of a function to locate the stack
frame associated with a text address.
Two general functions
process fixed-frame and relative-frame stack
models, respectively.
.IP "\f(CWufixup\fP\ -\ "
This field
contains the address of a function to adjust
the base address of the register save area.
The
68020 requires this bias.
.IP "\f(CWexcep\fP\ -\ "
This field
contains the address of a function to produce a
string describing the
current exception.
Each architecture stores exception
information uniquely, so this code must always be supplied.
.IP "\f(CWbpfix\fP\ -\ "
This field
contains the address of a function to adjust an
address prior to laying down a breakpoint.
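.IP "\f(CWsftos\fP\ -\ "
This field
contains the address of a function to convert a single
precision floating point value
to a string.
Choose the implementation that matches the byte order
of the architecture, little-endian or big-endian.
.IP "\f(CWdftos\fP\ -\ "
This field
contains the address of a function to convert a double
precision floating point value
to a string.
Choose the implementation that matches the byte order
of the architecture, little-endian or big-endian.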
.IP "\f(CWfoll\fP, \f(CWdas\fP, \f(CWhexinst\fP, and \f(CWinstsize\fP\ -\ "
These fields point to functions that interpret machine
instructions.
They rely on disassembly of the instruction
and are unique to each architecture.
.CW foll
calculates the follow set of an instruction.
.CW das
disassembles a machine instruction to assembly language.
.CW hexinst
formats a machine instruction as a text
string of hexadecimal digits.
.CW instsize
calculates the size, in bytes, of an instruction.
Once the disassembler is written, the other functions
can usually be implemented as trivial extensions of it.
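The choice between little-endian and big-endian implementations for the
byte-swapping slots can be illustrated as follows.
This sketch decodes values from a byte buffer so that its behavior does
not depend on the host; the library's own functions may take different
arguments.
The
.CW ByteOrder
structure and all function names are hypothetical.

```c
#include <stdint.h>

/* Decode 16-bit and 32-bit values stored least significant byte first. */
static uint16_t le_get16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le_get32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode 16-bit and 32-bit values stored most significant byte first. */
static uint16_t be_get16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t be_get32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical pair of jump table slots for the two swap functions. */
typedef struct ByteOrder {
    uint16_t (*get16)(const unsigned char *p);
    uint32_t (*get32)(const unsigned char *p);
} ByteOrder;

/* An architecture's table selects one of these two initializations. */
const ByteOrder little_endian = { le_get16, le_get32 };
const ByteOrder big_endian    = { be_get16, be_get32 };
```

An application holding a pointer to the selected
.CW ByteOrder
calls
.CW get16
or
.CW get32
without knowing which byte order the target uses.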
It is possible to provide support for a new architecture
incrementally by filling the jump table entries
of the
structure as code is written. In general, if
a jump table entry contains a zero, application
programs requiring that function will issue an
error message instead of attempting to
call the function. For example,
the
.CW foll ,
.CW das ,
.CW hexinst ,
and
.CW instsize
jump table slots can be zeroed until a
disassembler is written.
Other capabilities, such as
stack trace or variable inspection,
can be supplied and will be available to
the debuggers but attempts to use the
disassembler will result in an error message.
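The convention of testing a slot before calling through it might look
like the following in an application.
The
.CW Funcs
structure, the wrapper
.CW disassemble ,
the stub disassembler, and the wording of the error message are all
assumptions made for this sketch.

```c
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical jump table with two of the instruction slots. */
typedef struct Funcs {
    int (*das)(uint32_t pc, char *buf, size_t n);  /* disassemble at pc */
    int (*instsize)(uint32_t pc);                  /* instruction size */
} Funcs;

/*
 * Call the disassembler through the table, or report an error when
 * the slot is still zero. Returns the slot's result, or -1 on error.
 */
int disassemble(const Funcs *f, uint32_t pc, char *buf, size_t n)
{
    if (f->das == NULL) {
        snprintf(buf, n, "no disassembler");
        return -1;
    }
    return f->das(pc, buf, n);
}

/* Stand-in disassembler that treats every instruction as a 4-byte nop. */
static int stub_das(uint32_t pc, char *buf, size_t n)
{
    (void)pc;
    snprintf(buf, n, "nop");
    return 4;
}

/* A table early in a port (slots zeroed) and one with the slot filled. */
const Funcs partial_port  = { NULL, NULL };
const Funcs complete_port = { stub_das, NULL };
```

Code that uses only the filled slots of
.CW partial_port
works normally; only the request that needs the missing function fails,
and it fails with a message rather than a call through a null pointer.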
.IP 6.
Update the table
near the beginning of
.CW /sys/src/libmach/setmach.c .
This table binds the
file type code and machine name to the
data
structures of an architecture.
The names of the initialized
structures built in steps 2 and 5
must be added to the list of
structure definitions immediately
preceding the table initialization.
If both Plan 9 and
native disassembly are supported, add
an entry for each disassembler to the table. The
entry for the default disassembler (usually
Plan 9) must be first.
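The reason the default disassembler must come first can be seen in a
sketch of a table searched in order.
The table layout, the entries, and the lookup functions below are
invented for illustration and are not the contents of
.CW /sys/src/libmach/setmach.c .

```c
#include <string.h>
#include <stddef.h>

enum { DIS_PLAN9, DIS_NATIVE };  /* hypothetical disassembler type codes */

/* Hypothetical entry binding a name and file type to a disassembler. */
typedef struct ArchEntry {
    const char *name;   /* machine name */
    int ftype;          /* file type code */
    int dis;            /* disassembler type code */
} ArchEntry;

static const ArchEntry archtab[] = {
    { "arch-a", 1, DIS_PLAN9 },   /* default for arch-a: listed first */
    { "arch-a", 1, DIS_NATIVE },
    { "arch-b", 2, DIS_PLAN9 },
};

#define NARCH (sizeof archtab / sizeof archtab[0])

/* Lookup by name: the first matching entry is the default. */
const ArchEntry *by_name(const char *name)
{
    size_t i;

    for (i = 0; i < NARCH; i++)
        if (strcmp(archtab[i].name, name) == 0)
            return &archtab[i];
    return NULL;
}

/* Lookup by name and an explicitly requested disassembler. */
const ArchEntry *by_name_dis(const char *name, int dis)
{
    size_t i;

    for (i = 0; i < NARCH; i++)
        if (strcmp(archtab[i].name, name) == 0 && archtab[i].dis == dis)
            return &archtab[i];
    return NULL;
}

/* Lookup by file type code, again returning the first match. */
const ArchEntry *by_type(int ftype)
{
    size_t i;

    for (i = 0; i < NARCH; i++)
        if (archtab[i].ftype == ftype)
            return &archtab[i];
    return NULL;
}
```

Because the searches stop at the first match, reordering the two
.CW arch-a
rows in this sketch would silently change which disassembler a plain
lookup by name or type selects.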
.IP 7.
Add an entry describing the architecture to
.CW /sys/src/cmd/prof.c .
.IP 8.
Add an entry describing the architecture to
.CW /sys/src/cmd/pcc.c .
.IP 9.
Recompile and install
all application programs that include header file
.CW /sys/include/mach.h .