================================================================================
B A S T A R D                                            disassembly environment


                  Bastard  Extension  Programming  Guide




================================================================================
 Contents


 1. The Bastard Extensions
 2. Loading Extensions
 3. Architecture (ARCH) Extensions
 4. Assembler (ASM) Extensions
 5. File Format (FORMAT) Extensions
 6. High-Level Language (LANG) Extensions
 7. Disassembly Engine (ENGINE) Extensions
 8. Plugin (PLUGIN) Extensions
 9. Extension.[ch]


================================================================================
 The Bastard Extensions

Writing a disassembler for a single platform is pretty straightforward; the 
machine word size will never change, executable files will have the same format,
and the assembly language can be represented consistently. When supporting
multiple CPUs, OSes, and assembly languages, however, things become a bit more
complicated.

Bastard Extensions provide a way to add drop-in modules for support of different
CPU architectures, executable file formats, assembly languages, and high-level
languages. These extensions can be either compiled binary shared libraries or
BC scripts, thus allowing extensions to be written and debugged in script form
before recompiling for distribution as shared libraries.

There are six different types of extensions:

   EXT_ARCH   : CPU Architecture Extensions
     These are located in $BASTARD_HOME/arch and contain all data and
     subroutines that are specific to a CPU type; these are the 
     extensions responsible for disassembling binary code.

   EXT_ASM    : Assembler Extensions
     These are located in $BASTARD_HOME/asm and contain all data and
     subroutines that are specific to an output assembler; the formatting
     of assembly language code for output to the screen or a file is 
     managed by these extensions.
     
   EXT_FORMAT : File Format Extensions
     These are located in $BASTARD_HOME/formats and contain all data,
     structures, and subroutines required to parse various file formats
     such as ELF, PE, and a.out. Note that all OS-specific processing of
     the target --e.g. startup state of register variables-- should take
     place here, as the specific OS for which a target was compiled is
     generally included in the file format.

   EXT_LANG   : High Level Language Extensions
     These are located in $BASTARD_HOME/lang and contain all data and
     subroutines that are required to deal with a targeted high level
     language such as C or FORTRAN. It should be noted that the initial
     use of these extensions is to determine characteristics of the 
     target which are caused by the high level language the target was
     written in, such as strings and data types; it is assumed that the
     same high-level language will be used to format and output the 
     final decompiled code of the target, but the user has the option of
     replacing the Language Extension before this output takes place.
     
   EXT_ENGINE : Post-Disassembly Engine Extensions
     Disassembly Engines were initially scripts which were given numeric
     names and which were executed in numeric order after the original
     disassembly had taken place; these are now true extensions located
     in the $BASTARD_HOME/engines directory. The names still reflect the 
     order of processing; in most respects, however, the engines resemble
     the Plugin extensions and could in theory be replaced by them.

   EXT_PLUGIN : Disassembler Plugin Extensions
     These are compiled binary extensions to the bastard, intended for
     arbitrary extensions to the bastard which must be shipped as binary
     code either for speed or copy protection purposes. The plugins shipped
     with the bastard are stored in $BASTARD_HOME/plugins and include the
     disassembly schemes invoked by disasm_target().
   
Source code templates are provided in the extension directories under the src/
tree, e.g. src/arch/Template.c for the Architecture Extensions.


================================================================================
 Loading Extensions

Extensions are loaded indirectly by setting parameters for the target using the
Bastard API:
   
   #include <bastard.h>
   
   int target_set_arch( char *file_arch );
   int target_set_asm( char *asm_output );
   int target_set_format( char *file_format );
   int target_set_lang( char *language );
   int disasm_pass( int pass );
   int plugin_load( char *name );

Note that all of these, with the exception of disasm_pass(), merely load
the extension without executing any of the functions. The same effect can be
obtained with

   #include <extension.h>

   int LoadExtension( int type, char * filename, void *param);

where type is one of

   EXT_PLUGIN      0x0001
   EXT_ENGINE      0x0002
   EXT_ARCH        0x0003
   EXT_ASM         0x0004 
   EXT_FORMAT      0x0005
   EXT_LANG        0x0006

and filename is the absolute or relative path of the extension, and param is a 
pointer to the extension-specific settings structure as dictated in extension.h.
In general it is better to use the API functions and thus avoid missing any
steps in the loading process; however the extension.h routines provide low-level
access to the extension subsystem if needed.

When extensions are loaded by the API routines, they are expected to be located
in the appropriate directory, and to be named lib$NAME.so if they are a shared
library, or $NAME.bc if they are a script. This means that an Architecture
Extension for MIPS would be either arch/libMIPS.so or arch/MIPS.bc depending
on its format, while a FORTRAN Language extension would be lang/libFORTRAN.so
or lang/FORTRAN.bc. Files which are not in their correct locations will not be
loaded.

Note that API routines for loading Extensions take as a parameter only the $NAME
portion of the filename; these routines look first for $NAME.bc in the extension
directory, then for lib$NAME.so. Thus, scripts will always override compiled
binary extensions; this is intended to facilitate the development or overriding
of extensions by modifying a .bc file without overwriting an existing shared
library.
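
The lookup order described above can be sketched as follows. This is only an
illustration: ext_candidate_path() is a hypothetical helper, not part of the
Bastard API, and the real loader may build its paths differently.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a Bastard API routine): writes the path that
 * would be tried on the given attempt (0 = script, 1 = shared library)
 * for extension `name` in directory `dir`. Returns 0 on success, -1 if
 * the attempt is out of range or the buffer is too small. */
int ext_candidate_path(const char *dir, const char *name, int attempt,
                       char *buf, size_t len)
{
   int n;

   if (attempt == 0)
      n = snprintf(buf, len, "%s/%s.bc", dir, name);     /* script first */
   else if (attempt == 1)
      n = snprintf(buf, len, "%s/lib%s.so", dir, name);  /* then library */
   else
      return -1;

   return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would try attempt 0, check whether the file exists, and fall back to
attempt 1 only if it does not; this is what lets a script shadow a library.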


Each extension type is represented by a structure which contains all of the 
variables and function pointers shared between the extension and the bastard;
these structures are defined in extension.h and will be discussed at the end
of this manual. All extensions must provide associated init() and cleanup()
routines; these routines are called when the extension is loaded and unloaded,
respectively.

When the API is used to load an extension, a pointer to this structure is
passed to the init() routine of the extension; thus, the Architecture extension
is passed a pointer to the EXT__ARCH structure which the bastard will use to
represent it. In this way, options can be passed to the extension by setting
variables in the structure prior to loading the extension:

void ext_arch_init( void *param) {
   struct EXT__ARCH *settings = (struct EXT__ARCH *)param;

   if (! settings) return;
   if ( settings->options & MODE_16_BIT ) {
      /* perform setup for 16-bit mode */
   } else {
      /* perform setup for 32-bit mode */
   }
}

Note that the extension structures are currently global objects in the file
extension.c; in the future these may be replaced with linked lists in order to
support loading of multiple Plugin or Engine extensions.


================================================================================
 Architecture (ARCH) Extensions

The reference implementation of the Architecture Extensions is the i386.c file
located in src/arch.

   CPU Options
   -----------
The following information is optional:

   options                    -- Options passed to the extension
   cpu_hi                     -- Hi version number of the CPU
   cpu_lo                     -- Low version number of the CPU


   Platform Settings
   -----------------
The bastard depends on certain platform settings; these need to be set by the
extension:

   endian                     -- endian: 0 (big) or 1 (little)
   sz_addr                    -- address size 
   sz_oper                    -- operand size (not used)
   sz_inst                    -- instruction size (not used)
   sz_byte                    -- size of byte, in bits (not used)
   sz_word                    -- size of machine word in bytes
   sz_dword                   -- size of dword in bytes
   SP                         -- index in regtable of stack pointer
   IP                         -- index in regtable of instruction pointer
   

   Register Tables
   ---------------
The Architecture Extension must create register tables for use by the bastard;
this is important since registers are represented in CODE objects as indexes
into this table.  The bastard API provides the AddRegTableEntry command:

   int AddRegTableEntry(int index, char *mnemonic, int size);

Note that the extension must allocate memory for the regtable before assigning
entries to it, e.g.:

   sz_regtable = REGTABLE_SIZE;
   reg_table = calloc( REGTABLE_SIZE, sizeof(struct REGTBL_ENTRY) );
   AddRegTableEntry( REG_FLAGS_INDEX, "eflags", REG_FLAGS_SIZE);


To support the regtable, the extension therefore must provide the following
variables:

   reg_table                  -- pointer to allocated space for register table
   sz_regtable                -- number of entries in the register table
   reg_storage                -- pointer to allocated space for emu (not used) 
   

   Required Functions
   ------------------
Architecture extensions must provide the following routines:

   void ext_arch_init( void *param);
   void ext_arch_cleanup( void );
   int get_prologue(struct code **table);
   int get_epilogue(struct code **table);
   int get_reg_effect( char *mnemonic, struct code_effect *e);
   int gen_int( int func_id );
   int disasm_addr( BYTE *buf, BYTE tbl, struct code *c, long rva);

The get_prologue and get_epilogue routines allocate an array of CODE structures
which are used by some Engine Extensions to detect function prologues and
epilogues (i.e. entering and leaving stack frames) in the target. The array is
a collection of sequences of code instructions separated by zero-filled CODE
structures:

   table = {
     { /* first instruction in pattern 1 */ },
     { /* second instruction in pattern 1 */ },
     { 0 },
     { /* first instruction in pattern 2 */ },
     { 0 },
     { /* first instruction in pattern 3 */ },
     { /* second instruction in pattern 3 */ },
     { /* third instruction in pattern 3 */ },
     { 0 }
   }

The number of patterns in the table is returned by the routine.
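
As a rough sketch of how that count relates to the table layout, the routine
below counts separator-terminated patterns. The struct is a stand-in holding
only the fields mentioned in this guide (the field sizes are assumptions), and
count_patterns() is a hypothetical helper rather than part of the bastard.

```c
#include <stddef.h>

/* Stand-in for the Bastard CODE structure; only the fields used in this
 * guide are included and the sizes are assumptions. */
struct code {
   char mnemonic[16];
   int  dest, src, aux;
};

/* Counts the patterns in a table of n_entries CODE structures, where each
 * pattern is terminated by an entry whose mnemonic is empty. */
int count_patterns(const struct code *table, size_t n_entries)
{
   size_t i;
   int in_pattern = 0, count = 0;

   for (i = 0; i < n_entries; i++) {
      if (table[i].mnemonic[0]) {
         in_pattern = 1;
      } else if (in_pattern) {    /* separator closes the current pattern */
         count++;
         in_pattern = 0;
      }
   }
   return count;
}
```

A pattern that is not followed by a zero-filled entry is not counted, which
mirrors the requirement that every pattern end with a separator.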

It is best to illustrate this with an example. The Intel architecture has two
patterns which represent function prologues:

   /* pattern 1 */
   push ebp
   mov ebp, esp
   sub esp, #
   /* pattern 2 */
   enter

Thus the table created by the Intel get_prologue() routine would look like this:

   table = {
     { /* CODE struct for 'push ebp' */ },
     { /* CODE struct for 'mov ebp, esp' */ },
     { /* CODE struct for 'sub esp,' */ },
     { 0 },
     { /* CODE struct for 'enter' */ },
     { 0 }
   }

or, as the code is actually written:

   /* calloc() zero-fills the table, so t[3] and t[5] act as separators */
   t = (struct code *) calloc( 6, sizeof( struct code ) );
   /* prolog1: push ebp; mov ebp,esp; sub esp */
   strcpy( t[0].mnemonic, "push");
   t[0].dest = 5 + REG_DWORD_OFFSET;      /* ebp */
   strcpy( t[1].mnemonic, "mov");
   t[1].dest = 5 + REG_DWORD_OFFSET;      /* ebp */
   t[1].src  = 4 + REG_DWORD_OFFSET;      /* esp */
   strcpy( t[2].mnemonic, "sub");
   t[2].dest = 4 + REG_DWORD_OFFSET;      /* esp */
   /* prolog2: enter */
   strcpy( t[4].mnemonic, "enter");
   
An Engine extension (by default, engines/pass1.bc) uses code such as the 
following to test if a given address is the start of a function prologue:

   int test_pattern(struct code *input, struct code *pattern){
      struct code c, *d;
      int cont = 1, x = 0;

      d = &pattern[x];
      db_index_find(CODE_RVA, &input->rva, &c);
      while ( d->mnemonic[0] && cont ){
         if ( strcmp(d->mnemonic, c.mnemonic))        return(0);
         if ( d->dest != c.dest )                     return(0);
         if ( d->src  != c.src )                      return(0);
         if ( d->aux  != c.aux )                      return(0);
         if (! db_index_next(CODE_RVA, &c))             cont = 0;
         d = &pattern[++x];
      }
      return(1);
   }

This iterates through each CODE structure in the pattern, comparing them with
successive addresses in memory as long as the addresses continue to match
the patterns.


The get_reg_effect routine fills the code_effect structure with the effects of
the specified instruction; if the instruction causes eax to be overwritten, for
example, then the code_effect will be generated for a modification of eax with
a change of 0 (unknown). The code effect returned by this routine is only 
related to the instruction itself and is ignorant of specific operands that are
modified; therefore the inherent limitation of only one code effect per mnemonic
is adequate.

Intermediate code for the specified function is generated by the gen_int 
routine; this code will be used in subsequent decompilation phases, and will
take the form of assembly language for an idealized RISC processor.

The bulk of the work for the Architecture extension is done by the disasm_addr
routine; this disassembles the binary code at the memory location pointed to
by 'buf', using its internal opcode table 'tbl' (this is set to 0 when called
by the bastard), with the rva of the current instruction in 'rva'. The 
instruction which is generated is stored in the code struct 'c', and the size
of the disassembled instruction is returned. Note that this routine is 
responsible for generating address expressions for appropriate operands.
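
Because disasm_addr returns the size of each instruction, a caller can walk a
buffer by advancing that many bytes after every call. The sketch below shows
the idea with a toy decoder standing in for a real Architecture Extension;
toy_disasm_addr() and sweep() are hypothetical names, and treating a return
value of 0 as "could not decode" is an assumption.

```c
#include <stddef.h>

typedef unsigned char BYTE;

/* Toy stand-in for an architecture's disasm_addr(): 0x90 is a one-byte
 * instruction, 0xB8 is followed by a four-byte immediate, and anything
 * else is rejected. A real extension decodes its full instruction set. */
static int toy_disasm_addr(const BYTE *buf, size_t avail)
{
   if (avail >= 1 && buf[0] == 0x90) return 1;
   if (avail >= 5 && buf[0] == 0xB8) return 5;
   return 0;
}

/* Walks buf from the start, advancing by the size each call returns.
 * Returns the number of instructions decoded; *stopped_at receives the
 * offset where decoding ended. */
int sweep(const BYTE *buf, size_t len, size_t *stopped_at)
{
   size_t pos = 0;
   int count = 0, size;

   while (pos < len && (size = toy_disasm_addr(buf + pos, len - pos)) > 0) {
      pos += (size_t)size;
      count++;
   }
   *stopped_at = pos;
   return count;
}
```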


================================================================================
 Assembler (ASM) Extensions

The reference implementation of the Assembler Extensions is the intel.c file
located in src/asm.
   
   Asmsprintf Format Strings
   -------------------------
Different format strings can be supplied to allow custom output to the screen 
or file; these may all be set to the same value, but should be customized for
code and data, as well as for .asm files.

   asm_ttyColor                        -- code format string for ANSI Color ttys
   asm_ttyMono                         -- code format string for mono ttys
   asm_file                            -- code format string for .asm files
   asm_lpr                             -- code format string for printers
   data_ttyColor                       -- data format string for ANSI Color ttys
   data_ttyMono                        -- data format string for mono ttys
   data_file                           -- data format string for .asm files
   data_lpr                            -- data format string for printers

The format syntax is specified in src/api/api_address.c in the declaration for 
the asmsprintf() routine; the basic formatting characters are:

        %n - name
        %S - section
        %a - rva               
        %p - pa
        %b - raw bytes         
        %m - mnemonic          
        %d - dest operand      
        %s - source operand    
        %t - third operand :)
        %c - comment
        %x - xrefs
    
        %, - conditional comma (display a comma between two %fields)
        %^ - conditional newline (display a newline between two % fields)
        %; - conditional semicolon (display a semicolon if text follows)
        %: - conditional colon (display a colon between two % fields)

Usage is the same as printf format strings; a number may be inserted between
the % and the format char in order to limit the max number of characters 
displayed in the case of a comment or mnemonic, or to specify the max number of
raw bytes or the max number of xrefs to display. The default format strings for
the intel assembler are:

   code_format = "%n%:%^%a %8b\t%m\t%d%, %s %c %;%x";
   data_format = "%n%:%^%a %8b\t%c %;%x";

The code string prints the address name followed by a colon and a newline if
a name exists; the RVA of the address is then printed, followed by up to 8
bytes of code from the address; after this comes a tab followed by the usual
mnemonic-dest-src representation, then a comment, and a semicolon preceding any
xrefs. The data string prints the name and address followed by up to 8 bytes 
from the address, then any comments and xrefs.
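
To make the numeric limit concrete, here is a minimal sketch of how a field
such as %8b could cap the number of raw bytes printed. format_raw_bytes() is a
hypothetical helper, and the space-separated uppercase hex layout is an
assumption; the actual asmsprintf() output may differ.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes at most `max` of the `n` bytes in `bytes`
 * to `out` as space-separated hex pairs. Returns the string length. */
size_t format_raw_bytes(char *out, size_t len, const unsigned char *bytes,
                        size_t n, size_t max)
{
   size_t i, used = 0;

   if (len == 0) return 0;
   out[0] = '\0';
   if (n > max) n = max;                  /* apply the field limit */

   /* each byte needs at most a space, two digits, and the terminator */
   for (i = 0; i < n && used + 4 <= len; i++) {
      used += (size_t)snprintf(out + used, len - used, "%s%02X",
                               i ? " " : "", (unsigned)bytes[i]);
   }
   return used;
}
```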


   Prefix Strings
   --------------
These variables are used by asmsprintf() for formatting output; they should be
set to the empty string "" if the variable isn't used.

   comment                             -- Comment string (";" in intel)
   reg_pre                             -- Register prefix ("%" in AT&T)
   imm_pre                             -- Immediate prefix ("$" in AT&T)
   local_pre                           -- Local label prefix ("@@" in intel)
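
The effect of these prefixes can be sketched with a pair of small formatting
routines. The struct and the sprint_reg()/sprint_imm() names are hypothetical
stand-ins, and the hexadecimal immediate format is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for the prefix fields of the assembler settings structure. */
struct asm_prefixes {
   const char *reg_pre;   /* "" for Intel, "%" for AT&T */
   const char *imm_pre;   /* "" for Intel, "$" for AT&T */
};

/* Formats a register operand with the configured prefix. */
int sprint_reg(char *out, size_t len, const struct asm_prefixes *p,
               const char *reg)
{
   return snprintf(out, len, "%s%s", p->reg_pre, reg);
}

/* Formats an immediate operand with the configured prefix. */
int sprint_imm(char *out, size_t len, const struct asm_prefixes *p,
               long value)
{
   return snprintf(out, len, "%s0x%lX", p->imm_pre, (unsigned long)value);
}
```

Setting an unused prefix to "" rather than NULL is what keeps routines like
these free of special cases.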

   Required Functions
   ------------------
All Assembler Extensions are expected to provide the following routines:

   void ext_asm_init( void *param );
   void ext_asm_cleanup(void);
   int sprint_code( long rva, char *line, int len, int output);
   int sprint_addrexp(char *str, int len, char *scale, char *index, char *base,
                      char *disp, int sign);
   int sprint_asm_func_start( char *str, int len, int func_id);
   int sprint_asm_func_end( char *str, int len, int func_id);
   int sprint_asm_struct( char *str, int len, long rva );
   int sprint_section_start( char *str, int len, char *name );
   int sprint_section_end( char *str, int len, char *name );

The sprint_asm_func_start and sprint_asm_func_end routines print the header and
footer (for example, "proc $NAME" and "endproc") of a function to the buffer;
sprint_section_start and sprint_section_end do the same for section headers and
footers (e.g. ".DATA" ). The sprint_asm_struct routine formats and prints an
entire structure to the specified buffer.

Address expressions are printed by the sprint_addrexp routine; since this
representation is assembler-dependent, it is managed here instead of in the
asmsprintf() routine. The routine is pretty straightforward, as the Intel
example will show:

   int sprint_addrexp(char *str, int len, char *scale, char *index, char *base,
                                 char *disp, int sign){
      char sd = '+', idx[16] = {0};
      char tmp[32];

      if (sign & 0x0001) sd = '-';

      if (scale[0] && index[0])
         snprintf(idx, 16, "(%s * [%s])", scale, index);
      else if (index[0])
         snprintf(idx, 16, "[%s]", index);

      if (base[0]) {
         snprintf(str, len, "[%s]", base);
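         if (idx[0]) {
            /* strncat's limit is the space remaining, not the buffer size */
            strncat(str, " + ", len - strlen(str) - 1);
            strncat(str, idx, len - strlen(str) - 1);
         }
         if (disp[0]){
            snprintf(tmp, 32, " %c %s", sd, disp);
            strncat(str, tmp, len - strlen(str) - 1);
         }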
      } else if (idx[0]) {
         snprintf(str, len, "%s %c %s", idx, sd, disp);
      } else {
         snprintf(str, len, "%c %s", sd, disp);
      }

      return(strlen(str));
   }

The only other routine needing explanation is sprint_code; this is called by the
addr_print API call, and is responsible for invoking asmsprintf() to format
code and data. The sprint_code routine may also take responsibility for 
printing any strings, structures, and function or section headers:

   int sprint_code( long rva, char *line, int len, int output){
      switch (output) {
         case ASM_OUTPUT_PRINTER:
         case ASM_OUTPUT_FILE:
         case ASM_OUTPUT_TTY_COLOR:
         case ASM_OUTPUT_TTY:
         default:
            asmfmt = settings->asm_ttyMono;
            datafmt = settings->data_ttyMono;
            break;
      }

      addr = GetAddressObject(rva);

      if (addr->flags & ADDR_CODE ) {
         if (addr->flags & ADDR_FUNCTION){
            /* print function header */
         } else if (addr->flags & ADDR_INLINE){
            /* Inline Function */
         }
         asmsprintf(line, asmfmt, addr); 
      } else if (addr->flags & ADDR_STRUCT){
         /* print structure */
      } else if ( addr->flags & ADDR_STRING ) {
         /* print string */
      } else {
         asmsprintf(line, datafmt, addr);
      }
      return(strlen(line));
   }
   
Note that all output is optional, and that asmsprintf need not even be called 
if the Assembler Extension wants absolute control over the format.


================================================================================
 File Format (FORMAT) Extensions

The reference implementation of the Format Extensions is the ELF.bc file located
in formats/.

   Required Functions
   ------------------
Format extensions are expected to provide the following routines:

   void ext_format_init(void *param );
   void ext_format_cleanup(void);
   int read_header(void);

The read_header routine is essentially a main() for the Format extension; it
can acquire the target structure using the env_get_target() API routine, and from
there read from either the file descriptor or the memory image of the target:

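   int read_header(void) {
      struct ELF_HDR    *elf_hdr = malloc(sizeof(struct ELF_HDR));
      struct DISASM_TGT *target  = env_get_target();

      if (! elf_hdr || ! target) { free(elf_hdr); return(0); }

      lseek(target->FD, 0, SEEK_SET);
      read(target->FD, elf_hdr, sizeof(struct ELF_HDR));
      /* ... create sections and symbols, set target->info.entry ... */
      free(elf_hdr);
      return(1);     /* return values here are illustrative */
   }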

The read_header routine is expected to create program sections, as well as to
set the target entrypoint (target->info.entry) and to create any symbols found 
in the file header. Since the read_header routine is called only once, and is 
the only notable routine supplied, the Format extension behaves more like the 
Plugin or Engine extensions than the Assembler or Architecture extensions.
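
As an illustration of the kind of work read_header performs on the memory
image, the sketch below checks for the ELF magic number and extracts the entry
point from a little-endian 32-bit ELF header. It does not use the bastard
target structure; is_elf() and elf32_entry() are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Checks for the ELF magic number at the start of a memory image. */
int is_elf(const unsigned char *image, size_t len)
{
   static const unsigned char magic[4] = { 0x7F, 'E', 'L', 'F' };
   return len >= 4 && memcmp(image, magic, 4) == 0;
}

/* Reads the entry point of a little-endian 32-bit ELF image; e_entry is a
 * four-byte field at offset 24 of the ELF32 header. Returns 0 on success
 * and -1 if the image is too short or is not that kind of ELF file. */
int elf32_entry(const unsigned char *image, size_t len, uint32_t *entry)
{
   if (!is_elf(image, len) || len < 28) return -1;
   if (image[4] != 1 || image[5] != 1) return -1;  /* 32-bit, little-endian */

   *entry = (uint32_t)image[24]       | (uint32_t)image[25] << 8 |
            (uint32_t)image[26] << 16 | (uint32_t)image[27] << 24;
   return 0;
}
```

In a real Format extension the value recovered this way would be stored in
target->info.entry.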


================================================================================
 High-Level Language (LANG) Extensions

The reference implementation of the Language Extensions is the C.c file located 
in src/lang.

   Data Types
   ----------
The LANG extension is responsible for setting up the initial data types which
will be used in association with addresses, functions, and structure definitions
in the target. A data type is a symbolic name and a size which can be associated
with a data object during decompilation. The standard C data types can be set
up as follows:

         dtype_new( "int", 4, DT_SIGNED);
         dtype_new( "unsigned int", 4, DT_UNSIGNED);
         dtype_new( "short", 2, DT_SIGNED);
         dtype_new( "unsigned short", 2, DT_UNSIGNED);
         dtype_new( "char", 1, DT_SIGNED);
         dtype_new( "unsigned char", 1, DT_UNSIGNED);
         dtype_new( "long", 4, DT_SIGNED);
         dtype_new( "unsigned long", 4, DT_UNSIGNED);
         dtype_new( "long long", 8, DT_SIGNED);
         dtype_new( "unsigned long long", 8, DT_UNSIGNED);
         dtype_new( "float", 4, DT_UNSIGNED);
         dtype_new( "double", 8, DT_UNSIGNED);
         dtype_new( "long double", 12, DT_UNSIGNED);

Further data types can be added by the user, by the automatic processing of
typedef statements in HLL header files, or by plugins.


   Default Data Types
   ------------------
When creating data address objects, the disassembler will assign to each data
address a data type corresponding to the default data type for the address size
and flags. The LANG extension must provide a set of basic data types; these are
simply the ID of the data type being used for that data type size and sign:

   u1 -- Default data type for unsigned 1 byte variables
   s1 -- Default data type for signed 1 byte variables
   u2 -- Default data type for unsigned 2 byte variables
   s2 -- Default data type for signed 2 byte variables
   u4 -- Default data type for unsigned 4 byte variables
   s4 -- Default data type for signed 4 byte variables
   u8 -- Default data type for unsigned 8 byte variables
   s8 -- Default data type for signed 8 byte variables

So the disassembler would assign the DATA_TYPE ID from u4 to a data address of
size 4 bytes with the UNSIGNED flag. This will allow functional data types to
be assigned to all regular variables when transitioning from asm to high-level
code.
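
The selection described above amounts to a lookup keyed on size and sign. A
minimal sketch, assuming a stand-in struct that holds only the eight default
fields and a hypothetical default_dtype() helper:

```c
/* Stand-in for the default data type fields of the Language Extension
 * structure; each holds the ID of a DATA_TYPE entry. */
struct lang_defaults {
   int u1, s1, u2, s2, u4, s4, u8, s8;
};

/* Returns the default data type ID for a variable of `size` bytes, or -1
 * if no default exists for that size. */
int default_dtype(const struct lang_defaults *d, int size, int is_signed)
{
   switch (size) {
      case 1:  return is_signed ? d->s1 : d->u1;
      case 2:  return is_signed ? d->s2 : d->u2;
      case 4:  return is_signed ? d->s4 : d->u4;
      case 8:  return is_signed ? d->s8 : d->u8;
      default: return -1;
   }
}
```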


   Required Functions
   ------------------
All Language Extensions are expected to provide the following functions:

   void ext_lang_init( void *param );
   void ext_lang_cleanup( void );
   int add_data_types(int mach_word);
   int makestr(char *buf, char *text, int text_len, int type);
   int findstr(char *buf, int len, int *pos, char *text, int text_len,int type);
   int sprint_proto(char *buf, int len, int func_id);
   int sprint_func(char *buf, int len, int func_id);
   int gen_final(int func_id);
   int gen_file( char *path);

The add_data_types function sets the initial entries in the DATA_TYPE table;
makestr generates a C-style string from the contents of the buf parameter while
findstr searches up to len bytes of buf for the next recognized string. The
sprint_proto and sprint_func routines format and output a function prototype 
and a function definition (i.e., the actual code) using the contents of the
FIN_CODE table. The gen_final routine generates FIN_CODE entries for the given
function, and gen_file produces a high-level language source code file for the
target, using the FIN_CODE table.


================================================================================
 Disassembly Engine (ENGINE) Extensions


Disassembly engine extensions contain passes over the target which are made
subsequent to the actual disassembly. All engines are named pass##.bc or 
libpass##.so in the engines directory; the entire filename except for the 
numeric portion is stripped and the engines are called in the resulting numeric
order, e.g. pass1.bc, pass2.bc, libpass3.so, pass4.bc, and so on.
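
One way to derive that ordering is sketched below: pull the first run of
digits out of each filename and sort on it. The engine_order() and
engine_cmp() names are hypothetical, and the bastard's own implementation may
differ.

```c
#include <ctype.h>
#include <stdlib.h>

/* Returns the first run of digits in an engine filename such as
 * "pass1.bc" or "libpass3.so", or -1 if the name contains no digits. */
int engine_order(const char *filename)
{
   while (*filename && !isdigit((unsigned char)*filename))
      filename++;
   return *filename ? atoi(filename) : -1;
}

/* qsort() comparator that orders filenames by their numeric portion. */
int engine_cmp(const void *a, const void *b)
{
   int x = engine_order(*(const char *const *)a);
   int y = engine_order(*(const char *const *)b);
   return (x > y) - (x < y);
}
```

Sorting numerically rather than alphabetically matters once there are ten or
more engines, since "pass10.bc" would otherwise sort before "pass2.bc".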

The engines are intended to provide all common disassembly operations which do 
not entail actual machine instruction decoding; this includes string and 
subroutine recognition, applying standard programming headers to identify 
constants and data structures, and applying library signatures.

   Required Functions
   ------------------
Engines should have the following functions:

  void ext_engine_init( void *param ) ;
  void ext_engine_cleanup( void );
  int engine_main( void *param );

The init() and cleanup() functions will only be executed in the case of a 
shared library extension; even then, they are not needed.


================================================================================
 Plugin (PLUGIN) Extensions


Plugins provide a way for users and developers to produce compiled binary 
extensions to the bastard. These are not plugins in the true sense; rather, they
are more like compiled BC scripts, where each plugin has an init(), main() and a
cleanup() routine. Only one plugin can be loaded at a time, and plugins are not
intended to be interactive; plugins requiring interaction with the user must
wait for a Bastard front-end with the ability to provide its own plugin
architecture.

   Required Functions
   ------------------
Plugins should provide the following routines:

   void ext_plugin_init(void *param);
   void ext_plugin_cleanup(void);
   int plugin_main( void *param);

The ext_plugin_init() and ext_plugin_cleanup() routines are optional, and are
not called if the plugin is a BC script (as opposed to a shared library). The
plugin_main routine takes a single parameter which can be used to pass options
to the plugin.

   Using Plugins
   -------------
Plugins are managed with the following API routines:

   int plugin_load( char *name );
   int plugin_exec( void *param );
   int plugin_unload( );

The first loads the plugin specified by name; the second runs plugin_main() in
the currently loaded plugin with param passed in as a void pointer; the final
routine unloads the currently loaded plugin and calls its cleanup routine.

   Disassembly Extensions
   ----------------------
In order to demonstrate the plugin architecture and to allow more fine-grained
control over the disassembly process for the user, the disassembly methods
employed by the bastard are provided as plugins. These follow the naming
convention plugins/libdisasm.$METHOD.so or plugins/disasm.$METHOD.bc, where
$METHOD is a keyword used to invoke the plugin using disasm_target; for
example, disasm_target("full") will invoke plugins/libdisasm.full.so, while
disasm_target("dumb") will invoke plugins/libdisasm.dumb.so.

The code for these plugins is located in src/plugins, and users are encouraged
to examine these and develop their own custom disassembly methods.


================================================================================
 Extension.[ch]

The files src/extension.c and include/extension.h define the Extensions 
interface. All extensions are based on a common structure:

   struct EXTENSION {
      char *filename;            /* name of extension file [full path] */
      int flags;           
      void *lib;                 /* pointer to library */
      ext_init_fn fn_init;       /* init function for extension */
      ext_clean_fn fn_cleanup;   /* cleanup function for extension */
   };

Each extension type has its own associated structure, which is passed to the
extension's init() routine on loading:

   struct EXT__ARCH {   
      struct EXTENSION ext;
      int options;             // module-specific options
      /* ------------------  CPU Information  -------------------- */
      int cpu_hi, cpu_lo;      // CPU high and low version numbers
      char endian;             // 0 = BIG, 1 = LITTLE
      char sz_addr;            // Default Size of Address in Bytes
      char sz_oper;            // Default Size of Operand in Bytes
      char sz_inst;            // Default Size of Instruction in Bytes
      char sz_byte;            // Size of Machine Byte in Bits
      char sz_word;            // Size of Machine Word in Bytes
      char sz_dword;           // Size of Machine DoubleWord in Bytes
      int SP;                  // RegID of Stack Pointer
      int IP;                  // RegID of Instruction Pointer
      /* ------------------ Register Tables ---------------------- */
      struct REGTBL_ENTRY *reg_table;
      int sz_regtable;
      unsigned char *reg_storage;
      /* ------------------ Library Functions -------------------- */
      disfunc_fn   fn_disasm_addr;    
      getcode_fn   fn_get_prologue;     
      getcode_fn   fn_get_epilogue;     
      geneffect_fn fn_gen_effect;
      genint_fn    fn_gen_int;
   } *ext_arch;


   struct EXT__ASM {             
      struct EXTENSION ext;
      /* ---------------------- Prefixes ------------------------- */
      char *comment;          /* comment character/string */
      char *reg_pre;          /* register prefix string */
      char *imm_pre;          /* immediate value prefix string */
      char *local_pre;        /* local label value prefix string */
      /* -------------------- Format Strings ---------------------- */
      char *asm_ttyColor;         /* assume 80 col, use color escape seq's */
      char *asm_ttyMono;          /* assume 80 col, no escape seq's */
      char *asm_file;             /* no line length */
      char *asm_lpr;              /* no line length */
      char *data_ttyColor;        /* assume 80 col, use color escape seq's */
      char *data_ttyMono;         /* assume 80 col, no escape seq's */
      char *data_file;            /* no line length */
      char *data_lpr;             /* no line length */
      /* ------------------ Library Functions -------------------- */
      sprintcode_fn      fn_sprint_code;     /* fix reg, addr, mnems */
      sprintaddrexp_fn   fn_sprint_addrexp;  /* sprint the address expression */
      sprintfunc_fn      fn_sprint_func_s;
      sprintfunc_fn      fn_sprint_func_e;
      sprintstruct_fn    fn_sprint_struct;
      sprintsec_fn       fn_sprint_sec_s;     
      sprintsec_fn       fn_sprint_sec_e;      
      genasmfile_fn      fn_gen_file;
   } *ext_asm;


   struct EXT__FORMAT {
      struct EXTENSION ext;
      /* ------------------ Library Functions -------------------- */
      readhdr_fn fn_read_header;
   } *ext_format;


   struct EXT__LANG {            
      struct EXTENSION ext;
      /* ------------------ Default DataTypes -------------------- */
      int u1, s1;          /* unsigned/signed 1-byte */
      int u2, s2;          /* unsigned/signed 2-byte */
      int u4, s4;          /* unsigned/signed 4-byte */
      int u8, s8;          /* unsigned/signed 8-byte */
      /* ------------------ Library Functions -------------------- */
      void *fn_parsehdr;   /* process header for constants, fn protos, etc */
      findstr_fn fn_findstr;
      datatype_fn fn_datatype;
      makestr_fn fn_makestr;
      sprint_fn fn_sprint_proto;
      sprint_fn fn_sprint_func;
      genfinal_fn fn_gen_final;
      genhllfile_fn fn_gen_file;
   } *ext_lang;

   
   struct EXT__ENGINE {
      struct EXTENSION ext;
      int options;
      /* ------------------ Library Functions -------------------- */
      engine_fn fn_main;
   } *ext_engine;


   struct EXT__PLUGIN {
      struct EXTENSION ext;
      int options;
      /* ------------------ Library Functions -------------------- */
      plugin_fn fn_main;
   } *ext_plugin;


The extension.c file provides stubs for each Extension function pointer, which
enable the functions to be accessed via EiC when they are in a BC file and not
in a shared library; also, wrappers for all of the Extension functions are 
provided which protect the caller against NULL function pointers (as will happen
if a shared library extension does not provide all of the expected functions).

These wrapper and stub routines are listed at the end of the extension.h file.
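
The NULL-guarding wrappers can be pictured along these lines. This is a sketch
only: the struct is a cut-down stand-in for EXT__PLUGIN, the function names are
hypothetical, and returning 0 when no function is available is an assumption
rather than the documented behavior of extension.c.

```c
#include <stddef.h>

/* Stand-ins for the extension.h types; only the parts needed here. */
typedef int (*plugin_fn)(void *param);

struct ext_plugin_sketch {
   plugin_fn fn_main;
};

/* Wrapper in the style described above: returns 0 instead of crashing
 * when the extension is not loaded or did not supply the function. The
 * 0 return value is an assumption. */
int call_plugin_main(const struct ext_plugin_sketch *ext, void *param)
{
   if (ext == NULL || ext->fn_main == NULL)
      return 0;
   return ext->fn_main(param);
}

/* Example plugin_main that returns the integer its parameter points to. */
int sample_main(void *param)
{
   return param ? *(int *)param : -1;
}
```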
