Package="Symbol Table" Author="Mark Whitis" Version="Prerelease" Source_code=True Cost=0
Easily handle command line arguments like:
Submit Package="Symbol Table" Author="Mark Whitis"
Essentially, this allows you to make selected variables and structures so "global" that they are exported to users and other programs via the command line, configuration and data files, and remote network connections.
The symbol table package has been used in the past to implement:
Scalars:
<Field1>Value1</Field1> <Field2>Value2</Field2> <Field2>Value2</Field2>
xxxprintf("one=%($one) two=%($two)\n");
Some types which might get created in the future:
Some features are not currently implemented:
New in version 0.56
Specific versions (not necessarily the latest):
This software is copyrighted and distributed under these license terms. For the purpose of the license, "primary author(s)" and "maintainer(s)" are defined to be Mark Whitis. This license is more or less open source.
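#include "symbol.h"
#include "basetype.h"
int main(void)
{
   st_pkg_init();          /* initialize the symbol table package */
   basetype_pkg_init();    /* then the base types */
   /* ... */
   /* ... */
   basetype_pkg_term();    /* terminate in reverse order */
   st_pkg_term();
   return 0;
}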
Currently, I just link with symbol.o and basetype.o; I should make a
shared library soon.
Defaults. Most of the data structures (other than simple base types) have instances of those structures with default values. It is intended that you always initialize a data structure by copying the default values into it before changing any fields. That way, if another field is added later, your code won't be creating structures with uninitialized fields.
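A minimal sketch of the copy-the-defaults idiom in plain C. The structure demo_opt_t, its defaults instance, and make_opts_with_prefix() are hypothetical stand-ins for the library's own pairs such as st_show_opt_t and st_show_defaults; the point is only that structure assignment copies every field, including ones added after your code was written.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical options structure standing in for types like st_show_opt_t. */
typedef struct {
    char prefix[32];   /* text printed before each output line */
    int  verbose;      /* pretend this field was added in a later release */
} demo_opt_t;

/* The single place where every field gets its default value. */
const demo_opt_t demo_opt_defaults = { "", 0 };

/* Recommended pattern: copy the defaults first, then change only the
   fields you care about.  Fields you never mention keep their defaults. */
demo_opt_t make_opts_with_prefix(const char *prefix)
{
    demo_opt_t opt = demo_opt_defaults;          /* copies every field */

    assert(strlen(prefix) < sizeof opt.prefix);  /* must fit, with the NUL */
    strcpy(opt.prefix, prefix);                  /* override one field */
    return opt;
}
```

Code that only ever touches prefix keeps working, with verbose at its default, if the structure grows.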
Data types are defined in basetype.h. Add "_st" to the end of a type name to reference its symbol table; in other words, use integer32s_t in a C variable declaration ("integer32s_t foo;") but use integer32s_t_st in a symbol table definition or a call to a symbol table function.
#include "symbol.h"
#include "basetype.h"
/* Variables exported through the table below.  st_debug, the package's own
   debugging level, is assumed to be declared in symbol.h; asciiz_t is the C
   type assumed from the asciiz_t_st naming convention. */
integer32s_t listen_port;
integer32s_t remote_port;
asciiz_t remote_host;
st_t user_vars_st[] = {
   ST_BEGIN_TABLE(),
   ST_BEGIN(),
      ST_IDENTIFIER("st_debug"),
      ST_DESCRIPTION("Debugging level for symbol table package"),
      ST_AT( &st_debug ),
      ST_TYPE( integer32s_t_st ),
   ST_END(),
   ST_BEGIN(),
      ST_IDENTIFIER("listen_port"),
      ST_DESCRIPTION("Port number to listen on"),
      ST_AT( &listen_port ),
      ST_TYPE( integer32s_t_st ),
   ST_END(),
   ST_BEGIN(),
      ST_IDENTIFIER("remote_port"),
      ST_DESCRIPTION("Port number of remote process to contact"),
      ST_AT( &remote_port ),
      ST_TYPE( integer32s_t_st ),
   ST_END(),
   ST_BEGIN(),
      ST_IDENTIFIER("remote_host"),
      ST_DESCRIPTION("IP address or hostname of remote process"),
      ST_AT( &remote_host ),
      ST_TYPE( asciiz_t_st ),
   ST_END(),
   ST_END_TABLE()
};
Both simple types and instances thereof (variables) are defined between ST_BEGIN() and ST_END(). Use ST_OFFSET() in type declarations (if a structure member) and ST_AT() for instances. Instances should also include ST_TYPE(), pointing to the symbol table defining the type. ST_IDENTIFIER() must be included. ST_DESCRIPTION() is optional.
Do not try weird uses of ST_INCLUDE() which do not fit nicely on begin/end boundaries.
Each separate table should have a ST_BEGIN_TABLE() and ST_END_TABLE(). If you omit these, strange things can happen. For example, you may find that all objects after the first are ignored, the program may fall off the end of a table into garbage, or the library may abort because the first member of a table is not a valid starting type.
Some subtypes of tables are valid. In general, if you pass a pointer to some type of begin attribute (ST_BEGIN_TABLE(), ST_BEGIN(), ST_BEGIN_STRUCT_REF(), etc.), the library will read that table only up to the matching end. This is used extensively within the library. Once a variable has been located within a table, the routines refer to that object by passing around a pointer to the ST_BEGIN() which begins that variable's definition within the larger table.
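To make the begin/end layout concrete, here is a toy model of an attribute-list table and a lookup that returns a pointer to an entry's BEGIN marker. This is a hypothetical illustration, not the library's internal representation: attr_t, the A_* kinds, lookup(), and int_member() are invented for the sketch, and only integer members located by offset are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { A_BEGIN_TABLE, A_END_TABLE, A_BEGIN, A_END,
               A_IDENTIFIER, A_OFFSET } attr_kind_t;

/* One attribute record; only the field matching `kind` is meaningful. */
typedef struct {
    attr_kind_t kind;
    const char *str;   /* A_IDENTIFIER */
    size_t      num;   /* A_OFFSET */
} attr_t;

typedef struct { int port; int debug; } config_t;

static const attr_t config_table[] = {
    { A_BEGIN_TABLE, NULL, 0 },
    { A_BEGIN, NULL, 0 },
    { A_IDENTIFIER, "port", 0 },
    { A_OFFSET, NULL, offsetof(config_t, port) },
    { A_END, NULL, 0 },
    { A_BEGIN, NULL, 0 },
    { A_IDENTIFIER, "debug", 0 },
    { A_OFFSET, NULL, offsetof(config_t, debug) },
    { A_END, NULL, 0 },
    { A_END_TABLE, NULL, 0 },
};

/* Search one BEGIN..END entry for an attribute of the given kind. */
static const attr_t *find_attr(const attr_t *begin, attr_kind_t kind)
{
    const attr_t *a;

    assert(begin->kind == A_BEGIN);
    for (a = begin + 1; a->kind != A_END; a++)
        if (a->kind == kind)
            return a;
    return NULL;
}

/* Scan up to the table's end marker; return the entry's BEGIN record,
   which then stands for the variable in later calls. */
static const attr_t *lookup(const attr_t *table, const char *name)
{
    const attr_t *a;

    assert(table->kind == A_BEGIN_TABLE);
    for (a = table + 1; a->kind != A_END_TABLE; a++) {
        if (a->kind == A_BEGIN) {
            const attr_t *id = find_attr(a, A_IDENTIFIER);
            if (id != NULL && strcmp(id->str, name) == 0)
                return a;
        }
    }
    return NULL;
}

/* Base address of the object plus the member's offset. */
static int *int_member(void *base, const attr_t *entry)
{
    const attr_t *off = find_attr(entry, A_OFFSET);

    assert(off != NULL);
    return (int *)((char *)base + off->num);
}
```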
A structure reference looks like:
ST_BEGIN_STRUCT(),
   // or ST_BEGIN_UNION(),
ST_IDENTIFIER("foo"),
   // If "", the struct or union will be anonymous, i.e. you can refer
   // to the members without naming the parent struct or union.
   // Overrides any identifier specified in the symbol table pointed to
   // by ST_TYPE(), so you can specify a different name.
ST_OFFSET(),
   // If a nested struct, this is the offset within the parent.
   // All offsets are added together (at most one offset per level).
   // Do not omit the offset even when it is 0, or you might
   // accidentally inherit an offset from a member.
ST_AT(),
   // Omit if this is a type declaration.
   // Specifies the location of the object;
   // there should not be another ST_AT() in the included object.
ST_SIZEOF(),
   // size of the structure
ST_DESCRIPTION(),
   // short text description
ST_TYPE(),
   // specifies another symbol table defining the type of the
   // structure itself. At some point, you might be able to
   // use ST_INCLUDE() or simply list the members inline (between
   // ST_BEGIN()/ST_END() pairs) for a struct.
ST_END_STRUCT(),
The actual structure will be defined in a separate symbol table, between ST_BEGIN_TABLE() and ST_END_TABLE() with each member enclosed in ST_BEGIN()/ST_END() pairs.
Anonymous structures make the members of the structure appear in the parent namespace (as if they had not been in a separate structure).
Nested structures have not been tested. It is possible the offsets might not be computed properly. If they work, you will need to define the inner structure in a separate table and include a reference (don't forget the offset attribute!).
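The offset arithmetic for a nested structure can be checked with standard C. In this sketch (inner_t, outer_t, and the helper functions are hypothetical), the offset of a member two levels down is the parent-level offset plus the member's offset within the inner structure, which is what summing one offset per level should produce.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types used only to illustrate offset arithmetic. */
typedef struct { short flags; int code; } inner_t;
typedef struct { char tag; inner_t inner; } outer_t;

/* One offset per nesting level, summed: where "inner" sits in outer_t,
   plus where "code" sits in inner_t. */
size_t nested_code_offset(void)
{
    return offsetof(outer_t, inner) + offsetof(inner_t, code);
}

/* Locate inner.code from the address of the outermost object, the way a
   table-driven routine would: base address plus accumulated offset. */
int *code_ptr(outer_t *o)
{
    assert(o != NULL);
    return (int *)((char *)o + nested_code_offset());
}
```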
The use of unions is not recommended at the moment. The interpretation of unions in C is a bit vague anyway, even before you involve data streams which might be shared by different architectures and different program types.
Implementation of unions is not complete. Unions are currently implemented like structs. Unions are degenerate structs anyway, with offsets of 0. There is a problem with printing unions (or anything that contains them), however. It will print each member of the union instead of only one and some of them might not be valid; indeed, one of the problems with C type unions is that there is no intrinsic way to determine the type of the value stored. At some point, this might change to only print the first type in a union or to have a tag to specify the default type for printing. A good default type for printing/saving might be a binary one which includes all bits, although that could be interpreted differently on different architectures or even different compilations of the same program. Unions should probably only be traversed for debugging dumps.
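One way to get the "tag to specify the type" behavior is an explicit tagged union, sketched here with hypothetical names (value_t, VAL_INT, VAL_TEXT, format_value()). Because the tag records which member is valid, a printer can emit just that member instead of every member of the union; this is a general C idiom, not something the symbol table package currently does.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { VAL_INT, VAL_TEXT } val_kind_t;

/* Hypothetical tagged union: `kind` says which member of `u` is valid. */
typedef struct {
    val_kind_t kind;
    union {
        long        i;
        const char *text;
    } u;
} value_t;

/* Format only the active member as name=value; returns snprintf's count,
   or -1 for an unknown tag. */
int format_value(char *buf, size_t len, const char *name, const value_t *v)
{
    assert(buf != NULL && name != NULL && v != NULL);
    switch (v->kind) {
    case VAL_INT:
        return snprintf(buf, len, "%s=%ld", name, v->u.i);
    case VAL_TEXT:
        return snprintf(buf, len, "%s=\"%s\"", name, v->u.text);
    }
    return -1;
}
```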
#include
typedef struct {
st_t *table;
void *object;
} smart_pointer_t;
extern smart_pointer_t st_smart_pointer(st_t *table, void *object);
This is the constructor function for smart pointers. It takes two arguments.
The first is a pointer to the symbol table and the second is a pointer
to the object it describes.
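A constructor with this shape can be very small, as in the following sketch. The one-field st_t here is only a stand-in so the example compiles on its own, and make_smart_pointer() is a hypothetical name; the library's real st_smart_pointer() may differ.

```c
#include <assert.h>
#include <stddef.h>

/* One-field stand-in for the library's st_t, just so this compiles alone. */
typedef struct { const char *identifier; } st_t;

typedef struct {
    st_t *table;    /* symbol table describing the object's layout */
    void *object;   /* raw data the table describes */
} smart_pointer_t;

/* Hypothetical constructor mirroring st_smart_pointer(): bundle the table
   and the object so generic routines can take a single argument by value. */
smart_pointer_t make_smart_pointer(st_t *table, void *object)
{
    smart_pointer_t sp;

    assert(table != NULL);
    sp.table = table;
    sp.object = object;
    return sp;
}
```

Returning the two-pointer structure by value keeps call sites short, for example passing make_smart_pointer(&table, &data) straight into a printing routine.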
st_show()
#include
extern void st_show( FILE *stream, smart_pointer_t sp, st_show_opt_t *options );
extern st_show_opt_t st_show_defaults;
This function prints all of the variables (or structure members) pointed to by the smart pointer "sp". Output goes to the stream "stream", which may be "stdout", "stderr", an open file, a pipe, a serial port, a TCP/IP network connection, or just about any valid Unix stream; "stdout" and "stderr" will typically print to the user's tty if they haven't been redirected. If you want to write to a file, it is up to you to open the file before calling this function.
Each line of output is preceded by the contents of the string member "options.prefix". This will typically be "" (the empty string) but can also be a number of spaces (i.e. "  ") for indentation or a value like "error." to augment the name. Note that at some point in the future I will probably define two separate prefixes, one for lines and the other for variable names.
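The output format can be illustrated with a hard-wired printer for a single structure. This is a hypothetical sketch (demo_error_t and show_error() are invented names): the real st_show() walks the symbol table to find members and their types, whereas this one names the three members explicitly. Passing "error." as the prefix gives lines of the form error.number=10.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical structure matching the error example in the text. */
typedef struct {
    int         number;
    const char *text;
    const char *field;
} demo_error_t;

/* Print one "name=value" line per member, each preceded by `prefix`.
   Integers are printed bare and strings in double quotes. */
void show_error(FILE *stream, const char *prefix, const demo_error_t *e)
{
    assert(stream != NULL && prefix != NULL && e != NULL);
    fprintf(stream, "%snumber=%d\n", prefix, e->number);
    fprintf(stream, "%stext=\"%s\"\n", prefix, e->text);
    fprintf(stream, "%sfield=\"%s\"\n", prefix, e->field);
}
```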
Output will be one line per variable or structure member and will look something like this (using the error struct as an example):
error.number=10
error.text="The sky is falling"
error.field="chicken.little"
st_read_stream()
#include
This function is the opposite of st_show(). It reads a number of values from a stream, in the same "name=value" (one per line) format output by st_show().
extern int st_read_stream( FILE *stream, smart_pointer_t sp, st_read_stream_opt_t options );
extern st_read_stream_opt_t st_read_stream_defaults;
It is pretty lax about quotation marks; double quotes may be used or, in many cases, omitted. Lines beginning with "#" or "/" are treated as comments and ignored, and blank lines should also be ignored. Leading whitespace at the beginning of each line is ignored.
Input will continue until an end of file condition is sensed or a line consisting of one of the termination strings defined in the options is read. The default termination strings are "END" and "DATA". You can also define another stream, options.echo; this will cause all data read to be copied to that stream. This is helpful for creating a log of all transactions, for example.
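The per-line rules above (skip leading whitespace, ignore "#" or "/" comment lines and blank lines, stop at "END" or "DATA", otherwise split name=value and accept optional double quotes) can be sketched as a small classifier. classify_line() and line_kind_t are hypothetical names; the actual st_read_stream() also converts each value according to the member's type and handles configurable terminators and the echo stream, none of which is modeled here.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef enum { LINE_SKIP, LINE_END, LINE_DATA, LINE_ASSIGN, LINE_BAD } line_kind_t;

/* Classify one input line, modifying buf in place.  On LINE_ASSIGN, *name
   and *value point into buf; one pair of surrounding double quotes is
   removed from the value if present. */
line_kind_t classify_line(char *buf, char **name, char **value)
{
    char *p = buf, *eq, *end;

    assert(buf != NULL && name != NULL && value != NULL);
    while (isspace((unsigned char)*p))            /* leading whitespace */
        p++;
    end = p + strlen(p);
    while (end > p && isspace((unsigned char)end[-1]))
        *--end = '\0';                            /* newline, trailing blanks */

    if (*p == '\0' || *p == '#' || *p == '/')     /* blank line or comment */
        return LINE_SKIP;
    if (strcmp(p, "END") == 0)
        return LINE_END;
    if (strcmp(p, "DATA") == 0)
        return LINE_DATA;

    eq = strchr(p, '=');
    if (eq == NULL)
        return LINE_BAD;
    *eq = '\0';
    *name = p;
    *value = eq + 1;
    if (**value == '"' && end - *value >= 2 && end[-1] == '"') {
        end[-1] = '\0';                           /* closing quote */
        (*value)++;                               /* opening quote */
    }
    return LINE_ASSIGN;
}
```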
st_parse_args()
#include
This function is used to parse command line arguments. The smart pointer refers to a structure or collection of variables. The parameters argc and argv are usually just copies of the parameters to the function main() in your program.
extern void st_parse_args(smart_pointer_t table, int argc, char **argv);
Command line parameters are expected in simple name=value form with no leading switch characters:
changeuser operation=view username=root
st_set()
st_walk()
st_walk_next()
st_lookup()
st_find_attrib()
st_tostring()
st_from_string()
st_dump()
Simplifying macros
Here are some macros which can simplify defining symbol tables for data structures, but are specific to use in a particular context. These do not include descriptions and will only work for struct members, but their primary advantage is that they reduce the code needed (to one line per member instead of a half dozen or so lines per member) at a cost of some flexibility. This is just intended as an example you can tailor to your own needs; review the appropriate sections of K&R Second Edition ("The C Programming Language" by Kernighan and Ritchie) or a similar source for details of the new preprocessor features in ANSI C if you aren't familiar with them and want to write fancy macros.
Note that the AIX C compiler is broken and cannot handle the nested macro references properly, so you will need to use gcc instead.
/* Plain argument substitution is all that is needed here; pasting "##"
   onto "(" or "," does not form a valid token, which is undefined in
   ANSI C and rejected by gcc. */
#define XXSTRING(context,name) \
   ST_BEGIN(),\
   ST_IDENTIFIER(#name),\
   ST_OFFSET( OFFSET(context,name) ),\
   ST_SIZEOF( SIZEOF(context,name) ),\
   ST_TYPE(asciiz_t_st),\
   ST_END()
#define XXINT(context,name) \
   ST_BEGIN(),\
   ST_IDENTIFIER(#name),\
   ST_OFFSET( OFFSET(context,name) ),\
   ST_SIZEOF( SIZEOF(context,name) ),\
   ST_TYPE(integer32s_t_st),\
   ST_END()
#define XXBOOLEAN(context,name) \
   ST_BEGIN(),\
   ST_IDENTIFIER(#name),\
   ST_OFFSET( OFFSET(context,name) ),\
   ST_SIZEOF( SIZEOF(context,name) ),\
   ST_TYPE(cboolean_t_st),\
   ST_END()
And here is a sample in use to define a structure type, yournamehere_t, with int, string, and boolean members named dummy1, dummy2, and dummy3, respectively. Note that we need to pass the name of the current structure to these macros so that they can compute the offset of each member within the structure.
typedef struct {
   int dummy1;
   string dummy2;
   boolean dummy3;
} yournamehere_t;
st_t yournamehere_t_st[] = {
   ST_BEGIN_TABLE(),
   XXINT(yournamehere_t,dummy1),
   XXSTRING(yournamehere_t,dummy2),
   XXBOOLEAN(yournamehere_t,dummy3),
   ST_END_TABLE()
};
Using the symbol table to implement a Remote Procedure Call
Code to implement a remote procedure call is not included in the symbol table package itself, but it is very easy to implement once you have the functions to establish the connection via TCP, SSL over TCP, SSH, or a direct serial link. The necessary code is mostly application-specific glue with two calls each to "st_show" and "st_read_stream" and, of course, the symbol tables for the request/response header and the data structures actually being passed. Using an ASCII protocol based on name=value pairs has some distinct advantages. The protocol is very easy to debug; you can eavesdrop on the traffic, which is already in human-readable form, and you can manually generate queries (or store queries in text files) for testing. You can add fields to the data structures without having to update both client and server simultaneously; different versions of client and server can interoperate gracefully: default values will be assumed for fields which are not supplied, and unknown fields can be handled by either accepting or rejecting the transaction. You can easily record all traffic to serve as a logfile which is not only human readable but machine parsable as well. And you can, if you want, implement the system to transmit only fields whose values deviate from the defaults to reduce transmission overhead. The protocol data can be compressed, if desired, using standard tools like "gzip" to sizes which are comparable to binary data structures if you need to conserve bandwidth; in some cases, where there are lots of sparsely populated text fields, a transmitted record may already be smaller than the original binary structure. Byte ordering differences between dissimilar machines are not a problem.
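Transmitting only the fields that deviate from the defaults can be sketched like this. request_t, request_defaults, and show_changed() are hypothetical; the receiver is assumed to start from the same defaults instance before reading, so every omitted field ends up with its default value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record type and its defaults instance. */
typedef struct {
    int  retries;
    int  timeout;
    char user[16];
} request_t;

static const request_t request_defaults = { 3, 30, "" };

/* Write only the members that differ from request_defaults, in the same
   name=value form st_show() uses.  Returns the number of lines written.
   The receiver is assumed to copy request_defaults before reading. */
int show_changed(FILE *out, const request_t *r)
{
    int n = 0;

    assert(out != NULL && r != NULL);
    if (r->retries != request_defaults.retries) {
        fprintf(out, "retries=%d\n", r->retries);
        n++;
    }
    if (r->timeout != request_defaults.timeout) {
        fprintf(out, "timeout=%d\n", r->timeout);
        n++;
    }
    if (strcmp(r->user, request_defaults.user) != 0) {
        fprintf(out, "user=\"%s\"\n", r->user);
        n++;
    }
    return n;
}
```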
The RPC protocol loosely described here is sophisticated enough to handle temporary or persistent connections, asynchronous requests (sequence numbers allow the responses to various queries to be returned in a different order than they were sent), and bidirectional traffic over the same connection (the stream can contain a mixture of requests and responses in both directions).
Queries and responses both look like:
BEGIN   (or "REQUEST" or "RESPONSE")
name=value
name=value
..
DATA
name=value
name=value
..
END
The first structure sent (between BEGIN and DATA) has protocol-related info. For a request, this is the type of request (i.e. the procedure name) and the sequence_number. In certain types of database applications it might have some fields to limit the number of records in the response and to fetch continuation data if the previous response was truncated. The response structure has the function that was called, the sequence number, the error number, the error text, and the error field name. The second structure (between DATA and END) is the data structure being returned. If more than one record is being passed, there will be multiple DATA sections before "END".
If you are concerned about the overhead in parsing, consider that the protocol (and symbol table library) can be extended later to negotiate a tokenized or straight binary version, with fallback to easily debugged text if that negotiation fails.
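Matching asynchronous responses to their requests only needs a small table of outstanding sequence numbers. This sketch uses the "XYZ%08d" format from the client code snippet in this document; pending_t, add_pending(), complete_pending(), and format_sequence() are hypothetical names, and a fixed-size table is used for simplicity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PENDING 8

/* Hypothetical bookkeeping for requests that are still awaiting a reply. */
typedef struct {
    char sequence[16];
    int  in_use;
} pending_t;

static pending_t pending[MAX_PENDING];   /* zero-initialized: all slots free */

/* Produce a sequence string in the same format as the client snippet. */
void format_sequence(char *buf, size_t len, int n)
{
    snprintf(buf, len, "XYZ%08d", n);
}

/* Remember an outgoing request; returns its slot, or -1 if the table is full. */
int add_pending(const char *sequence)
{
    int i;

    assert(strlen(sequence) < sizeof pending[0].sequence);
    for (i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            strcpy(pending[i].sequence, sequence);
            pending[i].in_use = 1;
            return i;
        }
    }
    return -1;
}

/* Match a response to its request by sequence string and free the slot.
   Returns the slot, or -1 for an unknown or duplicate response. */
int complete_pending(const char *sequence)
{
    int i;

    for (i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use && strcmp(pending[i].sequence, sequence) == 0) {
            pending[i].in_use = 0;
            return i;
        }
    }
    return -1;
}
```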
Client code snippet
st_show_opt_t options;
options = st_show_defaults;
establish_connection(&socket_in, &socket_out, ...);   /* application specific */
strcpy(options.prefix, " ");                          /* indentation */
sprintf(header->sequence, "XYZ%08d", sequence_number++);
fprintf(socket_out, "REQUEST\n");
st_show(socket_out, st_smart_pointer(protocol_request_t_st, header), &options);
fprintf(socket_out, "DATA\n");
st_show(socket_out, data_sp, &options);
fprintf(socket_out, "END\n");
fflush(socket_out);                                   /* push the buffered request onto the connection */
Now process the response in a manner similar to the way the request is processed on the receiving side.
On the receiving side, you:
- Accept incoming connection (if not persistent)
- read "REQUEST"
- use st_read_stream() to read the header data.
- st_read_stream will also slurp up the "DATA", which it will interpret as the terminator since "END" and "DATA" are defined in st_read_stream_defaults as terminating strings. The return code from st_read_stream() actually tells you which terminator was received (EOF, "END", "DATA", etc.)
- based on the header data, decide which data structure to expect in the data stream and call "st_read_stream()" with a pointer to the appropriate symbol table and a pointer to a buffer of sufficient size to hold the corresponding raw structure.
- st_read_stream() will eat the terminating "END" or "DATA" automatically. The return code tells you which so you know if there is another data record (in addition to the count in the header).
- now process the request and send a response in much the same manner as the request was sent.
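The receiving-side steps above can be sketched as a loop over the REQUEST ... DATA ... END framing. To stay self-contained, read_frame() (a hypothetical name) only counts header fields and DATA sections rather than calling st_read_stream(); in a real server each section would be handed to st_read_stream() with the symbol table chosen from the header.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of one framed message. */
typedef struct {
    int header_fields;   /* name=value lines before the first DATA */
    int records;         /* number of DATA sections */
    int complete;        /* 1 once the closing END has been read */
} frame_info_t;

/* Remove a trailing newline (and carriage return) in place. */
static void chomp(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && (s[n - 1] == '\n' || s[n - 1] == '\r'))
        s[--n] = '\0';
}

/* Read one REQUEST/RESPONSE/BEGIN ... END message.  Returns 0 on success,
   -1 on EOF before END or if the first line is not a frame start. */
int read_frame(FILE *in, frame_info_t *info)
{
    char line[256];
    int in_data = 0;

    assert(in != NULL && info != NULL);
    memset(info, 0, sizeof *info);

    if (fgets(line, sizeof line, in) == NULL)
        return -1;
    chomp(line);
    if (strcmp(line, "REQUEST") != 0 && strcmp(line, "RESPONSE") != 0 &&
        strcmp(line, "BEGIN") != 0)
        return -1;

    while (fgets(line, sizeof line, in) != NULL) {
        chomp(line);
        if (strcmp(line, "DATA") == 0) {          /* another data record follows */
            info->records++;
            in_data = 1;
        } else if (strcmp(line, "END") == 0) {    /* end of this message */
            info->complete = 1;
            return 0;
        } else if (!in_data && strchr(line, '=') != NULL) {
            info->header_fields++;                /* header would go to st_read_stream() */
        }
        /* data lines would be parsed against the table picked from the header */
    }
    return -1;
}
```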
If you need to use shared memory, semaphores, or message queues between multiple Linux boxes, check out DIPC; note that DIPC intrinsically requires kernel support. This has nothing to do with the symbol table library but is related to the RPC info described here.
userchange sample program
A sample program, called userchange, is available. This program provides the ability to view, create, delete, and modify /etc/passwd file entries on the local host or a remote host. For remote operation, the program invokes a slave copy of itself on another host using ssh, rsh, or a similar mechanism. Download userchange.tar.gz.
Design constraints
Here are some of the constraints which dictated the particular form of this implementation.
- I needed to be able to compile symbol tables into a program. The C language has some serious limitations on initializers. There is no way, for example, to create a list of mixed-size types which are stored in different-size containers and have the compiler initialize them. Much more efficient use of space and more flexibility would be possible by deferring symbol table initialization to runtime, but this would substantially increase the size of the code segment.
- Access to sizeof, offsets, and addresses of objects was needed. Some of this is only readily available with the compiler's (and linker's) cooperation.
- I could have divined the packing rules for various platforms (even automagically) and calculated structure member offsets based on that, but in C++, virtual methods can cause some surprise offsets (the object has to store a pointer to its virtual table somewhere), and addresses would still be a problem.
- Adequate documentation for symbol tables embedded in object files was lacking and these are not always present (stripped executables).
- I was under severe time pressure, so I just implemented features I needed immediately.
- I tried to keep to strict ANSI C for portability. Even that is not low enough for some broken compilers; fortunately, gcc can generally replace such broken compilers.
- If you have type a with member b, you cannot write sizeof(a.b) directly; the size and the offset of member b within structure a have to be obtained indirectly, through the offsetof() macro in <stddef.h> or an expression such as sizeof(((a *)0)->b).
- There are various other places where C limitations got in the way. Mixing data pointers (void *) and function pointers causes problems, since ANSI C does not guarantee that one can be converted to the other, although I do it in a couple of places. I could suppress a couple of warnings by disguising this with a dummy function and a union.
- This program makes extensive use of pointer arithmetic. Forget about porting it to a weak language such as Pascal (any implementation, if possible, would be highly compiler specific).
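A sketch of size and offset helpers built on offsetof() and the null-pointer sizeof idiom (sizeof does not evaluate its operand, so nothing is dereferenced). Both yield compile-time constants, so they can appear in static initializers. DEMO_OFFSET and DEMO_SIZEOF are hypothetical stand-ins for the OFFSET() and SIZEOF() helpers used by the XX* macros, whose real definitions are not shown in this document, and the structure here uses a fixed-size char array for dummy2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers in the spirit of OFFSET() and SIZEOF(). */
#define DEMO_OFFSET(type, member) offsetof(type, member)
#define DEMO_SIZEOF(type, member) sizeof(((type *)0)->member)

/* Hypothetical structure; dummy2 is a fixed-size text buffer here. */
typedef struct {
    int  dummy1;
    char dummy2[20];
    int  dummy3;
} yournamehere_t;

/* Both helpers yield constants, so they can initialize a static table:
   one {offset, size} pair per member, no instance required. */
static const size_t member_info[][2] = {
    { DEMO_OFFSET(yournamehere_t, dummy1), DEMO_SIZEOF(yournamehere_t, dummy1) },
    { DEMO_OFFSET(yournamehere_t, dummy2), DEMO_SIZEOF(yournamehere_t, dummy2) },
    { DEMO_OFFSET(yournamehere_t, dummy3), DEMO_SIZEOF(yournamehere_t, dummy3) },
};

/* First byte past member i, relative to the start of the structure. */
size_t member_end(int i)
{
    assert(i >= 0 && i < 3);
    return member_info[i][0] + member_info[i][1];
}
```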
Bugs
This code has not been carefully checked for buffer overflows.
Code is mostly reentrant, but not entirely. There is at least one function that needs to be modified to be reentrant.
Code may not be 64-bit clean. At the very least, an int must be able to hold a pointer, and char, short, and long must be 8, 16, and 32 bits respectively.
The name libst is already taken; the library name in the makefile needs to be changed.
This file is maintained by Mark Whitis (whitis@freelabs.com).