Table Oriented Programming

A practical, intuitive, and consistent way to organize
and process data and algorithm collections
 
Updated 2/12/2002

Summary of Primary Table-Oriented-Programming Concepts

  1. Control Tables - A way of organizing processing, decision, and attribute logic.

  2. Table-Friendly Syntax - Syntax and language constructs that make dealing with tables and T.O.P. easier and more natural. This includes overcoming the weaknesses of SQL.

  3. Data Dictionaries - Special Control Tables for storing processing, decision, and attribute logic for relational database fields and/or UI fields.

  4. Fundamental and Consistent Collection Operations - A base set of operations (interface) that all collections (tables, trees, stacks, lists, etc.) should have easy or built-in access to regardless of a collection's current size or complexity. (Arrays are evil! Arrays are the Goto of the collections world.)

  5. Code Management - Relational tables are a potentially much more sophisticated tool for managing complex and multi-faceted collections of programming code than OO classes or files.

  6. Database Engine Neutrality - A T.O.P. system should be able to access a wide variety of database engines. There are some practical limitations to this, but the goal should be kept in mind.

  7. Memory-Mapping Reduction - The goal of reducing or eliminating the need to manually map and/or transfer memory variables to and from table fields and to and from the UI (screens). (This process should be invisible to the programmer regardless of the fact that internal implementation usually uses memory-based copies.)

  8. File Directory Management - Hierarchies are too narrow in scope and too restrictive. It is time for multi-aspect thinking. One search key (the hierarchy) is not enough. (Link)

Most of these concepts can be summed up nicely
by the concept of Collection Convergence. See also Yin and Yang.

Table-Oriented Programming (TOP for short) can be characterized as a programming language and/or development method that makes dealing with tables, lists, indexing, sorting, searching, filtering, etc. a native and direct part of language design and complexity management. This is a contrast to the clumsy collection API's and attribute management techniques such as set/get made popular by object oriented programming vendors. Table-Oriented Programming can also be characterized by using tables to organize program logic, not just data. Such tables are called Control Tables. They offer potential organization benefits over both raw procedural programming and object oriented programming.

Most general-purpose languages use API-like constructs (function library calls) and SQL to deal with tables. We believe that this approach is too bulky, code intensive, and formal to be used often. Pushing data into, pulling it out of, and converting it for API's and SQL is not very practical. (Some OOP languages do not call them API's, but use something that is essentially the same.)

For example, most languages have special math-handling syntax for dealing with mathematical equations. Example:

   a = (b * c) + e + f
Now, if your only choice was to use API's, then you would have to use syntax like:
   a = plus(plus(times(b,c),e),f)      // silly example
Or, in OOP-ish syntax:
   a = ((b.times(c)).plus(e)).plus(f)   // sillier
Or, as an OOP purist:
   a = ((b.math.times(c)).math.plus(e)).math.plus(f)   // silliest
It would of course be silly to force math experts to use such syntax; yet the equivalent is being done to database and table developers. This API-like approach is fine for occasional use, but if 70% of your applications dealt with math, this would get a bit cumbersome. We have special constructs and syntax for math, why not tables? Most custom business applications use or need far more table handling than math. Perl is the king of strings, Java is the king of networking, C is the king of speed, we now need a king of tables. (SQL and MS-Access fall short of the title).
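
To make the contrast concrete, here is a minimal sketch in Python (my own illustration, not taken from any particular TOP language; the "orders" table and its field names are invented) of the difference between pushing rows through loop-and-API style code and stating the table question directly:

   # A small in-memory "table": a list of rows (dictionaries).
   orders = [
       {"cust": "Smith", "amt": 250.00, "region": "West"},
       {"cust": "Jones", "amt": 975.50, "region": "East"},
       {"cust": "Lee",   "amt": 310.25, "region": "West"},
   ]

   # API-ish style: explicit loops and temporaries for a simple question.
   total = 0.0
   for row in orders:
       if row["region"] == "West":
           total += row["amt"]

   # Table-oriented style: the query reads the way the math expression reads.
   total2 = sum(row["amt"] for row in orders if row["region"] == "West")

   assert total == total2

The point is not this particular syntax, but that the second form treats the table question as a first-class expression rather than as plumbing.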

The market focus on Object Oriented Programming has left table-handling in the dust with regard to innovation. Sorted tables and lists are actually very useful for dealing with complex data structures including trees, stacks, queues, etc. Also, tables are not limited by the size of RAM, unlike traditional data structure algorithms. They provide built-in virtual memory.

Most custom business applications are very table intensive. Accounting, inventory, reservation systems, order tracking, etc., are common examples. Also, file and directory lists, E-mail lists, sales prospects, and even lines and circles in a drawing program can be represented with tables. Yet, the languages usually used, such as C++ and Visual Basic, use nothing more than API's to work with tables. These languages encourage people to use in-memory constructs rather than ordered tables. Sad.

Although SQL is a high-level language that is quite powerful for certain types of operations, it is far from a general-purpose table processing language. Many programmers end up writing "spaghetti SQL" because the alternative is to use annoying API calls or convert to data cursors. SQL is also a poor match for interactive programs because it is more of a batch-processing and query-processing language.

SQL's set-oriented processing approach is often just not appropriate for many situations. SQL also has an annoying nest-happy, LISP-like structure, which makes breaking the logic down into manageable chunks tough, especially for multi-joins. Using cursors can sometimes help, but they are far from standardized, get little vendor attention, and are often not given "native" or direct access to the data engine.

SQL also cannot use functions and methods that are in the calling program; you have to use SQL's built-in or external functions. SQL puts a wall between you, your code, and the data. In addition, SQL does not support named field sets, which will be described later. (More on SQL and stored procedures.)
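
As a rough illustration of that "wall" (Python used for convenience; the purchases table and the discount_rate() function are invented for the example), here is table processing that calls an ordinary function from the surrounding program directly:

   def discount_rate(row):          # a plain function in the calling program
       return 0.15 if row["volume"] > 1000 else 0.05

   purchases = [
       {"cust": "Smith", "volume": 1500, "price": 10.0, "pay": 0.0},
       {"cust": "Jones", "volume": 400,  "price": 10.0, "pay": 0.0},
   ]

   # The "replace all ... with ..." idea: the function is passed straight in.
   for row in purchases:
       row["pay"] = row["volume"] * row["price"] * (1 - discount_rate(row))

   # With standard SQL, the equivalent update would be a string handed to the
   # engine, and discount_rate() would have to be rewritten on the other side
   # of the wall as a stored procedure or built-in function.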

TOP languages do exist in various levels or incarnations of table-orientedness. These include Xbase derivatives (dBASE, FoxPro, Clipper), PAL (Paradox), PowerBuilder, Perl (for certain list types), Progress, Oracle's PL/SQL, and Clarion Developer. (We will not necessarily vouch for the quality or design of these languages, only say that they have a table tilt to them.) These languages get little press compared to the big OOP languages. Also, when upgrades are built for them, OOP features get most of the development resources, while TOP features are treated as a second priority by the vendors.

Why does OOP get 20 times more attention than TOP? We are not saying that TOP should be everything, but it does not deserve to be ignored. Given that tables are common and powerful, TOP does not deserve only 5% of the attention that OOP gets. We only ask for balance, not an overthrow.


My Motivation

Why am I so heck-bent on promoting Table-Oriented Programming? Simply because I have found the table paradigm so very useful for RAD (rapid development), software organization, and flexibility. Yet the IT market has focused on technologies like Object Oriented Programming that make for better brochures and airline magazine articles rather than real and practical benefits.

My exposure to TOP started back in the late 1980's when I purchased a dBASE III book. I quickly fell in love with dBASE and later its XBase derivatives. (dBASE was not the first language I learned, nor was it the first that I used in a commercial setting.) It made working with relational tables such a snap that I started to view ALL collections as XBase tables. (Collections are any set of similar or closely related items.) This even began including program logic. (After all, OOP subclasses are simply a collection of related classes.)

Other languages tended to use different "containers" within the same language for collections. Such containers include arrays, vectors, dictionaries (associative arrays), and API/object interfaces to SQL database engines. The problem with these is that they are very limited and very different from each other. If the needs of a collection grew beyond the bounds or features of one of these structures or significantly changed in requirements, then switching to another "type" of collection container or re-creating the needed collection feature by hand was a pain in the [beep], let alone darn illogical.

It seemed much more logical to me to have ONE kind of interface into ANY collection and then hook up an easy-to-use set of standard collection operations that would be available to ALL collections, big and small. (Not all engines will support all features, but the idea is that you switch engines to get needed features, not rewrite your existing API calls.) Although it has some annoying limitations and language weaknesses, XBase opened my eyes to table-oriented thinking.

OOP and other fads and trends prevented this powerful view of collections from progressing any further, and even reversed it to some extent. SQL as an interface is fine for formal transactions, but is too bulky and formal for many types of collection manipulation. Thus, I am here trying to sell the dream and vision of perhaps what should be called "collection-oriented programming." I found it a more powerful metaphor than anything else on the market, and I hope you will too.


Ideal Table Oriented Programming (ITOP) Features

We have looked at table-intensive processes and found a common set of features that would enhance TOP features of existing languages. We call these features "Ideal Table Oriented Programming" because not all are found in existing TOP languages. These features are to TOP what Inheritance, Polymorphism, and Encapsulation are to OOP. (In fact, ITOP shares many of these OOP aspects.)

Data Dictionaries

First, we will present an example data dictionary portion. Data dictionaries are an important concept to ITOP. We will refer to parts of it below to explain certain concepts.

Data Dictionary Sample (simplified)
Table-Spec. | Field-Name | Field-Title     | Pre-Func.   | Post-Func.               | Groups | Sort-1 | Pick-Func.  | Total-able
Customers   | CustName   | Customer Name   | {none}      | {none}                   | R      | 10     | custProfl() | No
Purchases   | PurchDate  | Purchase Date   | vdate1()    | dateFmt1(2)              | B,R    | 20     | {none}      | No
Trans       | Amt        | Purchase Amount | preDollar() | postDollar("###,###.##") | B,R    | 30     | {none}      | Yes

Brief Table Legend:
Table-Spec. - Table or field-set specifier. (Fields can be virtual.)
Field-Name - Abbreviation for field name.
Field-Title - Descriptive field title
Pre-Func. - Pre-validation function. Similar to an OOP constructor.
Post-Func. - Post-validation function. May also perform formatting for display.
Groups - Groups that the field belongs to. (There are many ways to represent these.)
Sort-1 - 1st Sorting order of fields as displayed on table and reports. (May have other sorts.)
Pick-Func. - Function called if user double-clicks on field.
Total-able - 'Y' if field can be totaled on a report.
(Note that a Data Dictionary can have many more columns than shown and can be organized in different ways.)

Data dictionaries (DD's) are, in essence, tables that describe other tables. A DD differs from a common table structure list in that it may apply to more than one table, and it can also assign functions or behavior to handle common or related operations. DD's are often described only as a documentation tool in some literature; however, we are extending or allowing them to also be used for the centralized storage of field-related properties and/or operations actually used in software.

Under ideal conditions, the DD provides enough information to generate input screens, multi-row grids, and reports without programming these from scratch. It keeps all logic related to a field or column in one central place. (Similar to the goal of an OOP class or subclass.) It is much easier to find and change information in DD tables than by hunting through separate modules or subclasses in program code. DD's are not intended to replace all program code, just reduce the need for it except down at the true customization level where it belongs.

See an actual data dictionary for more examples and specifics. Note that the linked examples don't need to contain programming code and function calls to be effective. Putting programming code in tables is simply one TOP technique among many, but not a prerequisite.
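
As a sketch of the idea (Python, illustrative only; the little formatting lambdas merely stand in for real post-validation functions), a DD can be held as an ordinary table and used to drive output, so field knowledge lives in one place:

   data_dict = [
       {"table": "Purchases", "field": "PurchDate", "title": "Purchase Date",
        "post": lambda v: "%s/%s/%s" % (v[4:6], v[6:8], v[0:4])},
       {"table": "Trans", "field": "Amt", "title": "Purchase Amount",
        "post": lambda v: "{:,.2f}".format(v)},
   ]

   def show_record(dd, table, row):
       """Display one record using titles and display formats from the DD."""
       for entry in dd:
           if entry["table"] == table and entry["field"] in row:
               print(entry["title"] + ":", entry["post"](row[entry["field"]]))

   show_record(data_dict, "Purchases", {"PurchDate": "19981231"})
   # Purchase Date: 12/31/1998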
 

The End of Linear Paradigms

Data dictionaries greatly reduce the need for bulky field specifications often used in OOP:
   field1.property1 = x
   field1.property2 = x
   field1.property3 = x
   ...etc...
   field1.property29 = x
   field1.property30 = x
   field2.property1 = x
   field2.property2 = x
   ...etc...
   field49.property1 = x
   field49.property2 = x
   field49.property3 = x
   ...etc...
   field50.property30 = x
I see these constructs all over VB and Java code. A construct like this is crying out for a tabled alternative when you have several dozen fields and several properties/functions. If you have 4 tables with 20 fields each, and each field averages 15 used properties, then you would have to write about 1,200 lines of code (4 x 20 x 15). However, this could be converted into a table that is about 80 by 20 in cell dimensions (assuming a total of 20 properties and/or functions). The 2D nature of tables makes them much more compact and logical for representing similar information. (This applies to control tables as well as DD's.) Code that repeats similar, but slightly different, constructs or assignments over and over again is sometimes called "comb" code, or "E" code, because of its repetitious appearance. (Stacked E's resemble a comb.)
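
Here is a rough Python rendering of the tabled alternative (the field names, properties, and the Field stand-in class are all made up for the illustration):

   field_props = {                    # the 2D "table" of settings
       "cust_name":  {"width": 30, "label": "Customer Name", "required": True},
       "purch_date": {"width": 10, "label": "Purchase Date", "required": True},
       "amount":     {"width": 12, "label": "Amount",        "required": False},
   }

   class Field:                       # stand-in for a VB/Java screen widget
       pass

   fields = {name: Field() for name in field_props}

   for name, props in field_props.items():     # the whole "comb" collapses
       for prop, value in props.items():       # into one short nested loop
           setattr(fields[name], prop, value)

   assert fields["amount"].label == "Amount"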

Optional Data Dictionary

Although data dictionaries are very powerful, they should be optional. This is because DD's are a bit formal and take some effort to build, just like any good organizational paradigm. You should be able to generate a quick data table in a program without having to fill out a DD. Not all tables and lists require high levels of formality, especially if there are only a few fields. ITOP does not focus on just large or just small tables. Tables may be quick, ad-hoc array-like structures, billion-record IRS transactions, or something in-between. DD's should not be shoved down one's throat.

Detached Data Dictionary

In addition to being optional, DD's should not be built into the table file itself. This is where Microsoft Access goes wrong. DD's cannot be shared as easily if there must be a one-to-one relationship with each table. (One-to-one DD's can still be built if desired.) For example, sometimes the same or similar structures and operations are used with many different tables.

Allowing all such tables to share one or a few DD's makes maintenance much easier. Plus, tables from different systems can be accessed without having to convert to or from their native DD's.

An ITOP application should make it easy to physically separate the program code, data dictionary, and actual tables if so desired. An option to jam them all together, as MS-Access prefers, should also be given.

In the DD example, the Table-Spec. column allows asterisks to indicate that the Field-Name alone will be used to find the appropriate entry. For example, several tables may have a CustName field in them. Rather than creating an entry for each table, an asterisk is put in the Table-Spec. column to serve as a wild-card.
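
A minimal sketch of that lookup rule (Python; dd_lookup and the sample entries are hypothetical names): check for an exact table/field match first, then fall back to a wild-card entry:

   def dd_lookup(dd, table, field):
       for row in dd:                     # exact match wins
           if row["table"] == table and row["field"] == field:
               return row
       for row in dd:                     # then the '*' wild-card entry
           if row["table"] == "*" and row["field"] == field:
               return row
       return None

   shared_dd = [
       {"table": "*",     "field": "CustName", "title": "Customer Name"},
       {"table": "Trans", "field": "Amt",      "title": "Purchase Amount"},
   ]

   print(dd_lookup(shared_dd, "Invoices", "CustName")["title"])  # Customer Name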

Extendable Data Dictionary

In addition to being optional, the DD should also be extendable if needed by the application. ITOP should only expect that a certain minimum set of fields be included. The developer should be able to add fields to the data dictionary as needed.

For example, if a certain action happens when a field is double-clicked, the data dictionary should be able to have a new column to enter the snippet or function call for each field upon double-clicking. (This example assumes that double-clicking is not already part of the minimum standards.)

Pre and Post Validation Functions

The pre- and post-validation functions are a very powerful part of ITOP. They allow consistent processing of fields regardless of where they are entered or displayed. For example, the pre-validation function executes regardless of whether the data was entered in a form, a grid, or any other input approach (assuming a short-cut outside of ITOP is not used.)

The pre-validation function serves two purposes: first, it checks the data to see that it is correct, and second, it formats the field data for storage. For instance, a date may be input as "12/31/1998". The pre-validation function may change it to "19981231" before storing it in the actual table. If the user entered "12/32/1998", then the function would return a value of 'false', indicating an error. The function may resemble this pseudo-code:

  Boolean Function Vdate1()
    boolean status = true     // initialize
    yearpart = substr(curfld,7,4)     // "12/31/1998" -> "1998"
    monthpart = substr(curfld,1,2)    // -> "12"
    daypart = substr(curfld,4,2)      // -> "31"
    if not between(monthpart,"01","12") _
       or not between(daypart,"01","31") then
      status = false
      curmsg = "Bad month or day number"
    else
      curout = convert2yyyymmdd(curfld)   // store as "19981231"
    endif
    return(status)   // true if passed
  End Function
Notes: Curfld, Curmsg, and Curout are pre-assigned variables. Curfld is the current field as entered by the user. Curmsg is the error message given to the user if the validation fails (a default is assigned if not programmed), and Curout is the field re-formatted for storage. The ITOP system automatically prepares and uses these variables before and after the function is triggered by user or batch actions. Another such reserved variable may be the length of the native string. This variable-assignment method is only one possible approach to pre-validation routines; depending on the programming language, it may be better to pass these as function parameters instead.

Post-validation routines re-format the input for display. There is no true/false return value since the data was already checked during input. Therefore, the return value is the reformatted field. For example, if the stored value is "19981231", then the post-validation function can turn it into "12/31/1998". In short, the post-validation function makes the output prettier or easier to read. The example above uses dateFmt1(2). This sample function returns the date with the year shown as 2 digits. (The function may get the original value from a Curfld-like variable, as shown in the pre-validation example.)

It may seem like a pain to write pre- and post-validation functions, but remember that the same functions can be used over and over again. The inputs and outputs to these functions are generic enough that generic functions can be written for common formats like dates, phone numbers, etc. Thus, you do not have to re-invent the wheel for similar field types. (Although the programmer is expected to build all the validation functions, a pre-built set could be included in the DD kit to save steps or serve as examples.)
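
Here is a rough Python rendering of the date example (my own sketch; it passes values as parameters rather than using Curfld-style reserved variables, which the note above allows as an alternative):

   def vdate1(entered):
       """Pre-validator: 'MM/DD/YYYY' -> (ok, stored YYYYMMDD, message)."""
       try:
           month, day, year = entered.split("/")
           ok = 1 <= int(month) <= 12 and 1 <= int(day) <= 31
       except ValueError:
           ok = False
       if not ok:
           return False, None, "Bad month or day number"
       return True, year + month.zfill(2) + day.zfill(2), ""

   def date_fmt1(stored):
       """Post-validator: '19981231' -> '12/31/98' for display."""
       return "%s/%s/%s" % (stored[4:6], stored[6:8], stored[2:4])

   ok, stored, msg = vdate1("12/31/1998")
   assert ok and stored == "19981231" and date_fmt1(stored) == "12/31/98"
   print(vdate1("12/32/1998"))   # (False, None, 'Bad month or day number')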

Sort Orders

Data Dictionary Sort orders specify the order that fields appear on reports and screens. In our example the fields are given an order in the Sort-1 column. The DD could also have a Sort-2 column, Sort-3 column, etc.


Standard Collection Operations

A good table-oriented system gives every collection (such as tables) a standard set of operations that can be used on all tables. One is not limited to just the operations that the programmer anticipated and explicitly built in for a given collection. Building or adding each of these operations one at a time as the needs arise can be very time-consuming.

These operations include filtering, ordering, searching, auto caching and persistence, grouping and totaling, transferring, import and export, field/property selection, inserting, deleting, updating, and joining or relating. Click here for more details on these fundamental operations.
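
To make the idea concrete, here is a bare-bones Python sketch of one uniform interface (the class and method names are mine, not a standard) that gives any collection a few of these operations regardless of what the collection is currently being used for:

   class Table:
       def __init__(self, rows=None):
           self.rows = list(rows) if rows is not None else []

       def insert(self, **fields):
           self.rows.append(dict(fields))

       def filter(self, test):                   # "query"
           return Table(r for r in self.rows if test(r))

       def order(self, *fields):                 # "sort"
           return Table(sorted(self.rows, key=lambda r: [r[f] for f in fields]))

       def update(self, test, **changes):
           for r in self.rows:
               if test(r):
                   r.update(changes)

       def delete(self, test):
           self.rows = [r for r in self.rows if not test(r)]

   tasks = Table()
   tasks.insert(name="report", pri=2)
   tasks.insert(name="backup", pri=1)
   print([r["name"] for r in tasks.order("pri").rows])   # ['backup', 'report']

A stack-like Pop() or a queue-like Dequeue() can then be a thin wrapper over the same rows, rather than a separate, incompatible container.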

No Ceilings!

Many current approaches to collection processing have practical ceilings that require arbitrary interface changes to move to the next step. When these ceilings are reached, the programmer is forced to revamp the existing code to take advantage of the next level of power. Such revamping is a waste of time and resources. (Bertrand Meyer calls this the "Continuity Problem", where a small change in the requirements results in a large change in program code.) It would be like having to steer with your elbow if on a bicycle, steer with your nose if in a car, and then steer with your foot if in an 18-wheel rig.

Fortunately, the transportation industry pretty much standardized on steering by turning a wheel with one's hands regardless of the vehicle size or task. (Well, the bike uses a bar, but close enough.) The software collections industry is not this wise yet. It still wants to divide collections into things like stacks, queues, sets, dictionaries, trees, etc., letting short-lived operational needs drive the protocol chosen. In my experience, collection needs change and grow over time, repeatedly. Thus, one should pick a flexible collections protocol. Once a stack, always a stack? Nooooo waaaay. It may keep acting as a stack for some operations, but it will often need other views as well.

These ceilings are usually either complexity ceilings or size ceilings (such as RAM). Let's look at a common Perl approach and then some SQL problems that tend to be ceiling bound.

Perlers often use lists of lists and/or pointers to lists to store and process collections. Perl "associative arrays" are basically a RAM table with 2 columns and one index (on the "key" column). If the requirements suddenly change, such as needing 3 columns, or 2 indexes with persistence, one then has to completely revamp the way fields and/or indexing are done. Perlers usually add a second level of complexity in the form of a list of pointers or a list of lists. In ITOP, or even XBase, these additions would be dirt simple. There is nothing magic about the limit of 2 columns and one index, so why do Perl and array-centric thinking impose this arbitrary limit?

Note that I have proposed using associative arrays elsewhere quite a few times. This may seem like a contradiction; however, those uses are generally an interface mechanism and not data collection management.

Although I find pointers to pointers nasty and error-prone to work with in almost any form, let us just assume that this approach is fine in some cases. However, if the complexity of the structure or the quantity and variety of operations keeps growing, or the size of such structures increases beyond a certain amount, then the typical response is to use a more powerful relational database add-in. Aside from the fact that DB API's can be bureaucratic to work with, one has to convert the native pointer structure and much of its processing into something the DB API's can use.

Thus, there are roughly 3 different kinds of interfaces one has to use as a collection graduates from simple to middle-level to complex:

  1. A regular or associative array.
  2. An array of arrays (or a list of pointers) if the structure grows beyond 2 columns or 1 index. (A "doubling-up," if you will.)
  3. Relational API's when heavy persistence, concurrency, or size is needed.

I see no reason why the same basic interface cannot be used from baby collections to Oracle-size collections. Why the industry tolerates this, I have no idea. Perhaps because they have not seen collections done right.
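
As a sketch of what "the same basic interface" could look like (Python; both classes and their methods are invented, and sqlite3 merely stands in for a bigger engine), the calling code below is identical whether the collection is a throwaway in-memory list or a database table:

   import sqlite3

   class MemoryTable:
       def __init__(self):
           self.rows = []
       def insert(self, **f):
           self.rows.append(f)
       def filter(self, test):
           return [r for r in self.rows if test(r)]

   class SqliteTable:
       def __init__(self, name, fields):
           self.name, self.fields = name, fields
           self.db = sqlite3.connect(":memory:")
           self.db.execute("CREATE TABLE %s (%s)" % (name, ",".join(fields)))
       def insert(self, **f):
           cols, marks = ",".join(f), ",".join("?" * len(f))
           self.db.execute("INSERT INTO %s (%s) VALUES (%s)"
                           % (self.name, cols, marks), list(f.values()))
       def filter(self, test):
           cur = self.db.execute("SELECT %s FROM %s"
                                 % (",".join(self.fields), self.name))
           return [r for r in (dict(zip(self.fields, row)) for row in cur)
                   if test(r)]

   # The caller's code does not change when the collection "graduates".
   for pay in (MemoryTable(), SqliteTable("pay", ["name", "amt"])):
       pay.insert(name="Smith", amt=250)
       pay.insert(name="Jones", amt=975)
       print([r["name"] for r in pay.filter(lambda r: r["amt"] > 500)])  # ['Jones']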

Note that there may be some minor setting differences as collections scale. For example, transaction markers and concurrency error handling may need to be specified for the higher-end collections. However, these can be treated as additions to the existing code, not overhauls.

Now let's look at traditional SQL operations. SQL is usually fine for fairly simple processing. However, as the number of expressions, links (joins), and/or fields increases, SQL can get nasty at times. Standard SQL lacks many of the black-box (subroutine) and reference-reduction (factoring) techniques found in most programming languages (and promoted as "good practice"). In standard SQL you usually cannot assign variables, macros, subroutines, etc. to complex or repeating parts in order to break the logic and sequence down into more manageable parts. You simply end up with one big, messy string with lots of parentheses. Beyond a certain complexity point one has to break the statement into 2 or more separate SQL statements.

Further, if set-oriented operations are no longer sufficient to handle the complexity of the job, the entire SQL statement has to be converted into a cursor-oriented approach that deals better with one record at a time. It is like having to stop, back up for several miles, and then start again on a different path. (See SQL Criticism.)

ITOP offers several techniques to avoid or reduce overhauls from complexity and size changes. The primary technique is the provision of a built-in set of standard, common, rudimentary, yet powerful collection operations (described above). Other techniques include internal-aware expression evaluation and the blurring of set-orientation versus cursor-orientation in database commands. (Set-oriented operations have some significant advantages in traditional client/server setups, however, one should have a choice, especially if the bandwidth between the client and the server is sufficient.)

"Complexity Scaling" can also be horizontal as well as vertical. For example, an API that is dedicated to a stack collection can get cumbersome if the needs grow outside of the traditional parameters of stacks. I encounter the need to use and view stacks, trees, queues, etc. in ways outside of these narrow collection "subtypes" all the time. Requirements change and your collections interface should be ready for such changes.

See Also:
Array Problems
Taxonomy Trap

Collection Convergence

There is a pattern to most of these (above) recommendations.
[convergence diagram]
Right now many systems have roughly four different collection protocols or protocol types. Is there a reason for this? One may argue that different collections have different needs, and thus specialization is required. Although this is certainly true for the implementation much of the time, I find this generally not to be the case with the collection protocols themselves. The primary reason is that requirements change too often. A collection may start out as a tree or a stack, but morph into a more general-looking collection as more requirements are added. I have experienced this process on several occasions. (See The Collection Taxonomy Trap for more about this, and The Multi-Dispatch Pattern for some source code management ideas.)

Besides morphability and scalability (described in previous section), another benefit is easier training. Instead of learning four or more different collection management systems, one should only have to learn a single protocol. Fine adjustments and specialized extensions can then be added on as needed (such as a Pop(x) "wrapper" function for stack-like activity).

A third advantage is that the same collection system can be used for all the different collection types and variations. Rather than build a class/code browser, an RDBMS browser, an array browser, etc.; vendors can focus on building one grand collection system and browser that does it all. It could even be modular such that you can attach different text browsing engines that highlight code keywords, etc.

Even if you disagree with my specific protocol and/or syntax proposals, the idea of a consistent collection protocol should ring through as a very logical idea.

You may notice that my rejection of strong protocol taxonomies parallels my distaste for heavy use of sub-classing, also known as sub-typing and IS-A thinking. Software engineering has over-emphasized IS-A thinking. Perhaps in some niches it has an important place, but not for custom business applications.
 

Few Types

The proliferation of field types has made it more difficult to transfer or share data between different applications, and it generates confusion. ITOP has only two fundamental data types: numeric and character (and perhaps a byte type for conversion purposes). (I have been kicking around ideas for having only one type.) The pre- and post-validators give any special handling needed by the field. A format string can be provided for various items like dates ("99/99/99"), Social Security Numbers ("999-99-9999"), and so forth. (Input formats are not shown in our sample DD.)

Types like dates and SSN's can be internally represented (stored) just fine with characters or possibly integers. For example, December 31, 1998 could be represented as "19981231". This provides a natural sort order.
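
A tiny illustration of the natural-sort point (Python; the values are made up):

   dates = ["19981231", "19990115", "19981130"]        # stored as characters
   print(sorted(dates))   # ['19981130', '19981231', '19990115'] -- chronological
   # Pretty-printing ("12/31/1998") is the post-validator's job, not the type's.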

Booleans can be represented with "Y" and "N" (yes and no) and blank for uninitialized. This has the advantage of allowing more codes in the future if two turn out not to be enough. Further, I have witnessed RDBMS numeric ID numbers being changed into strings and vice versa. Being type-agnostic reduces or eliminates the code changes needed after external or data-source type changes. (Fortunately, ID numbers are rarely compared with greater-than and/or less-than operators. There are drawbacks to type-agnosticism, but overall I think the benefits are greater.)

Enforcement of format can be done via validation specifiers (both built-in and custom). Fewer language types increases the share-ability and portability of data. (See also Black Box Bye Bye.)

Field Groups

One of the most time-wasting processes in programming table handling with many of the popular languages is having to type the names of all the fields that will show up on a screen or report. It would be much easier to specify the name of a set, and all fields belonging to that set would then be used. Although sets could have mnemonic names, we chose to use letters in our example for simplicity.

Suppose that we had to make a report that showed customer transaction detail, but which omitted customer names for reasons of confidentiality. With our setup, we could just ask for a report on all fields in set "B" (see the Groups column above). When dealing with tables with 50-plus fields, specifying a set name is much simpler than typing 50 names or building a field loop.
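
A small sketch of the idea (Python; the rows echo the sample DD above, and fields_in_group is a hypothetical helper):

   dd = [
       {"field": "CustName",  "groups": "R",   "sort1": 10},
       {"field": "PurchDate", "groups": "B,R", "sort1": 20},
       {"field": "Amt",       "groups": "B,R", "sort1": 30},
   ]

   def fields_in_group(dd, group):
       picked = [r for r in dd if group in r["groups"].split(",")]
       return [r["field"] for r in sorted(picked, key=lambda r: r["sort1"])]

   print(fields_in_group(dd, "B"))   # ['PurchDate', 'Amt'] -- no customer name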

Other Possible ITOP Features

Tables as Concise Control Panels

Even if you don't put code directly in tables, tables make a very nice "control panel" for managing high-level information. Unlike OOP classes, tables usually don't mix implementation information with the "settings" of the switches and knobs. Tables usually tell what to do, not how to do it. You don't have the equivalent of OOP method code mixed in with simple attribute assignments. This is what allows tables to be shared by many different languages and paradigms. OO fans proudly call the mixing "encapsulation." However, I call it mucking up potentially simple, concise information by mixing it with nitty-gritty implementation details. OO misses a grand opportunity for separation of concerns. I would estimate that the information content of table-based control information is roughly 5 to 20 times denser per character than algorithm code. It tells more in less space.

A flag or code in a table may say, "I do feature X", but one does not have to bathe in the details about how feature X is actually implemented right then and there. The simple "what" is not mixed up with the complex "how". Mixing them drags them both down. A tabular bus schedule tells when and where the busses will be, but does not bring up how the busses will get there. If we mixed such information together, few would bother to ride the bus. You would have to slog through information or structures about diesel combustion in order to find the arrival times.

Note that there are situations where putting code in tables is either of limited value or best kept separate from the control (feature-selection) information. This may depend, for example, on whether there is a common one-to-one relationship between instances and implementation. If implementations tend to come from picking strategies potentially shared by multiple instances, then putting code in tables may not make much sense (at least not in the same table). This is roughly equivalent to factoring out common code into a shared subroutine rather than repeating it for each instance. Putting code in tables is more useful for competing with artificial OOP examples, which try to justify merging data with behavior at almost any cost, than it is for real-world use. An application can roughly be split into data, control information, and implementation. I am planning on writing more about this issue in the future. Even when I do put code in tables, it is usually very small snippets that call other functions/services; thus, they are actually a hybrid of code and strategy specifiers.
[Photo: a control panel]
The table user only sees the settings;
the implementation is hidden behind the panel.
(May your tables be simpler than this box.)

Tables are also more compact than OOP classes for viewing high-level control information because tables lay out information using 2 dimensions instead of the single (linear) dimension of OOP classes.
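
For instance, a small made-up control table for report delivery might look like this in Python: each row says what to do, while the how stays in ordinary functions behind the panel. (The table columns, function names, and driver loop are all invented for the illustration.)

   def send_email(name): print("emailing", name)       # the hidden "how"
   def send_print(name): print("printing", name)

   DELIVER = {"email": send_email, "print": send_print}

   report_control = [                                   # the visible "what"
       {"report": "daily_sales", "active": "Y", "deliver": "email", "copies": 1},
       {"report": "inventory",   "active": "N", "deliver": "print", "copies": 2},
       {"report": "backorders",  "active": "Y", "deliver": "print", "copies": 1},
   ]

   for row in report_control:               # a tiny driver reads the panel
       if row["active"] == "Y":
           for _ in range(row["copies"]):
               DELIVER[row["deliver"]](row["report"])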

UNIX-based architectures stumbled upon a simple yet powerful conceptual framework: the use of files and text streams as an inter-process communication medium. This paradigm (or sub-paradigm) makes it easy to mix different languages and provides a clear, inspectable "communications gathering point", the file/stream, regardless of how complicated or messy the algorithms and code are. One could always look at the file or stream to get a "neutral point of reference". ("The Unix Philosophy", ISBN: 1555581234. Note that I am not promoting UNIX itself. OS's are one of the few things I don't have strong opinions about, other than perhaps case-sensitivity and file systems.)

Tables provide the next generation of this concept. They provide a concise communications gathering point and can be shared by many different languages and paradigms. Making convoluted code is unfortunately much easier than making a convoluted table. Further, there is more incentive to keep them clean because non-programmers can also read tables for the most part, including your boss. And, unlike files and streams, you don't have to keep copying the same data over and over for each step; and you get concurrency.

It is like everybody going to eBay to bargain instead of having different little bargaining rules for each store. (I hope you are getting the notion by now, because I am running out of analogies.)

See Also:
Control Table Theory (1D v. 2D)
Bank Example
Black Boxes and Skinny Wires

 

Definition of "Table"

I define a relational table as a collection of "rows". Rows are basically dictionary structures with field names and corresponding values. Rows have "potential" fields. (I prefer "field" over "column" because column implies fixed field widths, which is not a prerequisite for "table".) "Potential field" means that a row does not have to actually contain a given field or even a placeholder for it.

Potential fields are kept in a central list, meaning that each row does not necessarily have to know of the entire potential field list. If a request is made for a field that is in the potential list, but that the row does not actually contain, then a blank or empty value is returned. (If a row contains a field name that is not in the central list, that field is not considered part of the table.)
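
A minimal sketch of the "potential field" rule (Python; the central list and the get_field helper are illustrative, and rejecting unknown names is just one possible choice for the unspecified case):

   POTENTIAL_FIELDS = ["cust", "amt", "notes"]     # the central field list

   def get_field(row, name):
       if name not in POTENTIAL_FIELDS:
           raise KeyError(name + " is not part of this table")
       return row.get(name, "")                    # blank if the row lacks it

   row = {"cust": "Smith", "amt": 250}
   print(repr(get_field(row, "notes")))            # '' -- potential, not present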

Tables also potentially have multiple temporary and multiple permanent indexes. Indexes allow one to find specific rows or row sets without sequential traversal of the entire table.

Note that there is no necessary size or content restriction on the fields. "Types" or length limits are not a prerequisite. In fact, such restrictions can make certain operations harder to implement. However, in practice, such limits are often imposed on a product for performance reasons or to fit a given standard, such as SQL.

Another way to identify or define "tables" is by their operations rather than structure.
 

Summary of Benefits of Good TOP

Here is a list of some of the benefits provided by TOP that are for the most part not OOP benefits. It may be possible to implement many of these features using OOP, but it would be a lot of work, and subject you to the risks that all function (method) libraries and wizards have. Details backing this list are found above.

  1. All collections (tables) have built-in, ready-to-use operations that can be used on them. Other paradigms often require explicit effort to build or link in such operations. Thus, ITOP cuts down on development time.

  2. Groups of fields can be chosen simply by supplying a group name, rather than coding the name of each field. (Great for tables with many fields.)

  3. The behavior of a field is controlled in one spot (DD) no matter which screen, grid, data-entry form, or report the field appears on. Other approaches require one to recode field behavior for each of these, also requiring separate changes in all four.

  4. Less reliance on WYSIWYG screen and report builders. Most data-entry and grid screens can almost build themselves based on DD information.

  5. A consistent collections handling interface regardless of collection size and complexity.

  6. Support for high, medium, and low formality tables. Other languages usually give you only one way to deal with tables and assume tables are either large and formal, or small and featureless.

  7. Easier to make software changes because, in many instances, only control tables (such as DD's) need be changed. It is easier to go to a table for changes than to hunt around 20,000 lines of program code. A table (grid) is a much easier structure to use for viewing, comparing, and changing properties of similar "objects" than program code. This is because grids are two-dimensional structures, while program code is basically a one-dimensional structure.

  8. Field variables can be referenced easily without converting back and forth between memory variables and actual table fields. An example might be "$amount = $rate * hours" where the $ sign marks field variables ("hours" is a memory variable). Note that a "with"-like structure may make the table reference unnecessary; however, an optional table reference should be permitted. Example: "x = mytable$amount".

  9. Allows tables to hold internal function calls and expressions, not just values or SQL expressions. This helps build powerful control tables, including the DD. Note that usually an interpreter is needed for this; however, compiled languages such as Clipper allowed this. This was done by evaluating a string stored in a table.

  10. Ability to evaluate table processing calls with internal expressions and functions, not just SQL. For example, in XBase a statement like "replace all pay with CalcRate() * hours" calculates all employee paychecks in a table. In SQL this could resemble: "Update emp set pay = CalcRate() * hours". However, in standard SQL you cannot "pass" the CalcRate function to the SQL processor; you are stuck with SQL's built-in functions or proprietary "stored" procedure calls that are separate from your code. (Note that CalcRate may use many fields in a record. Thus, it cannot be "blind" to the "current" record.) Working with SQL is like having a wall that separates your code from SQL's table processing. ITOP breaks down this wall.

  11. Easy-to-use array and list structures that are not limited to memory (RAM) size. For example, automatic "temp" file buffering would happen if the array or vector grew too large for memory. Most OOP languages rely on classes and structures that are assumed and limited to reside entirely in memory. (These languages are said to be "memory-centric".)

  12. Using persistent or semi-persistent storage allows easier modularization and testing, because one does not have to load and run the whole shebang in order to test the pieces. This is because ITOP applications and components often pass information using powerful tables instead of fleeting memory constructs. Communications between components can often be via tables. These tables can exist without having all parts active in memory. In fact, dummy tables are easy to construct for testing. Recreating a bunch of memory-bound OOP structures in order to test parts can be a real pain. Even when the OOP parts are ready, it is tough to get a view of the tangled classes and structures in memory. Tables, on the other hand, are a snap to inspect during testing and troubleshooting.

  13. Less of a learning curve than OOP because tables are a more familiar structure than OOP classes. (OOP should really be called "Category Oriented Programming" instead, because it has no improved relationship to real-world objects.) Table structures can be found in reports, spreadsheets, travel schedules, tax schedules, the Battleship game, etc. Thus, tables will always be easier to relate to and more intuitive than OOP. (I have yet to see an OO bus schedule.)

Merging T.O.P. and O.O.P.?

Perhaps the best of TOP and OOP can be combined into a very useful programming language. There are enough similarities between the two to spark new thinking. It is our opinion that OOP designers have been so caught up in building structures in memory that they have neglected flexible persistent storage techniques, among other things.

Click here to read more about possible merges.

Although there are many similarities, there are some differences which are probably irreconcilable:

  1. Algorithms (methods) and Data are not tightly integrated in ITOP. This tends to offer less method-to-data mapping "protection" than OOP, but results in less preprocessing and mapping of the data to be used within ITOP.

  2. ITOP uses sets and relations in place of class hierarchies. It is thus a different form of inheritance. (ITOP does not value hierarchies as highly as OOP because they are seen as too inflexible to change. Changes often do not occur in a hierarchical way.)

  3. ITOP attempts to have a built-in relationship with relational systems to reduce or minimize data mapping.

Also note that current implementations of OOP tend to be much more memory-centric than ITOP (collections are presumed to reside fully in memory instead of on disk); however, this appears to be a matter of tradition rather than an inherent aspect of OOP.

Future of TOP and ITOP

This set of documents is not meant to be the final word on TOP. There are many aspects to be explored and evaluated by different personalities with different perspectives. Concepts such as Control Tables and Standard Collection Operations (SCO) could possibly be carried to deeper levels than given here in order to see what new ideas and issues arise.

What if all subclass code were in Control Tables instead of just stubs and expressions? What if all collections were represented and linked within SCO? What if the concepts of Control Tables and SCO were tightly combined? How would the world look if you started to view or model it as collections based on SCO? Questions like these have not been explored very well yet.

Taking concepts to the extreme may not by itself produce practical results, but it can often trigger new ways of thinking. I believe there is plenty of room for brainstorming. It is hard to believe that OOP (by itself) is the pinnacle of paradigms.

Summary

The features listed here are only suggestions. The point is to generate the same type of thought process and analysis that triggered the development and popularity of OOP. Tables have been given the short end of the market attention stick. It is time for the pendulum to swing back, or at least to the center.

OOP has focused on the complexity of individual objects, but has generally neglected the relationships between numerous similar objects. When PC's took over many mainframe tasks, the complexity of the PC got all the attention. However, now the market is again focusing on the relationships between all these PC's. This is part of what made the Internet and intranets all the rage. A powerful, isolated PC was of limited use if it could not share and get data easily. In a similar vein, TOP is an attempt to look at the connectedness of objects again, rather than just fat, powerful, but very isolated objects. OOP objects are a tad too lonely.


Related Information and Links

Apologies: I realize that this web-page is a bit unstructured. It was built up piecemeal over time and needs an organization overhaul. That task is on my to-do list.


(All material © copyright 1998, 1999, 2000, 2001 by Findy Services and B. Jacobs unless otherwise noted.)