Table-Oriented Programming (TOP for short) can be characterized as a programming language and/or development method that makes dealing with tables, lists, indexing, sorting, searching, filtering, etc. a native and direct part of language design and complexity management. This is in contrast to the clumsy collection API's and attribute-management techniques, such as set/get, made popular by object oriented programming vendors. Table-Oriented Programming can also be characterized by using tables to organize program logic, not just data. Such tables are called Control Tables. They offer potential organization benefits over both raw procedural programming and object oriented programming.
Most general-purpose languages use API-like constructs (function library calls) and SQL to deal with tables. We believe that this approach is too bulky, code-intensive, and formal to be used often. Pushing data into, pulling it out of, and converting it for API's and SQL is not very practical. (Some OOP languages do not call them API's, but use something that is essentially the same.)
For example, most languages have special math-handling syntax for dealing with mathematical equations. Example:
a = (b * c) + e + f

Now, if your only choice was to use API's, then you would have to use syntax like:

a = plus(plus(times(b,c),e),f)   // silly example

Or, in OOP-ish syntax:

a = ((b.times(c)).plus(e)).plus(f)   // sillier

Or, as an OOP purist:

a = ((b.math.times(c)).math.plus(e)).math.plus(f)   // silliest

It would of course be silly to force math experts to use such syntax; yet the equivalent is being done to database and table developers. This API-like approach is fine for occasional use, but if 70% of your applications dealt with math, this would get a bit cumbersome. We have special constructs and syntax for math, why not tables? Most custom business applications use or need far more table handling than math. Perl is the king of strings, Java is the king of networking, C is the king of speed; we now need a king of tables. (SQL and MS-Access fall short of the title.)
The market focus on Object Oriented Programming has left table-handling in the dust with regard to innovation. Sorted tables and lists are actually very useful for dealing with complex data structures including trees, stacks, queues, etc. Also, tables are not limited by the size of RAM, unlike traditional data structure algorithms. They provide built-in virtual memory.
Most custom business applications are very table intensive. Accounting, inventory, reservation systems, order tracking, etc., are common examples. Also, file and directory lists, E-mail lists, sales prospects, and even lines and circles in a drawing program can be represented with tables. Yet, the languages usually used, such as C++ and Visual Basic, use nothing more than API's to work with tables. These languages encourage people to use in-memory constructs rather than ordered tables. Sad.
Although SQL is a high-level language that is quite powerful for certain types of operations, it is far from a general-purpose table-processing language. Many programmers end up writing "spaghetti SQL" because the alternative is to use annoying API calls or convert to data cursors. SQL is also a poor match for interactive programs, because it is more of a batch-processing and query-processing language.
SQL's set-oriented processing approach is simply not appropriate for many situations. SQL also has an annoying, nest-happy, LISP-like structure, which makes breaking the logic down into manageable chunks tough, especially for multi-joins. Using cursors can sometimes help, but they are far from standardized, are given little vendor attention, and often are not given "native" or direct access to the data engine.
SQL also cannot use functions and methods that are defined in the calling program; you have to use SQL's built-in or external functions. SQL puts a wall between you, your code, and the data. In addition, SQL does not support named field sets, which are described later. (More on SQL and stored procedures.)
TOP languages do exist in various levels or incarnations of table-orientedness. These include XBase derivatives (dBASE, FoxPro, Clipper), PAL (Paradox), PowerBuilder, Perl (for certain list types), Progress, Oracle's PL/SQL, and Clarion Developer. (We will not necessarily vouch for the quality or design of these languages, only say that they have a table tilt to them.) These languages get little press compared to the big OOP languages. Also, when upgrades are built for them, OOP features get most of the development resources; their TOP features are now treated as second priority by the vendors.
Why does OOP get 20 times more attention than TOP? We are not saying that TOP should be everything, but it does not deserve to be ignored. Given that tables are common and powerful, TOP does not deserve only 5% of the attention that OOP gets. We only ask for balance, not an overthrow.
My exposure to TOP started back in the late 1980's when I purchased a dBASE III book. I quickly fell in love with dBASE and later its XBase derivatives. (dBASE was not the first language I learned, nor was it the first that I used in a commercial setting.) It made working with relational tables such a snap that I started to view ALL collections as XBase tables. (Collections are any set of similar or closely related items.) This even began including program logic. (After all, OOP subclasses are simply a collection of related classes.)
Other languages tended to use different "containers" within the same language for collections. Such containers include arrays, vectors, dictionaries (associative arrays), and API/object interfaces to SQL database engines. The problem with these is that they are very limited and very different from each other. If the needs of a collection grew beyond the bounds or features of one of these structures, or its requirements significantly changed, then switching to another "type" of collection container, or re-creating the needed collection feature by hand, was a pain in the [beep], not to mention darn illogical.
It seemed much more logical to me to have ONE kind of interface into ANY collection, and then hook up an easy-to-use set of standard collection operations that would be available to ALL collections, big and small. (Not all engines will support all features, but the idea is that to get needed features you switch engines, not your existing API calls.) Although it has some annoying limitations and language weaknesses, XBase opened my eyes to table-oriented thinking.
OOP and other fads and trends prevented this powerful view of collections from progressing any further, and even reversed it to some extent. SQL as an interface is fine for formal transactions, but is too bulky and formal for many types of collection manipulation. Thus, I am here trying to sell the dream and vision of what should perhaps be called "collection-oriented programming." I found it a more powerful metaphor than anything else on the market, and I hope you will too.
Table-Spec. | Field-Name | Field-Title | Pre-Func. | Post-Func. | Groups | Sort-1 | Pick-Func. | Total-able
---|---|---|---|---|---|---|---|---
Customers | CustName | Customer Name | {none} | {none} | R | 10 | {none} | No
Purchases | PurchDate | Purchase Date | vdate1() | dateFmt1(2) | B,R | 20 | {none} | No
Trans | Amt | Purchase Amount | {none} | {none} | B,R | 30 | {none} | Yes
Brief Table Legend:
Table-Spec. - Table or field-set specifier. (Fields can be virtual.)
Field-Name - Abbreviation for field name.
Field-Title - Descriptive field title
Pre-Func. - Pre-validation function. Similar to an OOP constructor.
Post-Func. - Post-validation function. May also perform formatting for display.
Groups - Groups that the field belongs to. (There are many ways to represent these.)
Sort-1 - 1st Sorting order of fields as displayed on table and reports. (May have other sorts.)
Pick-Func. - Function called if user double-clicks on field.
Total-able - 'Y' if field can be totaled on a report.
(Note that a Data Dictionary can have many more columns than shown and can be organized in different ways.)
Data dictionaries (DD's) are, in a sense, tables that describe other tables. A DD differs from a common table-structure list in that it may apply to more than one table, and it can also assign functions or behavior to handle common or related operations. DD's are often described in the literature as only a documentation tool; however, we are extending or allowing them to also serve as centralized storage for field-related properties and/or operations actually used by the software.
Under ideal conditions, the DD provides enough information to generate input screens, multi-row grids, and reports without programming these from scratch. It keeps all logic related to a field or column in one central place. (Similar to the goal of an OOP class or subclass.) It is much easier to find and change information in DD tables than to hunt through separate modules or subclasses in program code. DD's are not intended to replace all program code, just to reduce the need for it except down at the true customization level where it belongs.
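For the curious, here is a rough sketch of the idea in Python (purely my own illustration, not ITOP syntax; the row layout and the fields_for helper are made up). The data dictionary is kept as ordinary rows, and generic screen or report code is driven by it:

# Minimal sketch: a data dictionary kept as ordinary rows, one row per field.
# Values mirror the sample dictionary above; the names here are invented.
data_dict = [
    {"table": "Customers", "field": "CustName",  "title": "Customer Name",
     "pre": None,     "post": None,       "groups": "R",   "sort1": 10},
    {"table": "Purchases", "field": "PurchDate", "title": "Purchase Date",
     "pre": "vdate1", "post": "dateFmt1", "groups": "B,R", "sort1": 20},
    {"table": "Trans",     "field": "Amt",       "title": "Purchase Amount",
     "pre": None,     "post": None,       "groups": "B,R", "sort1": 30},
]

def fields_for(table):
    """Return the field definitions for one table, in Sort-1 order."""
    return sorted((r for r in data_dict if r["table"] == table),
                  key=lambda r: r["sort1"])

# Generic screen/report code can then be driven by the dictionary:
for spec in fields_for("Purchases"):
    print(spec["title"], "- pre-validator:", spec["pre"])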
See an actual data dictionary for more examples and specifics.

Note that the linked examples don't need to contain programming code and function calls to be effective. Putting programming code in tables is simply one TOP technique among many, but not a prerequisite.
field1.property1 = x
field1.property2 = x
field1.property3 = x
...etc...
field1.property29 = x
field1.property30 = x
field2.property1 = x
field2.property2 = x
...etc...
field49.property1 = x
field49.property2 = x
field49.property3 = x
...etc...
field50.property30 = x

I see these constructs all over VB and Java code. A construct like this is crying out for a tabled alternative when you have several dozen fields and several properties/functions. If you have 4 tables with 20 fields each, and each field averages 15 used properties, then you would have to write about 1,200 lines of code (4 x 20 x 15). However, this could be converted into a table that is about 80 by 20 in cell dimensions (we are assuming that there are a total of 20 properties and/or functions). The 2D nature of tables makes them much more compact and logical for representing similar information. (This applies to control tables as well as DD's.) Code that repeats similar, but slightly different, constructs or assignments over and over again is sometimes called "comb" code or "E" code because of its repetitious appearance. (Stacked E's resemble a comb.)
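To make the contrast concrete, here is a minimal sketch of the tabled alternative in Python (the Widget class and the property names are hypothetical stand-ins, not any real toolkit):

# The settings live in a table (a list of rows) and one loop applies them,
# instead of fifty blocks of "fieldN.propertyM = x" statements.
field_settings = [
    {"name": "custName",  "width": 30, "label": "Customer Name",   "required": True},
    {"name": "purchDate", "width": 10, "label": "Purchase Date",   "required": True},
    {"name": "amt",       "width": 12, "label": "Purchase Amount", "required": False},
    # ...one row per field instead of one block of assignments per field
]

class Widget:            # stand-in for a VB/Java screen control
    pass

widgets = {}
for row in field_settings:
    w = Widget()
    for prop, value in row.items():   # this loop replaces the "comb" of assignments
        setattr(w, prop, value)
    widgets[row["name"]] = w

print(widgets["amt"].label)           # -> Purchase Amount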
Allowing all such tables to share one or a few DD's makes maintenance much easier. Plus, tables from different systems can be accessed without having to convert to or from their native DD's.
An ITOP application should make it easy to physically separate the program code, data dictionary, and actual tables if so desired. An option to jam them all together, as MS-Access prefers, should also be given.
In the DD example, the Table-Spec column allows asterisks, indicating that the Field-Name alone will be used to find the appropriate entry. For example, several tables may have a CustName field in them. Rather than creating an entry for each table, an asterisk is put in the Table-Spec column to serve as a wild-card.
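One way such a wild-card lookup might work is sketched below in Python (the entry format follows the earlier examples; the dd_lookup function and its fallback rule are my own invention):

def dd_lookup(dd, table, field):
    """Prefer an exact Table-Spec match; otherwise fall back to a '*' entry
    with the same field name. (A sketch; real matching rules could be richer.)"""
    for row in dd:
        if row["table"] == table and row["field"] == field:
            return row
    for row in dd:
        if row["table"] == "*" and row["field"] == field:
            return row
    return None

dd = [{"table": "*", "field": "CustName", "title": "Customer Name"}]
# One wild-card entry serves CustName in every table that carries it:
print(dd_lookup(dd, "Invoices", "CustName")["title"])   # Customer Name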
For example, if a certain action happens when a field is double-clicked, the data dictionary should be able to gain a new column that gives the snippet or function call to run for each field upon double-clicking. (This example assumes that double-clicking is not already part of the minimum standards.)
The pre-validation function serves two purposes: first, it checks the data to see that it is correct; second, it formats the field data for storage. For instance, a date may be input as "12/31/1998". The pre-validation function may change it to "19981231" before storing it in the actual table. If the user entered "12/32/1998", then the function would return a value of 'false', indicating an error. The function may resemble this pseudocode:
Boolean Function Vdate1()
   boolean status = true              // initialize
   yearpart = substr(curfld, 7, 4)
   monthpart = substr(curfld, 1, 2)
   daypart = substr(curfld, 4, 2)
   if not between(monthpart, "01", "12") _
         or not between(daypart, "01", "31") then
      status = false
      curmsg = "Bad month or day number"
   else
      curout = convert2yyyymmdd(curfld)
   endif
   return(status)                     // true if passed
End Function

Notes: Curfld, Curmsg, and Curout are pre-assigned variables. Curfld is the current field as entered by the user. Curmsg is the error message given to the user if the validation fails (a default is assigned if not programmed), and Curout is the field re-formatted for storage. The ITOP system automatically prepares and uses these variables before and after the function is triggered by user or batch actions. Another such reserved variable may be the length of the native string. This variable-assignment method is only one possible approach to pre-validation routines; depending on the programming language, it may be better to pass these as function parameters instead.
Post-validation routines re-format the input for display. There is no true/false return value, since the data was already checked during input. Instead, the return value is the reformatted field. For example, if the stored value is "19981231", then the post-validation function can turn it into "12/31/1998". In short, the post-validation function makes the output prettier or easier to read. The example above uses dateFmt1(2); this sample function returns the date with the year shown as 2 digits. (The function may get the original value from a Curfld-like variable, as shown in the pre-validation example.)
It may seem like a pain to write pre- and post-validation functions, but remember that the same functions can be used over and over again. The inputs and outputs to these functions are generic enough that generic functions can be written for common formats like dates, phone numbers, etc. Thus, you do not have to re-invent the wheel for similar field types. (Although the programmer is expected to build all the validation functions, a pre-built set could be included in the DD kit to save steps or serve as examples.)
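As an illustration of how generic such functions can be, here is a sketch of a reusable date pre/post pair in Python (my own rendition, not ITOP syntax; returning an ok/stored/message triple is just one possible calling convention):

from datetime import datetime

def pre_date(raw):
    """Pre-validation: check an MM/DD/YYYY entry and return (ok, stored, msg).
    The stored form is YYYYMMDD so that plain string sorts are chronological."""
    try:
        d = datetime.strptime(raw, "%m/%d/%Y")
    except ValueError:
        return False, None, "Bad month or day number"
    return True, d.strftime("%Y%m%d"), ""

def post_date(stored, year_digits=4):
    """Post-validation: re-format YYYYMMDD for display."""
    d = datetime.strptime(stored, "%Y%m%d")
    fmt = "%m/%d/%Y" if year_digits == 4 else "%m/%d/%y"
    return d.strftime(fmt)

print(pre_date("12/31/1998"))    # (True, '19981231', '')
print(pre_date("12/32/1998"))    # (False, None, 'Bad month or day number')
print(post_date("19981231", 2))  # 12/31/98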
These operations include filtering, ordering, searching, auto caching and persistence, grouping and totaling, transferring, import and export, field/property selection, inserting, deleting, updating, and joining or relating. Click here for more details on these fundamental operations.
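To give a feel for what "one interface" over these operations might look like, here is a toy sketch in Python (the class, method names, and signatures are my own invention, not a standard; a real engine would add indexing, persistence, joins, and so on):

class Table:
    """A toy row collection exposing a few of the fundamental operations."""
    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, pred):    return Table(r for r in self.rows if pred(r))
    def order_by(self, key):   return Table(sorted(self.rows, key=key))
    def total(self, field):    return sum(r[field] for r in self.rows)
    def insert(self, row):     self.rows.append(row)
    def select(self, *fields): return [{f: r[f] for f in fields} for r in self.rows]

t = Table([{"cust": "Fred", "amt": 10}, {"cust": "Bob", "amt": 25}])
t.insert({"cust": "Ann", "amt": 5})
print(t.filter(lambda r: r["amt"] > 8).order_by(lambda r: r["cust"]).select("cust"))
print(t.total("amt"))   # 40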
Fortunately, the transportation industry pretty much standardized on steering by turning a wheel with one's hands, regardless of the vehicle size or task. (Well, the bike uses a bar, but close enough.) The software collections industry is not this wise yet. It still wants to divide collections into things like stacks, queues, sets, dictionaries, trees, etc., letting short-lived operational needs drive the protocol chosen. In my experience, collection needs change and grow over time, repeatedly. Thus, one should pick a flexible collections protocol. Once a stack, always a stack? Nooooo waaaay. It may keep acting as a stack for some operations, but it will often need other views as well.
These ceilings are usually either complexity ceilings or size ceilings (such as RAM). Let's look at a common Perl approach and then some SQL problems that tend to be ceiling bound.
Perlers often use lists of lists and/or pointers to lists to store and process collections. Perl "associative arrays" are basically a RAM table with 2 columns and one index (on the "key" column). If the requirements suddenly change, such as the need for 3 columns, or 2 indexes with persistence, one then has to completely revamp the way fields and/or indexing is done. Perlers usually add a second level of complexity in the form of a list of pointers or a list of lists. In ITOP, or even XBase, these additions would be dirt simple. There is nothing magic about the limit of 2 columns and one index, so why do Perl and array-centric thinking impose this arbitrary limit?

Note that I have proposed using associative arrays elsewhere quite a few times. This may seem like a contradiction; however, those uses are generally an interface mechanism and not data-collection management.
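A quick sketch of why the 2-column, 1-index ceiling is arbitrary (Python; the field names are invented): extra columns are just extra keys, and a second index is just another dictionary built over the same rows.

# Rows may have any number of columns, plus as many lookup indexes as needed.
rows = [
    {"id": 1, "name": "Fred", "region": "West"},
    {"id": 2, "name": "Bob",  "region": "East"},
    {"id": 3, "name": "Ann",  "region": "West"},
]

def build_index(rows, column):
    """A second (or third...) index is just another dictionary of row lists."""
    idx = {}
    for r in rows:
        idx.setdefault(r[column], []).append(r)
    return idx

by_id     = build_index(rows, "id")      # the classic "key" index
by_region = build_index(rows, "region")  # adding this one changes nothing else

print(by_id[2][0]["name"])                       # Bob
print([r["name"] for r in by_region["West"]])    # ['Fred', 'Ann']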
Although I find pointers to pointers nasty and error-prone to work with in almost any form, let us just assume that this approach is fine in some cases. However, if the complexity of the structure, the quantity and variety of operations, or the size of such structures grows beyond a certain amount, then the typical response is to switch to a more powerful relational database add-in.

Aside from the fact that DB API's can be bureaucratic to work with, one has to convert the native pointer structure and much of its processing into something the DB API's can use.
Thus, there are roughly 3 different kinds of interfaces one has to use as a collection graduates from simple to middle-level to complex:
I see no reason why the same basic interface cannot be used from baby collections to Oracle-size collections. Why the industry tolerates this, I have no idea. Perhaps because they have not seen collections done right.
Note that there may be some minor setting differences as collections scale. For example, transaction markers and concurrency error handling may need to be specified for the higher-end collections. However, these can be treated as additions to the existing code, not overhauls.
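sqlite3 is certainly not ITOP, but Python's standard sqlite3 module happens to demonstrate the engine-swap idea: the same calls serve a throwaway in-memory collection and a persistent one, so "graduating" is a connection-string change rather than an interface overhaul. A sketch (the table and file names are made up):

import sqlite3

def open_table(where=":memory:"):
    """Same calls whether the collection lives in RAM or in a disk file."""
    con = sqlite3.connect(where)
    con.execute("CREATE TABLE IF NOT EXISTS t (name TEXT, amt REAL)")
    return con

for engine in (":memory:", "demo_collection.db"):   # baby vs. persistent
    con = open_table(engine)
    con.execute("INSERT INTO t VALUES (?, ?)", ("Fred", 10))
    con.commit()
    print(engine, con.execute("SELECT COUNT(*) FROM t").fetchone()[0])
    con.close()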
Now let's look at traditional SQL operations. SQL is usually fine for fairly simple processing. However, as the number of expressions, links (joins), and/or fields increases, SQL can get nasty at times. Standard SQL lacks many of the black-box (subroutine) and reference-reduction (factoring) techniques found in most programming languages (and promoted as "good practice"). In standard SQL you usually cannot assign variables, macros, subroutines, etc. to complex or repeating parts in order to break the logic and sequence down into more manageable parts. You simply end up with one big, messy string with lots of parentheses. Beyond a certain complexity point one has to break the statement into 2 or more separate SQL statements.
Further, if set-oriented operations are no longer sufficient to handle the complexity of the job, the entire SQL statement has to be converted into a cursor-oriented approach that deals better with one record at a time. It is like having to stop, back up for several miles, and then start again on a different path. (See SQL Criticism.)
ITOP offers several techniques to avoid or reduce overhauls caused by complexity and size changes. The primary technique is the provision of a built-in set of standard, common, rudimentary, yet powerful collection operations (described above). Other techniques include internal-aware expression evaluation and the blurring of set-orientation versus cursor-orientation in database commands. (Set-oriented operations have some significant advantages in traditional client/server setups; however, one should have a choice, especially if the bandwidth between the client and the server is sufficient.)
"Complexity Scaling" can also be horizontal as well as vertical. For example, an API that is dedicated to a stack collection can get cumbersome if the needs grow outside of the traditional parameters of stacks. I encounter the need to use and view stacks, trees, queues, etc. in ways outside of these narrow collection "subtypes" all the time. Requirements change and your collections interface should be ready for such changes.
Besides morphability and scalability (described in the previous section), another benefit is easier training. Instead of learning four or more different collection management systems, one should only have to learn a single protocol. Fine adjustments and specialized extensions can then be added as needed (such as a Pop(x) "wrapper" function for stack-like activity).
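For instance, a stack "extension" over a plain row collection could be as thin as this sketch (Python; the function names and row fields are my own):

# Thin stack-style wrappers over an ordinary list-of-rows "table"; the
# collection can still be filtered, sorted, and reported on like any other.
stack = []                              # rows, newest last

def push(row):
    stack.append(row)

def pop():
    return stack.pop() if stack else None

push({"task": "write report", "priority": 2})
push({"task": "fix bug",      "priority": 1})
print(pop()["task"])                                  # fix bug (LIFO view)
print(sorted(stack, key=lambda r: r["priority"]))     # any other view still works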
A third advantage is that the same collection system can be used for all the different collection types and variations. Rather than build a class/code browser, an RDBMS browser, an array browser, etc.; vendors can focus on building one grand collection system and browser that does it all. It could even be modular such that you can attach different text browsing engines that highlight code keywords, etc.
Even if you disagree with my specific protocol and/or syntax proposals, the idea of a consistent collection protocol should ring through as a very logical idea.
You may notice that my rejection of strong protocol taxonomies parallels my distaste for heavy use of sub-classing, also known as sub-typing and IS-A thinking. Software engineering has over-emphasized IS-A thinking. Perhaps in some niches it has an important place, but not for custom business applications.
Types like dates and SSN's can be internally represented (stored) just fine with characters or possibly integers. For example, December 31, 1998 could be represented as "19981231". This provides a natural sort order.
Booleans can be represented with "Y" and "N" (yes and no), and blank for uninitialized. This has the advantage of allowing more codes in the future if 2 turn out not to be enough. Further, I have witnessed RDBMS numeric ID numbers being changed into strings and vice versa. Being type-agnostic reduces or eliminates the code changes needed after external or data-source type changes. (Fortunately, ID numbers are rarely compared with greater-than and/or less-than operators. There are drawbacks to type-agnosticism, but overall I think the benefits are greater.)
Enforcement of format can be done via validation specifiers (both built-in and custom). Fewer language types increase the share-ability and portability of data. (See also Black Box Bye Bye.)
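A small sketch of the idea in Python (the SSN pattern is just one example of a validation specifier; nothing here is ITOP-specific):

import re

# Dates stored as plain YYYYMMDD strings sort chronologically with no date type:
dates = ["19981231", "19970115", "19980601"]
print(sorted(dates))              # ['19970115', '19980601', '19981231']

# Format enforcement via a validation specifier rather than a language type:
def valid_ssn(value):
    return re.fullmatch(r"\d{3}-\d{2}-\d{4}", value) is not None

print(valid_ssn("123-45-6789"))   # True
print(valid_ssn("123456789"))     # False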
Suppose that we had to make a report that showed customer transaction detail but omitted customer names for reasons of confidentiality. With our setup, we could just ask for a report on all fields in set "B" (see the Groups column above). When dealing with tables with 50-plus fields, specifying a set name is much simpler than typing 50 names or building a field loop.
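A sketch of how a set letter might drive field selection (Python, reusing the Groups column values from the sample dictionary; the fields_in_group helper is made up):

# Field definitions tagged with group letters, as in the Groups column above.
data_dict = [
    {"field": "CustName",  "groups": "R"},
    {"field": "PurchDate", "groups": "B,R"},
    {"field": "Amt",       "groups": "B,R"},
]

def fields_in_group(dd, group):
    """All field names whose Groups column contains the requested set letter."""
    return [r["field"] for r in dd if group in r["groups"].split(",")]

print(fields_in_group(data_dict, "B"))   # ['PurchDate', 'Amt']  (no customer name)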
t = DBopen("tableA" #direct) // open table or view seek t #where name = 'Fred' if found(t) { t.name = 'Bob' // change name from Fred to Bob } close t // '#' mark named parameters Transactional: t = DBopen("tableA" #transact) seek t #where name = 'Fred' if found(t) { t.name = 'Bob' if not save(t) { // needed or no save is done // handle error } } close t // (may not be necessary in some langs)The direct approach can be used if you know there will not be resource conflict issues with other users or processes, such as if you have exclusive access to the table. Note that although this example can be easily done with SQL, more interactive behavior would require more steps in SQL. Example:
t = DBopen("tableA" #direct) // open table or view seek t #where name = 'Fred' if found(t) { t.name = 'Bob' // change name from Fred to Bob } else { display "Error, Fred not found." } close tSQL would require one query to request the change, and another to verify the change. Either that, one qurey to see if the target is there, and another to change it.
s = "select from tableA where name = 'Fred' " t = DBopen(#sql s) if found(t) { s2 = "update tableA set name = 'Bob' " DBexecute s2 } else { display "Error, Fred not found." }In this case it is not that much more code, but in many situations it gets cluttered. Direct SQL makes it tougher to mix reads and writes, usually requiring the issuing of new queries. (Perhaps the direct approach could be implemented via SQL underneath, but not something the programmer has to concern him/herself with.)
Note that recCnt(t) > 0 could replace found(t) where recCnt is short for "Record Count". This example assumes there is only one Fred.
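For comparison, here is the two-query SQL style the text describes, written against Python's sqlite3 module (a sketch only; the table mirrors the pseudocode above):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tableA (name TEXT)")
con.execute("INSERT INTO tableA VALUES ('Fred')")

# Query 1: is Fred there?  Query 2: change him to Bob.
row = con.execute("SELECT rowid FROM tableA WHERE name = 'Fred'").fetchone()
if row:
    con.execute("UPDATE tableA SET name = 'Bob' WHERE rowid = ?", (row[0],))
else:
    print("Error, Fred not found.")

print(con.execute("SELECT name FROM tableA").fetchone()[0])   # Bob
con.close()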
A flag or code in a table may say, "I do feature X", but one does not have to bathe in the details about how feature X is actually implemented right then and there. The simple "what" is not mixed up with the complex "how". Mixing them drags them both down. A tabular bus schedule tells when and where the busses will be, but does not bring up how the busses will get there. If we mixed such information together, few would bother to ride the bus. You would have to slog through information or structures about diesel combustion in order to find the arrival times.
Note that there are situations where putting code in tables is either somewhat limited or best kept separate from the control (feature-selection) information. This may depend on, for example, whether there is a common one-to-one relationship between instances and implementation. If implementations tend to come from picking strategies potentially shared by multiple instances, then putting code in tables may not make much sense (at least not in the same table). This is roughly equivalent to factoring common code out into a shared subroutine rather than repeating it for each instance. Putting code in tables is more useful for competing with artificial OOP examples, which try to justify merging data with behavior at almost any cost, than it is for real-world use.
An application can roughly be split into data, control information, and implementation. I am planning on writing more about this issue in the future. Even when I do put code in tables, it is usually very small snippets that call other functions/services. Thus, they are actually a hybrid of code and strategy specifiers.
Tables are also more compact than OOP classes for viewing high-level control information because tables lay out information using 2 dimensions instead of the single (linear) dimension of OOP classes.
UNIX-based architectures stumbled upon a simple yet powerful conceptual framework: the use of files and text streams as an inter-process communication medium. This paradigm (or sub-paradigm) makes it easy to mix different languages and provides a clear and inspectable "communications gathering point", the file/stream, regardless of how complicated or messy the algorithms and code are. One could always look at the file or stream to get a "neutral point of reference". ("The Unix Philosophy", ISBN 1555581234. Note that I am not promoting UNIX itself. OS's are one of the few things I don't have strong opinions about, other than perhaps case-sensitivity and file systems.)
Tables provide the next generation of this concept. They provide a concise communications gathering point and can be shared by many different languages and paradigms. Making convoluted code is unfortunately much easier than making a convoluted table. Further, there is more incentive to keep them clean because non-programmers, including your boss, can for the most part also read tables. And, unlike files and streams, you don't have to keep copying the same data over and over for each step, and you get concurrency.
It is like everybody going to eBay to bargain instead of having different little bargaining rules for each store. (I hope you are getting the notion by now, because I am running out of analogies.)
Potential fields are kept in a central list, meaning that each row does not necessarily have to contain every field in the potential list. If a request is made for a field that is in the potential list but that the row does not actually contain, then a blank or empty value is returned. (If a row contains a field name that is not in the central list, that field is not considered part of the table.)
Tables also potentially have multiple temporary and multiple permanent indexes. Indexes allow one to find specific rows or row sets without sequential traversal of the entire table.
Note that there is no necessary size or content restriction on the fields. "Types" or length limits are not a prerequisite. In fact, such restrictions can make certain operations harder to implement. However, in practice, such limits are often imposed on a product for performance reasons or to fit a given standard, such as SQL.
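Here is a rough sketch (Python; the class and method names are invented) of a table whose rows share a central potential-field list, with missing fields reading as blank and unknown names dropped:

class DynTable:
    """Rows share a central list of potential fields; a missing field reads
    as '' and names outside the central list are not kept as part of the table."""
    def __init__(self, fields):
        self.fields = set(fields)      # the central potential-field list
        self.rows = []

    def insert(self, **values):
        self.rows.append({k: v for k, v in values.items() if k in self.fields})

    def get(self, row, field):
        if field not in self.fields:
            raise KeyError(f"{field} is not part of this table")
        return row.get(field, "")      # blank when the row lacks the field

t = DynTable(["name", "region", "notes"])
t.insert(name="Fred", region="West")        # no 'notes' -- that is fine
t.insert(name="Ann", color="blue")          # 'color' is silently dropped
print([t.get(r, "notes") for r in t.rows])  # ['', '']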
Another way to identify or define "tables" is by their operations rather than structure.
Click here to read more about possible merges.
Although there are many similarities, there are some differences which are probably irreconcilable:
Also note that current implementations of OOP tend to be much more memory-centric than ITOP (collections are presumed to reside fully in memory instead of on disk); however, this appears to be a matter of tradition rather than an inherent aspect of OOP.
What if all subclass code was in Control Tables instead of just stubs and expressions? What if all collections were represented and linked within SCO? What if the concepts of Control Tables and SCO were tightly combined? How does the world look if you start to view or model it as collections based on SCO? Questions like these have not been explored very well yet.
Taking concepts to the extreme may not by itself produce practical results, but it can often trigger new ways of thinking. I believe there is plenty of room for brainstorming. It is hard to believe that OOP (by itself) is the pinnacle of paradigms.
OOP has focused on the complexity of individual objects but has generally neglected the relationships between numerous similar objects. When PC's took over many mainframe tasks, the complexity of the PC got all the attention. However, now the market is again focusing on the relationships between all these PC's. This is part of what made the Internet and intranets all the rage. A powerful, isolated PC was of limited use if it could not share and get data easily. In a similar vein, TOP is an attempt to look again at the connectedness of objects, rather than just at fat, powerful, but very isolated objects. OOP objects are a tad too lonely.
Apologies: I realize that this web-page is a bit unstructured. It was built up piecemeal over time and needs an organization overhaul. That task is on my to-do list.