[Note: This white paper originally appeared internally at Microsoft on 22 July 1999.]
This document describes what is meant by “pre-oracled” data and how Semantic Test proposes to use it in search engine testing.
Some Background on Testing and Oracles
Simply stated, testing involves providing inputs to the application under test and observing the application’s outputs. We examine the outputs to verify that the application is behaving correctly. The term “test oracle,” or simply “oracle,” describes how we determine if the output we observed was correct. By “pre-oracled” data, we mean that our test engine already knows the answer we expect to receive based on the test data. (Note that “pre-oracled” is a term unique to our group and neither the term nor the concept appears in the literature.)
Some oracles are very simple. For instance, if the application crashes, that is almost always a bad thing. But checking for crashes is not enough. Most oracles need to be more sophisticated than just detecting crashes. We want some assurance that the application actually did what we expected.
When humans test, they are usually their own oracles; i.e., they typically have some idea of what they consider to be “good” application behavior. For instance, if I am editing a Word document that contains the string “spaghetti”, I will expect to find that word when I do a search of the document. Likewise, if I have deleted all instances of “spaghetti” from the document, I will expect the search to come up empty.
In short, people create their own mental model of how the application will behave. This model is based on
1. What they know about how the application should behave, and
2. What they know about the data being processed.
In the Word example above, the tester knows (1) how the “Find” function of Word should behave, and (2) whether the string “spaghetti” is present in the document.
The Need for Models
One significant problem in testing is that people’s models tend to remain in their heads and are not written down. Typically, when people write their tests, they don’t write down the model that generated the tests; they only write down the actions to perform. This makes those tests static and hard to adapt to new situations.
For instance, if I have a file that contains the string “spaghetti”, my mental model tells me to expect a search for “spaghetti” to be successful. But if I only record my actions, my test will always search for the string “spaghetti”. To keep the result successful, I will need to ensure that “spaghetti” is always present in the file, usually by keeping the “spaghetti” document around.
Recording the actions rather than the model leads to a “hardening” of the tests. Because the oracle for this test has been preserved in the test actions and the data files, the test scripts become frozen into always testing for the same words in the same files. These tests become less useful as time goes on because they have already found the bugs they were intended to find. Also, the test framework is burdened with maintaining the data sets, such as the Word documents, that the tests were run on.
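The difference between recording actions and recording the model can be sketched in a few lines of Python. This is only an illustration: the find function is a hypothetical stand-in for the application's search feature, and the dictionary-based model is an assumed representation, not something the original tests used.

```python
from typing import Dict


def find(document_text: str, word: str) -> bool:
    """Hypothetical stand-in for the application's Find feature."""
    return word.lower() in document_text.lower()


def recorded_test(document_text: str) -> bool:
    """Recorded-action style: the word and the expected result are frozen in."""
    return find(document_text, "spaghetti") is True


def model_based_test(document_text: str, model: Dict[str, bool]) -> bool:
    """Model-based style: the model says which words should be found."""
    return all(find(document_text, word) == expected
               for word, expected in model.items())


# A new document, described by a new model. No "spaghetti" file is needed.
document = "Monkeys like bananas"
model = {"monkey": True, "spaghetti": False}

recorded_ok = recorded_test(document)         # fails: the test expects "spaghetti"
model_ok = model_based_test(document, model)  # passes: expectations come from the model
```

The recorded test fails on the new document even though the application behaved correctly, because its oracle is tied to the old data. The model-based test adapts as soon as the model describes the new data.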
Modeling a Search Engine
Our work in Semantic Test proposes using a model of the application’s expected behavior to generate tests. For instance, suppose we are testing a search engine’s ability to locate documents that contain combinations of words. In the following example, we use a simple database to hold our simple model. The database associates document names with the words they contain. (Note that, for simplicity, this model does not track how many times or where in the documents the words appear.)
| Document | Spaghetti | Monkey | Klingon |
|----------|-----------|--------|---------|
| File1    | Yes       | No     | No      |
| File2    | Yes       | Yes    | No      |
| File3    | No        | Yes    | No      |
| File4    | No        | No     | No      |
If our model correctly portrays how these words are distributed throughout the documents, we can generate interesting tests. And querying the model in the database provides us with an automatic oracle for the output.
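As a rough sketch of what generating tests from the model might look like, the Python below enumerates word combinations and reads the expected answer for each one from the model. The names MODEL, VOCABULARY, and generate_tests are hypothetical, and holding the table above in a dictionary rather than a database is an assumption made to keep the example short.

```python
from itertools import combinations

# The table above, as a mapping from document name to the words it contains.
MODEL = {
    "File1": {"spaghetti"},
    "File2": {"spaghetti", "monkey"},
    "File3": {"monkey"},
    "File4": set(),
}
VOCABULARY = ["spaghetti", "monkey", "klingon"]


def generate_tests(model, vocabulary, max_words=2):
    """Yield (query_words, expected_documents) for each word combination."""
    for size in range(1, max_words + 1):
        for query in combinations(vocabulary, size):
            # A document matches when it contains every word in the query.
            expected = sorted(doc for doc, words in model.items()
                              if set(query) <= words)
            yield query, expected


for query, expected in generate_tests(MODEL, VOCABULARY):
    print(" AND ".join(query), "->", expected or "no documents")
```

Each generated pair is a complete test: the query is the input to send to the search engine, and the expected list is the oracle for its output.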
Example 1: Find documents that contain ‘monkey’ is modeled by “SELECT Document … WHERE Monkey = ‘Yes’”. The model (in the database) and the search engine should both answer with File2 and File3.
Example 2: Find documents that contain ‘spaghetti’ and ‘monkey’ is modeled by “SELECT Document … WHERE Spaghetti = ‘Yes’ AND Monkey = ‘Yes’”. The model and the search engine should both answer with File2.
Example 3: Find documents that contain ‘Klingon’ is modeled by “SELECT Document … WHERE Klingon = ‘Yes’”. The model and the search engine should both reply that no such document was found.
And so on.
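The three examples can be run end to end with a small sketch like the one below, which uses Python's built-in sqlite3 module. Several things here are assumptions for illustration only: the table name doc_words (the queries above elide the table name), the CORPUS texts, and the search_engine function, which is a toy stand-in for the real engine under test.

```python
import sqlite3

COLUMNS = ("Spaghetti", "Monkey", "Klingon")


def build_model() -> sqlite3.Connection:
    """Load the model table into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # "doc_words" is a hypothetical table name.
    conn.execute("CREATE TABLE doc_words (Document TEXT PRIMARY KEY, "
                 "Spaghetti TEXT, Monkey TEXT, Klingon TEXT)")
    conn.executemany("INSERT INTO doc_words VALUES (?, ?, ?, ?)", [
        ("File1", "Yes", "No", "No"),
        ("File2", "Yes", "Yes", "No"),
        ("File3", "No", "Yes", "No"),
        ("File4", "No", "No", "No"),
    ])
    return conn


def oracle(conn: sqlite3.Connection, words) -> set:
    """Ask the model which documents contain all of the given words."""
    if not words or any(w not in COLUMNS for w in words):
        # Column names cannot be bound as parameters, so only known ones pass.
        raise ValueError("words must be a non-empty subset of COLUMNS")
    where = " AND ".join(f"{w} = 'Yes'" for w in words)
    rows = conn.execute(f"SELECT Document FROM doc_words WHERE {where}")
    return {row[0] for row in rows}


# Toy stand-in for the search engine and the documents it has indexed.
CORPUS = {
    "File1": "spaghetti for dinner",
    "File2": "a monkey eating spaghetti",
    "File3": "the monkey escaped",
    "File4": "nothing of interest",
}


def search_engine(words) -> set:
    return {name for name, text in CORPUS.items()
            if all(w.lower() in text.split() for w in words)}


def check(conn: sqlite3.Connection, words) -> bool:
    """A test passes when the engine and the model give the same answer."""
    return search_engine(words) == oracle(conn, words)


model_db = build_model()
results = [check(model_db, q) for q in
           (["Monkey"], ["Spaghetti", "Monkey"], ["Klingon"])]
```

The test code never states the expected documents itself. Changing the rows in the model (and the corresponding documents) changes the expected answers without touching the checks.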
Growing a Model
One advantage of using models is that even a simple model can quickly provide useful (though basic) testing of the application. And as the application and the tests evolve, it is straightforward to add information to the model.
For instance, the model might eventually incorporate information about
The Primacy of the Model
One of the most interesting facets of this modeling approach is that all we really care about is the model. We don’t care how the information was put into the model. All we care about is that the model is reasonably correct for our uses and that it contains the information we need to know about how the application will behave.
All of these approaches offer benefits.