ENTARCO Techniques for Modeling Temporal Data
(© 2011, ENTARCO USA Inc.)
This document describes the techniques for modeling data so that the requirements for keeping track of facts of interest over time (past, present, and future) can be correctly represented and are useful for designing implementations of these requirements.
This technique for modeling temporal data described in this paper has evolved over about 20 years of innovation and practical experience. This technique has been deployed in production use in large database environments for about 15 years. In these uses, the techniques are applied to every database table as a standard convention. This technique has proven to be very beneficial in being able to develop logic using a small set of sample logic which can be used for every database access. This approach has a significant impact on improving enterprise data quality, value, and reusability; and reducing application development time and cost.
The techniques described herein are identified by a Temporal Data Types (TDT) and are sequentially numbered for identification purposes.
There are four conditions that exist that determine which technique is to be used. These conditions are:
1. Can the business data content of the entity be changed?
2. Can the entity be ended without a successor?
3. Can there be future entities?
4. Can there be new “past” entities?
The following chart specifies the conditions for each TDT described below.
TDT 1 – This technique is used to model data when an occurrence of the data can be created but it cannot be changed or ended.
There are a few objects that when their existence is recognized, their existence cannot be “deleted” or ended. Enterprise and Person are two examples of these kinds of objects. Therefore, the data that describes these objects must be modeled as dependent entity types to these objects. However, these objects get modeled as independent entity types as follows.
An ENTERPRISE has a unique identifying NUMBER that is not allowed to change and once the ENTERPRISE exists, it cannot be ended.
Note: Due to variations in the definition of “Timestamp” in various environments, the use of “Timestamp” in this article refers a “date/time” attribute that can be set by the program logic.
When a new ENTERPRISE is created, the Date and Time are recorded along with the User ID or program name that created the entity.
Any change to the information that describes an ENTERPRISE will be recorded in dependent entity types that record data over time.
Note: For illustration purposes, only the first eight characters of the “Timestamp” attribute are shown in the examples.
TDT 2 – This technique is used to model data where the data is allowed to be changed in a dependent entity type and to maintain the history of previous attribute values or relationship memberships.
In this case, the CREATE_DATE_TIME is recognized as the date and time that the fact was “effective”, i.e. valid or true. For cases where the EFFECTIVE DATE TIME is, or may be, different than the CREATE_DATE_TIME, see TDT’s 3 thru 6.
(Business Data Rule: ENTERPRISE always ASSIGNED One (at a time) ENTERPRISE_REGISTRATION_STATUS_TYPE)
When the ENTERPRISE REGISTRATION STATUS for the ENTERPRISE changes, a new entity with the latest ENTERPRISE REGISTRATION STATUS TYPE is created. The previous entity is updated with the DEACTIVATE DATE TIME set to the new entity’s CREATE DATE TIME and the DEACTIVATE USER CODE is set to the new entity’s CREATE USER CODE.
Specifying the ENTERPRISE NUMBER and a comparison date time will retrieve the ENTERPRISE REGISTRATION STATUS for an ENTERPRISE. The comparison date time will be between the CREATE DATE TIME and the DEACTIVATE DATE TIME where the comparison date is greater than or equal to the CREATE DATE TIME and (the DEACTIVATE DATE TIME is null or (the DEACTIVATE DATE TIME is not null and the comparison date is less than the DEACTIVATE DATE TIME. A comparison date time of 6/15/2003 will retrieve a status of “Complete” and a comparison date time of 4/30/2003 will retrieve a status of “Incomplete”.
TDT 3 – This technique is used to model data where the requirement is to record changes in dependent entity types, to be able to represent effective date ranges, maintain the history of previous attribute values or relationship memberships, but future effective dates are not allowed, and the data cannot be ended without a successor.
This example records changes in ENTERPRISE ECONOMIC ACTIVITY STATUS which is a dependent entity type of ENTERPRISE that records changes in relationship membership with an ENTERPRISE ECONOMIC ACTIVITY STATUS TYPE. The EFFECTIVE DATE TIME records when the ENTERPRISE ECONOMIC ACTIVITY STATUS became effective. An ENTERPRISE must have an ENTERPRISE ECONOMIC ACTIVITY STATUS TYPE assigned, so there is no END DATE TIME.
Because the status is not allowed to end, this example will require system logic to ensure continuity across time from when the status is first effective onward. The EFFECTIVE DATE TIME and the SUPERCEDED DATE TIME are used to record the effective date range of the information. When a change is recorded, the previous outdated information will have its SUPERCEDED DATE TIME set to the new EFFECTIVE DATE TIME. In the most recent occurrence, the SUPERCEDED DATE TIME will have a NULL (empty) value. The CREATE DATE TIME and the DEACTIVATE DATE TIME function as they did in the form explained above and are used to record the audit trail of when data changed in the system.
Specifying the ENTERPRISE NUMBER, an effective comparison date time, and an audit comparison date time will retrieve the ENTERPRISE ECONOMIC ACTIVITY STATUS TYPE for an ENTERPRISE. The audit comparison date time will be between the CREATE DATE TIME and the DEACTIVATE DATE TIME and the effective comparison date time will be between the EFFECTIVE DATE TIME and the SUPERCEDED DATE TIME where the comparison date is greater than or equal to the CREATE DATE TIME and (the DEACTIVATE DATE TIME is null or (the DEACTIVATE DATE TIME is not null and the comparison date is less than the DEACTIVATE DATE TIME.
TDT 4 – This technique is used to model data where the requirement is to record changes in dependent entity types, to be able to represent effective date ranges, maintain the history of previous attribute values or relationship memberships, and future effective dates are allowed, but the data cannot be ended without a successor.
This example records changes in ENTERPRISE NAME which is a dependent entity type of ENTERPRISE that records changes in attribute value. The EFFECTIVE DATE TIME records when the ENTERPRISE NAME became effective. It is possible to receive a name change that will become effective in the future. An ENTERPRISE must have an ENTERPRISE name, so there is no END DATE TIME.
Other than allowing a future effective date, the logic for storage and retrieval is the same for this temporal data type as for TDT3 above including the requirement for additional logic to ensure continuity across time from when the name is first effective onward
TDT 5 – This technique is used to model data where the requirement is to record changes in dependent entity types, to be able to represent effective date ranges, maintain the history of previous attribute values or relationship memberships, and future effective dates are allowed and the data can be ended.
The example below for ENTERPRISE TRADE NAME is a situation where the dependent entity type for ENTERPRISE records changes in an attribute value with an effective date range that can end. It is possible for an ENTERPRISE to stop using a trade name. Instead of SUPERCEDED DATE TIME, the effective date range will be between EFFECTIVE DATE TIME and END DATE TIME.
Similar logic is used for both storage and retrieval as in the examples above. END DATE TIME has the same function logically as SUPERCEDED DATE TIME in “TDT3” above. Because a trade name can be ended, there is no need for logic to enforce continuity across time. A separate document contains the detailed information of how the system should maintain the date values to preserve the history for both the effective date range and the audit trail.
TDT 6 – This technique is used to model data where the requirement is to create the data and to be able to end its use.
This technique is most often used to model data that is used to classify and categorize other data. These entity types are normally maintained by a data administrator and populated using a utility. The only change to this kind of entity type would be to end its use.
Since there may be situations where the entity is entered by mistake, the CANCEL DATE TIME and CANCEL USER CODE attributes are used to indicate that this entity was cancelled and should be ignored. The CANCEL DATE TIME will always contain a null (empty) value unless it has been cancelled. When retrieving a list of valid status types, the comparison date time will be between the EFFECTIVE DATE TIME and both the END DATE TIME and CANCEL DATE TIME.
Canceling an entity is only allowed when it has not yet been used in an association with a dependent entity type in a relationship membership. If the entity has been used in a relationship membership by a dependent entity, then the entity should be ended instead of cancelled by setting the END DATE TIME.
In some cases, the dependent entity type will itself be used to classify and categorize other data. If this is the case, then it would be acceptable to cancel the entity as long as the dependent entity has not itself been used in a relationship membership with a downstream dependent entity type. In this situation, the dependent entity should be cancelled if it has a CANCEL DATE TIME or deactivated if it has a DEACTIVATE DATE TIME since its parent entity has been cancelled.
Similarly, if the END DATE TIME is set in this type of entity that is used to classify and categorize other data; then the related downstream dependent entity types that are used in the same manner should also have their entities ended if they are related to the parent entity that has been ended. This cascading effect of ending or canceling an entity should stop when the downstream entity type records the transactions that happen in the real world as opposed to being used to classify and categorize other data.
NOTE: In the above examples where SUPERCEDE DATE TIME, END DATE TIME, and DEACTIVATE DATE TIME they are specified as being “Optional” attributes. However, if the selected DBMS does not handle these fields correctly and you need to make them “Mandatory”, use a maximum date value such “01/01/3000” and then reset it to the actual data/time when updating a row. This approach is actually easier to code for than using the “Optional” attribute approach.
This approach to modeling temporal data is relatively new. Therefore it is not well understood so very few know how to do it, and thus it will likely be hotly debated. This often results in a rationalization, however false, that “if I don’t understand it or I don’t know how to do it, it must not be important or necessary; and therefore you don’t need it”. The underlying motivation here is that if you want something and I do not know how to do it or provide it, you do not need it.
In 2003, Chris Date, who is considered (and has been for many years; decades, as a matter of fact) one of the leading thinkers and the leading author on the subject of data modeling and database design, published a book on temporal data which was probably the first book that dealt with the issue of keeping track of facts over time. We, at ENTARCO, have been developing and maturing the techniques for modeling data to keep track of every fact over time (temporal data) for about twenty years. We have much empirical evidence of the enterprise business need, use, and value of universal and consistent temporal data to the point where the enterprise business users that now have it extensively implemented would not give it up. It would be their highest priority criteria for any data model and database.
There is also a case for saying that the data model cannot be properly normalized without providing for keeping track of every fact over time. To properly identify an attribute value, we need to know when it came into existence and/or when was it a fact, valid for use, or true.
The use of this technique has resulted in the following benefits. By applying the technique universally and consistently with a specified set of patterns:
A. It provides an enhanced capability to identify and resolve issues because you can universally
and consistently know “who did what, when” or more simply, “what happened when” which is
fundamental to identifying and resolving many operational issues.
B. It provides a very useful means to make assessments about the quality and currency of the data.
Because data can deteriorate with age, this technique provides a means to always know when the
data was created and how old it is.
C. It dramatically reduces the amount of time and effort to deal with the issues of keeping history.
The result is we always have the history. Often when history is not captured when it is available
it cannot be re-constructed.
D. It provides the ability to take accurate snapshots of the status of anything at any point in
time that provides the ability to see what was true in the past and when, what is true now, and
to the extent that future data is entered, you can see what we expect to be true in the future.
E. It dramatically facilitates the ability to time phase the data for analytical and reporting
purposes. This inherently supports classifying data about anything of interest to the enterprise
people by any time dimension such as:
1. Daily, weekly, monthly, quarterly, annually, semi-annually, bi-annually;
2. Week-to-date, month-to-date, year-to-date, quarter-to-date, inception-to-date,
3. Being able to compare day-to-day, week-to-week, month-to-month, year-to-year data;
4. Or any combination of the above.
From a more technical and operational perspective there are some very useful operational benefits of using temporal data. If all business data is kept as temporal data, using inserts where updates are only used to show that older data has been superseded by newer data, then some very costly issues are immediately resolved. Delete authority should not be granted on database tables since only insert, update and select are required. Updates are allowed to supersede previously-known knowledge. This eliminates the need for part of the RI, the second part for delete restrict. Because of this feature of the approach, then the indexes required to support delete restrict for foreign key constraints are not required. In addition, batch processes can now become background processes running in the same environment as online processes in the foreground as long as frequent commits are taken. Since business data is never updated and lost, then backups are not required before beginning batch processes and data entered by online processes will not be lost since recovery does not require restoration to a prior backup. This eliminates the need for a batch window or for separate copies of databases where online processes are only allowed to access data as of the end of the last batch processing.
Appendix A, MEA Standard Attribute Definitions
This appendix contains the Methodology for Enterprise Architecture standard definitions for the standard attributes used in keeping track of data over time. These attributes are used in combinations as specified by the Temporal Data Types (TDT) specified in the Techniques for Modeling Data Over Time document.
CREATE DATE TIME
CREATE DATE TIME is the date and time this occurrence was entered on the database by the APPLICATION USER (or BATCH JOB).
ADDITIONAL INFORMATION: This date is a system supplied CURRENT TIMESTAMP and is the date and time that the data was inserted on the database. This date is also considered to the date and time that the enterprise knew about the data unless there are other dates such as RECEIVED DATE, POST MARKED DATE required and used to acknowledge when the data was known to the enterprise.
CREATE USER CODE
CREATE USER CODE is the code assigned to the APPLICATION USER (or BATCH JOB) who created this entity.
EFFECTIVE DATE TIME
EFFECTIVE DATE TIME is the first date and time that this occurrence of *entity type name* is valid for use.
ADDITIONAL INFORMATION: Typically, the date will be displayed in the user interface without the time unless the situation requires it and when the date is entered without the time, the time will default to 00:00:00.
END DATE TIME
END DATE TIME is the last date and time this occurrence of *entity type name* is valid for use.
ADDITIONAL INFORMATION: Typically, the date will be displayed in the user interface without the time unless the situation requires it and blank if the value is null (or equal to a maximum date/time) value. It the date entered by the user is greater than the CURRENT DATE, the time will default to null (or a maximum date/time) value for the date. If the date entered is equal to the CURRENT DATE, then the user is asked “Do you wish to end this entity immediately?” If the response is “Yes”, then END DATE TIME is set to the CURRENT TIMESTAMP. If the response is “No”, the time will default to null (or a maximum date/time) value.
DEACTIVATE DATE TIME
DEACTIVATE DATE TIME is the system date and time this occurrence became not valid for use.
ADDITIONAL INFORMATION: In a new entity, DEACTIVATE DATE TIME will be set to a null (or maximum date/time) value. When an entity is superceded by a new entity the DEACTIVATE DATE TIME in the old entity will be updated to the value of the CREATE DATE TIME in the new entity.
DEACTIVATE USER CODE
DEACTIVATE USER CODE is the code assigned to the APPLICATION USER (or BATCH JOB) who deactivated this entity.
SUPERCEDE DATE TIME
SUPERCEDE DATE TIME is the last date and time this occurrence of *entity type name* is in effect.
ADDITIONAL INFORMATION: This attribute is not a logical business attribute. SUPERCEDE DATE TIME is used in lieu of an END DATE TIME in an entity type that has a mandatory relationship with a parent entity type that cannot end and therefore cannot exist without an occurrence which is in effect. When the entity is created the attribute value should be set to a null (or maximum date/time) value. When a new entity is created with later information, the value of SUPERCEDE DATE TIME should be set to the value of the new entity’s EFFECTIVE DATE TIME.
CANCEL DATE TIME
CANCEL DATE TIME is the system date and time this occurrence was cancelled by the APPLICATIONS USER (or BATCH JOB).
ADDITIONAL INFORMATION: In a new entity, CANCEL DATE TIME will be set to a null (or maximum date/time) value. Cancelled means that the entity was created in error, should not have existed and should not ever by displayed as valid value. This attribute can only be used if the entity has not been referenced in a relationship to another entity. If the entity has been referenced by another entity, then the entity can only be ended by using the END DATE TIME attribute.
USER CODE is the code assigned to the APPLICATION USER (or BATCH JOB) who cancelled this entity.
Douglas T. Erickson
President, ENTARCO USA Inc.
© 2011, ENTARCO USA Inc.