For quite some time, I've been playing with the idea of comparing data with a raw material (like timber) and information with a product made from that raw material (like furniture). This helps to suggest (a) that data and information are related but not the same thing and (b) that there is processing in between, similar to converting wood into furniture. With this blog, I'd like to throw this comparison out into the public, hopefully triggering a good discussion that either identifies weaknesses of the comparison or develops the idea even further.
So let's picture the process of how a tree becomes a cupboard, a bookshelf or a table:
- Trees grow in the forest.
- A tree is cut and the log is transported to a factory for further processing.
- At the factory, it is stored in some place.
- It is subsequently processed into boards. Various tools like saws, presses etc. are used in this context.
- The boards are frequently taken to yet another factory that applies various processing steps to create the furniture. Depending on the type of furniture (table, chair, cupboard, ...), few or many steps, simple or complex ones, are necessary. Additional materials like glass, screws, nails, handles, metal joints, paint, ... are added. Processing steps include cutting, pressing, painting, drilling, ...
Now, when you consider what happens to data before it becomes useful information displayed in a pivot table or a chart, you can identify similar steps:
- Data gets created by some business process, e.g. a customer orders some product.
- For analysis, the data is brought to some central place for further processing in a calculation engine. This place can be part of a data warehouse or of an on-the-fly infrastructure, e.g. via a federated approach that retrieves the data only when it is needed.
- There, it is stored, e.g. in persistent DB tables or a cache, where it can potentially "meet" data from other sources.
- Data is reformatted, harmonised, cleansed, ... using data quality tools, data transformation tools or plain SQL. Simply consider the various formats for a date like 4/5/2011, 5 Apr 2011, 5.4.11, 20110405, ...
- Data is enriched and combined with data from other sources, e.g. the click stream of your web server combined with the user master data table. Only in this combination can you, for instance, tell how many young, middle-aged or old people look at your web site. In the end, data has become useful information.
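To make the last two steps a bit more tangible, here is a minimal Python sketch of harmonising the date formats mentioned above and then combining a click stream with user master data. Everything in it is an assumption for illustration: the sample records, the field names (`user_id`, `date`, `age`), the age thresholds and the function names are all made up. It also assumes that 4/5/2011 is meant as month/day/year, that 5.4.11 is day.month.year, and that month names are in English.

```python
from collections import Counter
from datetime import date, datetime

# Assumed formats, tried in order: 4/5/2011, 5 Apr 2011, 5.4.11, 20110405.
DATE_FORMATS = ["%m/%d/%Y", "%d %b %Y", "%d.%m.%y", "%Y%m%d"]


def normalise_date(text: str) -> date:
    """Harmonise a date string into a single date value."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # this pattern did not match, try the next one
    raise ValueError(f"unrecognised date format: {text!r}")


def age_band(age: int) -> str:
    """Map an age to a coarse band (thresholds are hypothetical)."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle-aged"
    return "old"


def visits_by_age_band(clicks, users) -> Counter:
    """Enrich each click with user master data and count visits per day and band."""
    counts = Counter()
    for click in clicks:
        user = users.get(click["user_id"])
        if user is None:
            continue  # click without matching master data: cannot be enriched
        key = (normalise_date(click["date"]), age_band(user["age"]))
        counts[key] += 1
    return counts


# Made-up sample data: user master table and web server click stream.
USERS = {"u1": {"age": 24}, "u2": {"age": 45}, "u3": {"age": 71}}
CLICKS = [
    {"user_id": "u1", "date": "4/5/2011"},
    {"user_id": "u2", "date": "5 Apr 2011"},
    {"user_id": "u3", "date": "5.4.11"},
    {"user_id": "u1", "date": "20110405"},
]

if __name__ == "__main__":
    for (day, band), n in sorted(visits_by_age_band(CLICKS, USERS).items()):
        print(day.isoformat(), band, n)
```

Neither source alone can answer the age question: the click stream knows nothing about age, and the master data knows nothing about visits. Only the combination, after the dates have been brought into one format, yields the information.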