The difference between a data warehouse and a transactional database

 Intro

This answer will layout what exactly a data warehouse is and explain how it different from transactional databases. It will also examine the role of the both the physical and logical structure of databases. 

 

Data Warehouse general info

A data warehouse is a database type  also known as an enterprise data warehouse. It’s primarily purpose is to be a structured place to store data and for most organisations it is the reporting system of information with the use of internal and external data. The data warehouse is designed for centralising the database to optimise analysiss. This means that Data warehouses should not conduct transactions and therefore should be read only. But unfortunely they are often used as transaction databases by firms. Data warehouses are such a specialised piece of software that they tend or at least should be their own project within an organisation.

 

A data warehouse gets its information from both external internal sources. External sources include competitor analysis, customer sentiment analysis on social media, public data like newspapers and practitioners like Gartner and the Economists. Internal data will include ERP, SCM and CRM systems. This internal data if often known as operational or transactional data and is processed in the production system of a database. Data from both internal and external sources is processed and distributed across the database by the Extract, form and load system. This is an intelligent system also known as the staging area that can segment and organise data into detailed data, summary like data and metadata. Once information is not needed on a regular basis is not needed anymore it can be achieved. This data can be still retrieved but it takes longer. In exchange the system runs queries faster.

 

Differences between transactional databases and data warehouses

A data warehouse has a production system while a transactional database dies not. 

the production system is the transactional system that comes into the database. The production system is the part of the database that processes these transactions. A data warehouse is a tool that isolates the production system from the reporting system  The production system is optimised to do this by the use of normalised data often within a database. This format of normalised data can be make some reporting very slow making the production system a poor place to be conducting and producing highly analytical reports. Intense data mining and reporting would likely be very slow if it worked at all. The production system already has to do with transactions such as sales, staff information and website generated information. This is vital because if they were not  the system would be overloaded and overworked to the system could be compromised trying to complete certain activities.

 

On their physical layer transactional databases tend to have normalised data. Normalised data is as followed 

-       Optimised for transactional data

-       Less redundancy by avoiding data duplication. 

-       Commonly repeating data gets its own entity 

-       Usually reduces the about of storage space needed. 

-       The more joins you have the longer the query will take to run. 

 

While data warehouse tend to use a different data schema. This schema is known as a Star schema. 



 

This schema is great for querying data from a database as the data structure is very intuitive. In the middle, each star schema has a central fact table. This table works almost as an associative entity as it holds the primary keys of the other entities. All the transactional information from these other tables also known as dimensions is fed into this central fact table. Above these central entities are “stock fact” “sales fact”

 

Each one of the entities/dimensions that reach out from the central fact table has its own attributes and give further context to the fact table. For example, all the countries in the database or sales in a particular timeframe. These dimensions tend not to be normalised. Therefore, they are denormalized. This method is the best choice if speed is the primary concern of users. 

 

Star schemas are somewhat limited as each table is only interested in one type of fact, eg inventory level. This means a star schema may need to have lot of extra tables and often repeating data to make lots of different queries of data possible. 

 

Following on from tha, is a reverse star. In fact, it could be argued that it is simply further separating information compared to the star schema. Again this is a schema found in a data warehouse and would not be foundin a relational, transactional database. 

 

To divide info up even further, the structure is “reversed” or flipped. Like a star schema, the reverse star still has a central dimension that can be seen as a point of reference for all other dimension in the star. The difference comes in the fact that this central dimension has many fact tables coming out of it, therefore the fact table is no longer the main or central point of the database. Each fact table then has the normal dimensions coming out of it found in a star schema. The benefit of this schema over a standard star schema is that it facilitates ever complicated queries.

 

OLAP

A big feature that a data warehouse has that a transactional database does have is an OLAP Schema. The OLAP Schema also known as an Online Analytical Processing. This is a special softtware that extracts information from a data warehouse. It is a multi-dimensional model that can be seen as one big cube of information with many cubes within it all storing a different section of a dataset. 

 

There are four different methods of viewing data using OLAP. The first of these is Drill down of data. This is when the user clicks on a cube /cell, they can see it in more detail. For example, a user could go from seeing all the sales in the EU to seeing all the sales in a particular country. Secondly the is “roll up” which is the opposite to drill down. The term slicing and dicing refers a very particular part of a dataset. For example, seeing sales on a particular date rather than all the sales. Finally a pitot is a when there ios a  swap of dimensions to change what the user sees first. 

 

Summary

This answer firstly explained what a data warehouse is, and then moved on how it varies from a transactional database. It exampled how the data is stored in each of these database types explaining the different data schemas involved. Other factors that could have had more said about them include, archiving and snowflake data modelling. 

 

Comments

Popular posts from this blog

An explanation of SQL (Structure Query Language) Injections

A discussion of the inherent issues in managing concurrent accesses to a transactional database and the type of mechanisms which a RDBMS might use in defence of data integrity and to resolve potential deadlocks.

10 things you notice after coming home from abroad