The different types of database partitioning briefly explained
First of all, partitioning a database means dividing it into smaller, more manageable parts. Similar to normalisation, this is the process of turning one big table into a set of smaller tables. When a query runs, it therefore has a smaller section of data to scan and retrieve from. In effect, partitioning the data acts like a filter on the data.
There are 3 primary ways to conduct database partitioning.
1. Data replication
2. Horizontal Partitioning
3. Vertical Partitioning
Advantages of data replication
This is a very reliable way of making sure your data is secure. If the two data storage areas are far enough apart, there is no reason why both should be affected at the same time. This means the data runs little risk of being lost.
If the primary dataset is stored locally, data will be retrieved faster than if we used a centralised system that could be on the other side of the world. On top of this, because the data is not distributed, it should be simple and quick to save. This helps with data integrity.
If the data is replicated in real time, we can be sure the integrity of both sets of data has been maintained. On the other hand, if replication can be delayed until the end of the day, the business can avoid overloading the network with high levels of traffic.
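To make the live versus end-of-day trade-off concrete, here is a minimal sketch in Python. It is only a toy: the ReplicatedStore class, its live flag and the end_of_day_sync method are hypothetical names invented for this illustration, and two in-memory dictionaries stand in for the two storage locations.

```python
class ReplicatedStore:
    """Toy primary/replica pair (hypothetical); `live` decides when writes are copied."""

    def __init__(self, live):
        self.live = live
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes that have not reached the replica yet

    def write(self, key, value):
        self.primary[key] = value
        if self.live:
            # Live replication: copy straight away, costing extra work per write.
            self.replica[key] = value
        else:
            # Deferred replication: queue the write and copy it later in one batch.
            self.pending.append((key, value))

    def end_of_day_sync(self):
        # Apply every queued write to the replica, then empty the queue.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


live_store = ReplicatedStore(live=True)
live_store.write("order-1", 250)

batched_store = ReplicatedStore(live=False)
batched_store.write("order-1", 250)
stale_value = batched_store.replica.get("order-1")  # replica has not caught up yet
batched_store.end_of_day_sync()
```

With live replication the two copies match after every write. With the batched version the replica is out of date until the sync runs, which is the price paid for keeping daytime network traffic down.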
Disadvantages of data replication
Because all the data is stored twice, you need double the storage space and potentially double the processing power and network connectivity. Whether this data is stored on physical storage onsite or in cloud-based storage, this will lead to increased storage costs.
All this extra processing means that everything will take longer. This extra time won’t be noticeable for small transactions, but it becomes easier to see the bigger the transaction.
If the data replication is not live, it could lead to a lack of data integrity, because the two copies will be inconsistent until the next update.
Data replication business situation
This technique should only be chosen in situations where it is essential that the data is kept safe and secure at almost all costs. The technique is very time consuming and expensive, which can put great strain on the business. Because of the costs, this technique may be best suited to bigger firms that have constant demand for online services and have too much data to ever realistically retrieve again if it were lost.
From what I understand, the syntax for querying data should not be any more complicated, as we will still ask for data from the primary database. What will be affected is how quickly queries can return data to users.
Horizontal Partitioning
This is when different rows of the database are stored in different locations. For example, your rows of data could be split in three: the first third of the rows would be in the first table, the second third in the second table, and so on.
Advantages of Horizontal Partitioning
As expected, queries for particular entries of data, like a transaction, should be much faster, as we are only scanning one third of the data. Only the relevant data is processed, so only that data is exposed to a cyber-attack. It also means that only relevant data will be loaded. For example, the data could be divided in three based on location, such as sales data from Paris, Berlin and Prague. Naturally, someone working in Berlin would be interested in data from Berlin, and so on. How the data is divided can be decided by the database administrator when they are setting up or tuning the database.
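As a rough illustration of the Paris, Berlin and Prague split, the sketch below uses Python's built-in sqlite3 module with one table per city. The table names (sales_paris and so on), the column names and the helper functions are assumptions made up for this example; a production database would normally use its own partitioning feature rather than hand-made tables.

```python
import sqlite3

# Hypothetical layout: one sales table per city, all with the same columns.
CITIES = ("paris", "berlin", "prague")

conn = sqlite3.connect(":memory:")
for city in CITIES:
    conn.execute(
        f"CREATE TABLE sales_{city} (sale_id INTEGER PRIMARY KEY, amount REAL)"
    )


def check_city(city):
    # Only allow the known city names, since they become part of a table name.
    if city not in CITIES:
        raise ValueError(f"unknown city: {city}")


def insert_sale(city, sale_id, amount):
    check_city(city)
    conn.execute(f"INSERT INTO sales_{city} VALUES (?, ?)", (sale_id, amount))


def sales_for(city):
    # Scans only the one city's table, never the other two.
    check_city(city)
    query = f"SELECT sale_id, amount FROM sales_{city} ORDER BY sale_id"
    return conn.execute(query).fetchall()


insert_sale("paris", 1, 40.0)
insert_sale("berlin", 2, 15.5)
insert_sale("berlin", 3, 99.0)
insert_sale("prague", 4, 12.0)

berlin_rows = sales_for("berlin")
```

Someone in Berlin calling sales_for("berlin") only ever touches the Berlin table, so the Paris and Prague rows are neither scanned nor loaded.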
Also, because we know where the data is stored, the query should not be as complicated as it would be if we were guessing where the data was.
Business use of Horizontal partitioning
If we are not entirely sure where data is, a lot more data than expected may need to be scanned, which will slow down processing. To avoid this we need a partitioning key. For example, we can partition by range. If we know that a customer ID is 198, we know to scan the table that holds customer IDs 1-200 and not the one that holds 201-400. Each range is its own table.
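The range example can be sketched in a few lines of Python. This assumes every range holds exactly 200 customer IDs, as in the 1-200 and 201-400 example; the function names and the dictionary standing in for the separate tables are hypothetical.

```python
PARTITION_SIZE = 200  # assumed width of each range: 1-200, 201-400, ...

# Each key is a (low, high) range and each value plays the role of one table.
partitions = {}


def partition_bounds(customer_id):
    """Return the (low, high) range of the table that holds this customer ID."""
    if customer_id < 1:
        raise ValueError("customer IDs start at 1")
    index = (customer_id - 1) // PARTITION_SIZE
    low = index * PARTITION_SIZE + 1
    return low, low + PARTITION_SIZE - 1


def insert_customer(customer_id, name):
    table = partitions.setdefault(partition_bounds(customer_id), {})
    table[customer_id] = name


def find_customer(customer_id):
    # Only the one table whose range covers the ID is looked at.
    table = partitions.get(partition_bounds(customer_id), {})
    return table.get(customer_id)


insert_customer(198, "customer A")
insert_customer(201, "customer B")
```

Here partition_bounds(198) gives (1, 200) and partition_bounds(201) gives (201, 400), so a lookup for customer 198 never touches the second range.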
There is no data replication and this could leave data very vulnerable should some tables get damaged.
What is vertical Partitioning?
This is when the database administrator restricts which columns are displayed at different locations or to different users. For example, everyone within a business may be able to look everyone else up by name and see what business unit they work in. But what each person earns may only be visible to employees working in HR.
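A small sketch of the HR example follows, assuming the directory columns and the pay column are kept in two separate structures that share the employee ID. The names, salaries, role strings and the lookup function are all made up for illustration; a real database would enforce this with its own permissions rather than a Python check.

```python
# Hypothetical vertical split: same employee IDs, different columns.
DIRECTORY = {
    1: {"name": "Employee One", "business_unit": "Sales"},
    2: {"name": "Employee Two", "business_unit": "Finance"},
}
PAY = {
    1: {"salary": 42000},
    2: {"salary": 51000},
}


def lookup(employee_id, role):
    """Return the columns that the given role is allowed to see."""
    record = dict(DIRECTORY[employee_id])  # copy so the stored data is not changed
    if role == "hr":
        # Only HR also gets the columns held in the pay partition.
        record.update(PAY[employee_id])
    return record


staff_view = lookup(1, role="staff")
hr_view = lookup(1, role="hr")
```

The staff view contains only the name and business unit, while the HR view for the same employee also includes the salary.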
When might we use vertical partitioning as a technique?
The main time a business would use this technique is when they don’t want certain user types to see a column of data. This would be the case for both employees and customers. In the same way we might not want every employee to see everyone’s wages, we would not want customers to see what our profit margin is. This is of course the difference between what it costs us to buy or make the product and put it on sale, and the price we sell it for to the final customer or the next cog in the supply chain.
We could also use this technique to take out data that is seldom needed. For example, if a local wool shop in Ireland sells 95% of its products to customers based in Ireland, then 95% of the time scanning the country field/attribute in the database is just a waste of time. By partitioning off this column, we can reduce how long a query of the database will take.
The major disadvantage of this is that if the column of data that is not normally needed is needed, the query will take longer than if everything was in the one table. But this slowdown is worth it, as the reason this column is in its own table is that you rarely use it. The benefit of having increased query speed most of the time will outweigh the slowing down of the occasional query.
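The wool shop trade-off can be sketched with Python's built-in sqlite3 module. The table names orders_core and orders_country, their columns and the sample rows are assumptions for this example only. The everyday query reads the core table alone, while the occasional query has to join the two tables back together, which is where the extra time goes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical split: the rarely needed country column lives in its own table.
conn.executescript(
    """
    CREATE TABLE orders_core (
        order_id INTEGER PRIMARY KEY,
        product  TEXT,
        amount   REAL
    );
    CREATE TABLE orders_country (
        order_id INTEGER PRIMARY KEY REFERENCES orders_core(order_id),
        country  TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO orders_core VALUES (?, ?, ?)",
    [(1, "merino yarn", 12.5), (2, "knitting needles", 6.0)],
)
conn.executemany(
    "INSERT INTO orders_country VALUES (?, ?)",
    [(1, "Ireland"), (2, "France")],
)

# Everyday query: never reads the country column at all.
everyday = conn.execute(
    "SELECT order_id, product, amount FROM orders_core ORDER BY order_id"
).fetchall()

# Occasional query: needs a join to bring the country column back.
occasional = conn.execute(
    """
    SELECT c.order_id, c.product, k.country
    FROM orders_core AS c
    JOIN orders_country AS k ON k.order_id = c.order_id
    ORDER BY c.order_id
    """
).fetchall()
```

The join is the extra cost of the occasional query, and it is only paid on the rare occasions when the country is actually needed.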
In summary, this answer broke down the 3 types of data partitioning, looked at the advantages and disadvantages of each, and used this as a springboard to talk about the business situations in which each would be used.