Handling Big(-ish) Data in Postgres with Data Partitioning
TL;DR
To deal with a table that is expected to hold a lot of data, we can leverage Postgres’ data partitioning feature.
This feature allows us to split a table into smaller ones, each holding part of the data based on a rule, e.g. the first letter of the customer’s name, or the month of the creation date.
Thus we can query the parent table with a condition on the partitioning rule:
SELECT * FROM parent_table WHERE the rule applies, e.g. created_at = 'some date'
and get the data back faster, because Postgres only scans the partitions that can contain matching rows.
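As a minimal sketch of what this could look like with Postgres’ declarative partitioning (available since Postgres 10; the web_logs table and its columns are hypothetical, not the schema used later in this post):

-- Parent table partitioned by range on the creation timestamp
CREATE TABLE web_logs (
    id         bigserial,
    payload    jsonb,
    created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

-- One child table per month; each holds only its slice of the data
CREATE TABLE web_logs_2023_01 PARTITION OF web_logs
    FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');
CREATE TABLE web_logs_2023_02 PARTITION OF web_logs
    FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');

-- A query that filters on the partition key only touches the matching partition
SELECT * FROM web_logs
WHERE created_at >= '2023-01-15' AND created_at < '2023-01-16';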
The Problem: A database table that can grow too big to be queried efficiently.
Let’s say we have a table that holds the results of real-time web log preprocessing in our analytics engine. The estimated input to that table is ~50 GB per week, so within a year we’d have a ~2.5 TB table. Even though this is within Postgres’ limits (see the Postgres Limits section at the end), running…