Databases & Data Engineering

Database High Availability and Failover
03/02/2024 18:40:24

If your business depends on its database being online, a single server is a single point of failure. High availability is the practice of arranging things so that, when one component fails, another ...
Data Governance and Ownership
01/03/2024 18:31:23

As an organisation's data grows, so does the risk of confusion: who is allowed to see it, who keeps it accurate, and who decides how it is used? Data governance is the set of agreed answers to these questions...
Storing Dates, Times and Time Zones
26/03/2024 18:59:20

Dates and times look simple but quietly cause some of the most stubborn bugs in software — a report off by an hour, a booking on the wrong day, or a deadline that shifts in summer. Handling them properl...
Sharding: Splitting Data Across Servers
30/03/2024 12:13:23

There comes a point for very large systems where a single database server simply cannot hold or serve all the data fast enough. Sharding is the technique of splitting that data across several server...
Migrations: Changing a Live Database Safely
15/04/2024 19:59:42

As your product grows, its database structure has to change — a new column here, a renamed table there. Doing this on a live system with real customer data is delicate work, which is why we use ...
Primary Keys, Foreign Keys and Relationships
06/05/2024 13:27:29

Relational databases earn their name from relationships — the links that connect a customer to their orders, or an invoice to its line items. Primary keys and foreign keys are the tools that ma...
Read Replicas and Scaling Reads
26/07/2024 19:04:30

Many applications read data far more often than they change it: browsing a catalogue, viewing reports, loading dashboards. When those reads start to strain a single database, read replicas are a proven way ...
Connection Pooling and Why It Matters
22/08/2024 16:50:34

Every time your application talks to its database, it opens a connection — and opening connections is surprisingly expensive. Under load, doing this naively can bring an otherwise healthy system to it...
Avoiding SQL Injection
22/09/2024 17:41:16

SQL injection is one of the oldest and most damaging web vulnerabilities, yet it remains common. It lets an attacker trick your database into running commands it should never run — reading, altering or deleting data...
Transactions and Data Integrity
18/10/2024 15:37:23

Some operations only make sense if they happen completely or not at all. Transferring money between two accounts is the classic example: you must never debit one without crediting the other. Database transa...
Database Monitoring and Alerts
06/12/2024 08:45:21

A database problem caught early is a minor task; the same problem caught when customers complain is an emergency. Monitoring watches the health of your database continuously and alerts the team before small ...
Capacity Planning for Database Growth
24/12/2024 11:27:42

Databases rarely fail without warning — they usually fill up or slow down gradually as data accumulates. Capacity planning means watching those trends and acting early, so you upgrade on your own sche...
Streaming Data vs Batch Processing
29/12/2024 13:23:10

Data can be processed in two broad rhythms: in large scheduled batches, or continuously as it arrives. Choosing the right one balances how fresh your data needs to be against cost and complexity. ...
Full-Text Search in a Database
20/01/2025 09:36:38

Searching for a word inside a paragraph of text is very different from looking up an exact value like a customer ID. Full-text search is the feature that lets your users find records by typing natural words,...
Normalisation vs Denormalisation
19/03/2025 08:29:04

Normalisation and denormalisation are two opposing approaches to organising data. One reduces duplication for accuracy; the other accepts duplication for speed. Knowing the trade-off helps you understand d...
ETL and ELT: Moving Data Between Systems
18/04/2025 12:10:14

Businesses rarely keep all their data in one place. Sales might live in one system, marketing in another and finance in a third. ETL and ELT are the standard patterns for pulling that data together...
Relational vs NoSQL Databases: Which to Choose
26/04/2025 14:37:32

Almost every application needs somewhere to keep its data, and the first big decision is whether to use a relational database or a NoSQL one. The right choice shapes how easily your product c...
What a Database Schema Is
22/05/2025 10:39:08

A schema is the blueprint of your database — it defines the tables you have, the columns inside them, and the rules that keep your data tidy. A good schema makes everything that follows easier, while a poor one c...
What a Data Lake Is
23/05/2025 15:30:50

A data lake is a large, low-cost store that holds raw data of many kinds — spreadsheets, logs, images, sensor readings — in its original form, ready to be processed later. It complements a structured warehouse rather t...
Indexes: Why Some Queries Are Fast and Others Slow
26/05/2025 17:25:03

If your application suddenly feels sluggish when listing records or searching, the cause is often a missing index. An index is to a database what the index at the back of a book is to a r...
Slow Query Logs and Performance Tuning
14/06/2025 10:51:39

When a database feels slow, guessing at the cause wastes time. Databases can record exactly which queries are taking too long, and that slow query log is the starting point for nearly all performance...
Data Types and Choosing the Right One
19/06/2025 12:47:57

Every column in a database has a type that tells it what kind of value to expect — text, a whole number, a decimal, a date and so on. Choosing well keeps data accurate, compact and fast to query. ...
Migrating Data Between Two Systems
30/06/2025 11:00:02

Moving from an old system to a new one is one of the riskiest parts of any project, because the data is your business. A careful migration plan turns a nerve-wracking switch into a controlled, rehearsed ...
Archiving Old Data to Keep Things Fast
28/07/2025 14:03:34

Databases generally slow down as they grow, and much of that growth is old data nobody queries day to day — orders from years ago, expired sessions, historical logs. Archiving moves that data aside s...
Encrypting Data at Rest and in Transit
14/08/2025 18:30:03

Encryption scrambles your data so that, even if it is intercepted or stolen, it is unreadable without the key. For a database, this applies in two places: while data travels across the network, and w...
Caching Query Results
18/08/2025 11:44:22

If the same expensive question is asked of your database over and over, answering it fresh every time is wasteful. Caching stores the answer for a short while so repeated requests are served instantly. Used we...
Deduplicating Messy Data
15/10/2025 15:19:34

Duplicate records creep into almost every system — the same customer entered twice, a contact imported from two lists, slight spelling differences treated as separate people. Left unchecked, duplicates inflate you...
JSON Columns: Flexible Data in SQL
13/12/2025 08:53:55

Sometimes part of your data does not fit neat columns — a set of preferences, varying product attributes, or a third party's response. Modern relational databases let you store such data as JSON inside a...
Choosing Between MySQL, PostgreSQL and SQL Server
30/12/2025 19:47:29

When a relational database is the right call, the next question is which one. MySQL, PostgreSQL and Microsoft SQL Server are all capable, mature choices, and the best fit depends on your f...
Soft Deletes and Data Retention
08/01/2026 19:23:56

When a user clicks 'delete', should the record vanish forever? Often the safer answer is no. A soft delete marks a record as removed without actually destroying it, which protects you from mistakes and supp...
Data Validation at the Database Level
12/02/2026 15:16:34

It is tempting to rely on your application to check data before it is saved, but applications change, have bugs and are not the only thing that ever writes to a database. Adding validation rules in th...
Audit Logging Changes to Records
18/02/2026 18:02:51

For many businesses it is not enough to know what the data says now — you need to know who changed it, when, and what it was before. Audit logging records that history, supporting accountability, dispute r...
Database Security and Least Privilege
14/03/2026 11:35:37

Your database often holds your most valuable and sensitive information, which makes it a prime target. Sound database security is layered, but one principle underpins it all: give every account only t...
Backups, Restores and Point-in-Time Recovery
25/03/2026 17:23:53

A backup is only as good as your ability to restore from it. For a business, the real question is not 'do we have backups?' but 'how much data could we lose, and how quickly could we be back on...
Data Warehouses vs Operational Databases
12/04/2026 14:41:34

The database that runs your day-to-day application and the one you use to analyse trends are usually best kept separate. They are optimised for opposite jobs, and forcing one to do both leads to sl...
Data Pipelines and Scheduling
20/04/2026 09:04:45

A data pipeline is an automated sequence of steps that moves and transforms data from where it is created to where it is needed — for example, from your live app into nightly reports. Reliable scheduling and ...
Master Data Management Basics
28/04/2026 17:48:19

As businesses grow, the same core information — customers, products, suppliers — ends up scattered across several systems, each with its own slightly different version. Master data management (MDM) is the dis...
Reporting Databases and Why We Separate Them
06/05/2026 18:24:22

When the team wants dashboards and exports, running those queries against your live database can slow the very service your customers rely on. A reporting database solves this by giving analyst...
Anonymising Data for Testing
10/05/2026 10:31:09

Developers need realistic data to test against, but using real customer records in a test environment is risky and, under data protection law, often unlawful. Anonymising data lets your team work with realisti...
GDPR and the Right to Erasure in Databases
28/05/2026 09:37:13

Under UK GDPR, individuals can ask you to delete the personal data you hold about them — the 'right to erasure' or 'right to be forgotten'. Honouring that request properly is more involved than d...