A comparison between AWS databases¶
Main databases types:
- Relational: data are stored in tabular form (rows and columns), where each row represents a unique record. Tables can be put in relation with each other through joins and queried via SQL;
- Key-value: non-relational database where each record stored as a unique key with its associated value, resembling a dictionary-like structure;
- Document: semi-structured and hierarchical databases for catalogs and content management systems, often stored as JSON;
- Graph: the way the data are stored is graph-based, with nodes and edges connecting each data source with the others;
- Time-series: database optimized for records which indices are timestamps.
| Service | Type | Query language | Use cases |
|---|---|---|---|
| Athena | Structured, semi-structured and unstructured | SQL based on HiveQL DDL and Presto DML | Log analysis, OLAP, BI |
| Aurora1 | Relational | MySQL, PostgreSQL | eCommerce, CRM |
| DocumentDB | Document | compatible with MongoDB query language | product catalogs, images and videos, application data platform |
| DynamoDB | Key-value | NoSQL | Mobile and web apps, gaming, IoT |
| Neptune | Graph | GQL (Apache TinkerPop Gremlin, SPARQL) | Fraud detection, social netowkrs, knowledge graph, recommendation engines |
| Timestream | Time-series | ANSI SQL | IoT, DevOps, telemetry, forecasting, analytics |
| RDS | Relational | SQL | DWH, CRM |
| Redshift | Relational | SQL based on PostgreSQL | Large-scale DWH, data migration, OLAP |
Warning
Athena and RDS are somewhat erroneously reported in the above table which refers to databases: a deep dive on the main differences is available below.
Amazon RDS¶
From the official docs:
Amazon Relational Database Service (Amazon RDS) is a web service that makes it easier to set up, operate, and scale a relational database in the AWS Cloud. It provides cost-efficient, resizable capacity for an industry-standard relational database and manages common database administration tasks.
Amazon RDS is not therefore a database engine itself, rather than a tool that helps in managing relational databases on AWS. Essentially, can be thought as an AWS-managed, generally purpose RDBMS.
It allows to manage and run six database engines:
- Amazon Aurora
- MySQL
- MariaDB
- PostgreSQL
- Oracle
- Microsoft SQL Server
A brief comparison between open source databases¶
A brief comparison between open source relational databases is available below (full credits to Soufiane L).
| Feature | MariaDB2 | MySQL | PostgreSQL3 |
|---|---|---|---|
| Materialized Views | ❌ | ❌ | ✔️ |
| Partial Indexes | ❌ | ❌ | ✔️ |
| Array Data Type | ❌ | ❌ | ✔️ |
| JSON Data Type | ✔️ | ✔️ | ✔️ |
| CHECK constraints | ✔️ | ❌ | ✔️ |
| Replication | ✔️ | ✔️ | ✔️ |
| Full-Text Search | ✔️ | ✔️ | ✔️ |
| UPSERT | ✔️ | ✔️ | ✔️ |
| Common Table Expressions | ✔️ | ✔️ | ✔️ |
| Sequences | ✔️ | ❌ | ✔️ |
| Table Partitioning | ✔️ | ✔️ | ❌ |
Amazon Athena¶
Amazon Athena it's not a database engine itself but it's defined as an interactive query service based on Apache Presto that makes easy to analyze data stored into S3 via SQL. This means that compute and storage are separate and managed independently.
It is completely serverless and cost-effective, and can be used together with AWS Glue - fully managed ETL service - which takes care of managing a related data catalog as a central source for metadata while preparing data for querying.
In contrast, efficient querying requires data to be partitioned and purposely organized into S3 buckets upfront; moreover AWS Athena users compete for the same resources at the same time. While AWS provisions more resources as needed, it could mean that query performance fluctuates based on other users needs.