
Data Scientist Hub



Data Engineering

Data Architecture

Basics

Data Engineering Vault

Data catalog

lakekeeper: an Apache-licensed, secure, fast and easy-to-use Apache Iceberg REST catalog written in Rust

Data Structures

Tensors vs Tables

Database

DuckDB

ACID

deltabase: a lightweight, comprehensive solution for managing Delta tables, built on Polars and deltalake
strava-datastack: a modern Strava data pipeline fueled by dlt, DuckDB, dbt, and evidence.dev

Apache Iceberg

Apache DataFusion

Apache DataFusion in Python

Monitoring

SLA, SLO and SLI for data teams
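To make the three terms concrete: an SLI is a measured indicator (here, the share of table loads that arrive on time), an SLO is the internal target set for that indicator, and an SLA is an external commitment, typically looser than the SLO, which is not modelled below. The sketch assumes a hypothetical freshness SLI for a daily table; the function name `freshness_slo`, the 24-hour threshold and the 95% target are illustrative assumptions, not taken from the linked resource.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # measured share of "good" events (0..1)
    target: float            # SLO target for that share, e.g. 0.95
    budget_remaining: float  # share of the error budget left; negative once overspent

    @property
    def met(self) -> bool:
        return self.sli >= self.target


def freshness_slo(delays_hours, max_delay_hours, target):
    """Hypothetical freshness SLI: a load is 'good' if it landed within max_delay_hours."""
    if not delays_hours:
        raise ValueError("need at least one observed load")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    good = sum(1 for d in delays_hours if d <= max_delay_hours)
    sli = good / len(delays_hours)
    allowed_bad = 1.0 - target  # the error budget, as a share of all loads
    actual_bad = 1.0 - sli      # share of loads that missed the threshold
    return SloReport(sli, target, 1.0 - actual_bad / allowed_bad)


# 100 daily loads (delays in hours), two of which arrived later than 24 hours.
report = freshness_slo([2.0] * 98 + [30.0, 26.0], max_delay_hours=24.0, target=0.95)
print(f"SLI={report.sli:.2%} met={report.met} budget left={report.budget_remaining:.0%}")
```

With two late loads out of 100 and a 95% target, the SLI is 98% and 60% of the error budget remains; tightening the target to 99% would overspend the budget (a negative remainder), which is usually the signal to prioritise reliability work over new pipelines.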

OS

Tech info about operating systems

REST API

Tools

Search engines

Approaching Relevance Challenges in Elasticsearch Query Construction

Unit testing

Test-driven development and triangulation
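Triangulation, as described by Kent Beck, means adding a further example with a different expected result until a faked implementation (such as returning a constant) can no longer pass, which forces the general solution. The sketch below uses a hypothetical `count_nulls` helper and Python's built-in `unittest`; the function and test names are illustrative and not taken from the linked article.

```python
import unittest


def count_nulls(values):
    """General implementation, written only after a second test broke the fake `return 1`."""
    return sum(1 for v in values if v is None)


class CountNullsTest(unittest.TestCase):
    # Step 1: on its own, this test can be satisfied by the fake `return 1`.
    def test_single_null(self):
        self.assertEqual(count_nulls([None, 1]), 1)

    # Step 2 (triangulation): a second example with a different expected value
    # rules out the constant and forces the general implementation.
    def test_two_nulls(self):
        self.assertEqual(count_nulls([None, None, 3]), 2)

    # Step 3: a third point pins down the edge case with no nulls at all.
    def test_no_nulls(self):
        self.assertEqual(count_nulls([1, 2, 3]), 0)


if __name__ == "__main__":
    # exit=False returns instead of calling sys.exit (handy in notebooks).
    unittest.main(argv=["count_nulls_tests"], exit=False)
```

The point of the exercise is the order: each new example is added only when the current implementation is still too specific, so the code generalises no faster than the tests demand.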