2024-08-29
A/B Testing
ACID
- 4 hours learning Apache Iceberg
- AWS Apache Iceberg technical guide
- Apache DataFusion in Python
- Apache Iceberg O'Reilly Training
- Apache Polaris: the interoperable, open source catalog for Apache Iceberg
Books
CLI
Career development
Code maintenance
Colors
Computer Vision
Dash
Data Augmentation
Database
- DuckDB Doesn’t Need Data To Be a Database
- DuckDB Tricks
- DuckDB and Motherduck serverless analytics platform
- DuckDB blog: Friendly Lists and Their Buddies, the Lambdas
- DuckDB: open source OLAP database
- Friendly SQL in DuckDB
- Graph components with DuckDB
- QuackOSM: an open-source Python and CLI tool for reading OpenStreetMap PBF files using DuckDB
- mosaic: an extensible framework for linking databases and interactive views
- sqlite-vec: a vector search SQLite extension that runs anywhere!
Dataset
Documentation
Embeddings
GUI
Geo science
- Exploring Overture data, no SQL required
- Humanitarian OpenStreetMap Team
- Overpass Turbo
- Overture GERS: Towards Standardizing Place
- Overture Grabber
- Overture Maps data
- Overture Maps docs
Geodata
Git and versioning
High-dimensional data
Jupyter
Large Language Models (LLM)
- A Visual Guide to Quantization
- A programming framework for agentic AI
- AI models collapse when trained on recursively generated data
- Auditing the Ask Astro LLM Q&A app
- Developers with AI assistants need to follow the pair programming model
- Explaining generative language models to (almost) anyone
- Gemma 2 optimized for your local machine
- GraphRAG: a modular graph-based Retrieval-Augmented Generation (RAG) system
- How to build a basic LLM GPT model from Scratch in Python
- LLM Evaluation
- LLM sampling
- LangDrive: train LLMs on private data
- LitGPT: 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale
- Open WebUI: user-friendly WebUI for LLMs
- The Rise of the LLM OS: From AIOS to MemGPT and beyond
- Trace: AutoDiff for AI Systems and LLM Agents
- Unsloth: Finetune Llama 3.1, Mistral, Phi and Gemma
- Will We Run Out of Data? Limits of LLM Scaling Based on Human-Generated Data
- giskard: Open-Source Evaluation & Testing for LLMs and ML models
- guidance: a guidance language for controlling large language models
- llmware: unified framework for building enterprise RAG pipelines with small, specialized models
- talkd/dialog: RAG LLM Ops App for easy deployment and testing
Machine Learning
- LitServe: an easy-to-use, flexible serving engine for AI models built on FastAPI
- Why You Should Never Use Cross-Validation
Mathematics
Methodology
Misc utils
- bigtree: Tree Implementation and Methods for Python, integrated with list, dictionary, pandas and polars DataFrame
- pycountry: a Python library to access ISO country, subdivision, language, currency and script definitions and their translations
OCR
Pandas
- How Narwhals and scikit-lego came together to achieve dataframe-agnosticism
- Narwhals: lightweight and extensible compatibility layer between dataframe libraries!
- ibis introduction by calmcode
Performance monitoring
Probability & Statistics
Project Management
Project packaging
Ridgeline plots
Scikit-learn
Software Development
- All Code Is Technical Debt
- The Missing Semester of Your CS Education
- The Real Problem with Software Development: It's not writing code, it's managing complexity
- The cloudy layers of modern-day programming
- Understanding Polylith through the lens of Hexagonal architecture
Streamlit
- Build a chatbot with custom data sources, powered by LlamaIndex
- Link analysis
- Navigation bar
- Streamlit Component to quickly create Interactive Flow Diagrams using React Flow
- Streamlit auth via JWT and FastAPI
- Streamlit: an opinionated framework
Testing
Tools
Weather data
- Aurora: a Foundation Model of the Atmosphere
- NeuralGCM: a Python library for building hybrid ML/physics atmospheric models for weather and climate simulation
Web App Framework
- DearPyGui: a fast and powerful Graphical User Interface Toolkit for Python with minimal dependencies
- Hyperdiv: Build reactive web UIs in Python