(my) Hacker News #10¶
Here's a list of the latest resources that grabbed my attention.
Anomaly detection¶
- Anomaly Detection In Time Series
- Anomaly Detection with Nixtla
- DTAI Anomaly Detection
- Darts Anomaly Detection
- Merlion Anomaly Detection Intro
- Simple Anomaly Detection In Time Series Via Optimal Baseline Subtraction
- Unsupervised Anomaly Detection for Web Traffic Data
- Unsupervised Anomaly Detection
Articles¶
Basics¶
Books¶
- Data Visualization with Category Theory and Geometry
- Sequential Decision Analytics and Modelling with Python
CLI¶
- python-inquirer: Collection of common interactive command line interfaces
- ty: Modern Python CLI framework with type hints
Coding¶
Data Structures¶
Dataset¶
- ArXiv Explorer
- Influcast: epidemiological forecasts
- Python Data Commons: Unlock deeper insights with the new Python client library for Data Commons
Deep Learning¶
Diagrams¶
Documentation¶
- Introducing New Open Source Documentation Resources
- mkapi: MkDocs plugin for automatic API documentation generation from Python docstrings
- mkdocs MCP server
Embeddings¶
- A Visual Exploration Of Vector Embeddings
- Illustrated Word2Vec
- LLM Embeddings Explained: A Visual and Intuitive Guide
- ML Embeddings Overview
- OpenAI: What Are Embeddings?
- s3vectors-embed-cli: A CLI facilitating semantic similarity search on media in Amazon S3 via AWS Bedrock and Amazon S3 Vectors
- vectorvfs: Your filesystem as a vector database
Functional programming¶
- Functional programming in DS projects
- The Missing Manual For Signals State Management For Python Developers
- Trio: Python library for writing asynchronous applications
- flowshow: Just a super thin wrapper for Python tasks that form a flow
- reaktiv: Signals for Python
- tinyio: a tiny event loop for Python
- toolz: Functional programming utilities for Python
Geo science¶
- Maps With Django, GeoDjango Pillow And GPS
- OpenStreetMap API Doc
- Sea Surface Temperature Daily Analysis
- UrbanMapper: Enrich Urban Layers Given Urban Datasets
- geoai: Artificial Intelligence for Geospatial Data
- jupytergis: Collaborative GIS editor in Jupyter
Git and versioning¶
Graphs¶
IDE¶
Jupyter¶
- Everything As Python: From notebook to prod with Bauplan and marimo
- JupyterCAD: 3D CAD in Jupyter notebooks
- ipyleaflet: Interactive maps in Jupyter notebooks
Knowledge Management¶
- Enabling Hugo Static Site Search With Lunr.js
- How to create Architectural Decision Records (ADRs) - and how not to
- I Deleted My Second Brain
Large Language Models (LLM)¶
- AI Is A Floor Raiser Not A Ceiling Raiser
- Awesome Amazon Q developer
- Basic Memory: AI conversations that actually remember
- Design Partner
- Google A2A: Agent2Agent Protocol
- I'd rather read the prompt
- In Praise Of Normal Engineers
- LLMs to Alloy
- MCP As An Accidentally Universal Plugin
- Teach your LLM about me
- The Prompt Engineering Playbook For Programmers
- ask-human MCP
- elroy: An AI assistant that remembers and sets goals
- fastmcp: The fast, Pythonic way to build MCP servers and clients
- langextract: A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization
- sourcebot: a self-hosted tool that helps you understand your codebase
Learning Path¶
Logging¶
Machine Learning¶
- Machine Learning Prototyping with DuckDB and scikit-learn
- Machine Learning Q And AI
- RapidFuzz: Rapid fuzzy string matching library
- Tour Of PyGAM: Generalized Additive Models
- distfit: Probability density function fitting and hypothesis testing
- fastcore: Python supercharged for the fastai library
- imodels: Interpretable machine learning models
- nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs
- pyGAM: Generalized Additive Models in Python
- scikit-survival: Survival analysis in Python
- tea-tasting: Statistical testing for A/B experiments
Markdown¶
- Koaning's draft: GitHub-style markdown editor with AI assistance
- MDX: Markdown for the component era
Mathematics¶
- Discrete Mathematics
- Planes in 3D space
- Systematizing Byrne's The Elements of Euclid
- The Map of Mathematics
Methodology¶
Misc¶
- Bees BeWeather
- Datova Word Calculator
- Itter: social media for purists
- Just: Settings for Nushell
- Microsoft edit
Misc utils¶
- A Python Dict That Can Report Which Keys You Did Not Use
- AutStr: Infinite data structures in python
- Dumpy: Numpy except it's ok if you're dumb
- Otary: elegant, readable, and powerful image and 2D geometry Python library
- Pake: Turn any webpage into a desktop app with Rust
- Perfect Freehand
- Python memory graph
- Scrapling: high-performance Python library for Web Scraping
- Tamga: A modern, high-performance logging utility for Python
- Temporary files and directories in Python
- clippy with some AI
- copier: Library and CLI app for rendering project templates
- html-to-markdown
- notata: A lightweight Python library for saving simulation results in a standardized, reproducible format
- orbital: Turn scikit-learn pipelines into SQL
- overtype: The markdown editor that's just a textarea
- python-ulid: ULID implementation for Python
- quarkdown: Markdown with superpowers
- sff: CLI for semantic search on your computer
- wetlands: Conda Environment Manager, a library to execute code in isolated environments
Natural Language Processing (NLP)¶
Neural Networks¶
OO Programming¶
Pandas¶
- Data Validation Libraries for Polars
- Do More With Numpy Array Type Hints
- Pandas Crosstab
- Pandas DataFrame Plot Density
- patito: A data modelling layer built on top of polars and pydantic
Physics¶
Probability & Statistics¶
- Bollinger Bands
- Dummy's Guide to Modern LLM Sampling
- NannyML Probability Calibration
- ProbPy: Python Probabilistic Calculus
- Probability Calibration Curve
- Probability Calibration
- Think Correlation Isn't Causation? Meet Partial Correlation
Project packaging¶
Software Development¶
- The Guide to Hashing I Wish I Had When I Started
- The Ingredients of a Productive Monorepo
- toto: DataDog's framework for securing the software supply chain
Terminal¶
- ghostty: Fast, feature-rich, and cross-platform terminal emulator
- helix: A post-modern modal text editor
Testing¶
Time Series¶
- 100 Time Series Data Mining Questions
- Digital Signals Theory
- How Can We Quantify Similarity Between Time Series
- Interpreting ACF And PACF Plots For Time Series Forecasting
- Matrix Profile Tutorial
- Time Series Similarity
- stumpy: Modern time series analysis library
- tslearn: Machine learning toolkit for time series analysis
Tools¶
- Deep Dive Into DuckDB for Data Scientists
- DuckDB ducklake
- OpenLineage Marquez
- Semantic Layer DuckDB Tutorial
- boring-semantic-layer: a lightweight semantic layer based on Ibis
- kuzu: Embedded graph database built for query speed and scalability
- manticoresearch: Easy to use open source fast database for search
- octanedb: A high-performance, lightweight vector database library built in Python
- prql: Pipelined Relational Query Language
- robinzhon: Minimal, high-performance Python helpers for concurrent S3 object transfers
- s3grep: CLI tool for searching logs and unstructured content in Amazon S3 buckets
- soda-core: Data quality testing framework
Visualization¶
- Topic Tomographies
- Visprex: Visualise your CSV files in seconds without sending your data anywhere
- Vistorian: Interactive Visualizations for Dynamic and Multivariate Networks
- Visualizing 100k Years Of Earth In WebGL
- glyphx: A next-gen Python plotting library
- hvplot: High-level data visualization built on HoloViews
- ipecharts: ECharts Jupyter Widget
- pictex: A powerful Python library for creating complex visual compositions and beautifully styled images
- pyecharts: Apache ECharts in Python
- sentiment-analysis-viz: Real-time visualization of sentiment analysis on text input
Typing¶
Utils¶
- A Mini Book On AWS Networking
- AWS Strands Agents Python SDK
- AWS Strands Agents
- iam-floyd: AWS IAM policy statement generator with fluent interface
- iam-lens: visibility into the IAM permissions in your AWS organizations and accounts