(my) Hacker News #12¶
Here's a list of the latest resources that grabbed my attention.
Anomaly detection¶
Articles¶
- AI makes you boring
- Good sleep, good learning, good life
- Grief and the AI split
- How many products does Microsoft have named Copilot?
- How the spreadsheet reshaped America
- The Eternal Sloptember
- Thoughts (hmmz.org)
- Thoughts on slowing the fuck down
- Webmentions with batteries included
- ai;dr
Audio recognition¶
- NumPy as Synth Engine
- latent-musicvis: Music visualization via UMAP of stable audio latents
- pytheory: Music Theory for Humans
- renardo: Livecoding music with simple Python code
AutoML¶
Career development¶
- NASA Elements of Engineering Excellence
- Nobody gets promoted for simplicity
- How much does a Data Scientist earn? Datapizza Salaries
- The Revenge of the Data Scientist
- What actually makes you senior
Clustering¶
- EVoC: Embedding Vector Oriented Clustering
- GraphHDBSCAN: CoreSG and GraphHDBSCAN implementation
- K-Means Clustering illustrated artwork (Allison Horst)
Coding¶
- polyglot: Rust/Wasm-powered SQL transpiler for 30+ SQL dialects
- pytm: A Pythonic framework for threat modeling
Community detection¶
Computer Vision¶
- Floodfill algorithm in Python
- LitePali: Lightweight ColPali-based document retrieval
- PixelRAG: The end of web parsing, the beginning of scalable pixel-native search
- rembg: Tool to remove image backgrounds
Courses¶
Data Journalism¶
- Storied Colors: One color a day, told with its provenance and chemistry
- Who earns a higher salary than you and the jobs they work
Data Structures¶
- Hidden cost of Python dictionaries and safer alternatives
- Your file system is already a graph database
Data validation¶
Database¶
- The DuckDB Client-Server Protocol
- duck_lineage: DuckDB extension that captures lineage events for executed queries
Dataset¶
Dependencies management¶
Dimensionality Reduction¶
- PyMDE: Minimum-distortion embedding with PyTorch
- UMAP Zoo: 3D Wavefront files projected into 2D with UMAP and THREE.js
- datatour_pkg: Watch your data in its native dimension
- dtour: A tour-de-vis through high-dimensional data
Documentation¶
- great-docs: Documentation sites for Python packages
- bengal: High-performance static site generator for Python 3.14+
Embeddings¶
- E-MM1: Multimodal embedding model collection
- FUEL: Fast Unsupervised Embedding Learning
- comparative-embedding-visualization: Jupyter widget for comparing two embeddings
- turbovec: A vector index built on TurboQuant, written in Rust with Python bindings
- zvec: A lightweight, lightning-fast, in-process vector database
Functional programming¶
GUI¶
- PyWry: Cross-platform app factory and rendering engine for Python
- imgui_bundle: Interactive Python and C++ apps for desktop, mobile, and web
Game development¶
Geodata¶
- Overture has fully embraced STAC
- Urban Taxonomy: Hierarchical morphotope classification
- gazetteer: A fast offline, boundary-aware reverse geocoding library in Python
- geojson: Python bindings and utilities for GeoJSON
- mappymatch: Pure-python package for map matching
Git and versioning¶
- Scoped Commits
- The Git commands I run before reading any code
- prek: A fast Git hook manager written in Rust, drop-in alternative to pre-commit
Github¶
High-dimensional data¶
IDE¶
Interactive visualizations¶
Jupyter¶
- TinyMo
- marimo for learners
- marimo-jupyter-extension: Integrate marimo reactive notebooks into JupyterLab
- wanderland: Interactive 3D learn-to-code playground for Python notebooks
Knowledge Management¶
Large Language Models (LLM)¶
- Andrej Karpathy's LLM Wiki: Create your own knowledge base
- Components of a coding agent
- Emotion concepts and their function in a large language model (full paper PDF)
- Emotion concepts and their function in a large language model (Transformer Circuits)
- Emotion concepts and their function in a large language model
- EngGPT2-16B-A3B: Sovereign, efficient and open Italian-language LLM
- How LLMs Work
- How to use Ollama to run Large Language Models locally
- I replaced vector DBs with Google's Memory Agent Pattern for my notes in Obsidian
- LLMs as writers (Oxide RFD 0576)
- Library Skills
- LiteRT-LM: High-performance inference framework for LLMs on edge devices
- MicroGPT explained interactively
- Natural Language Autoencoders (Anthropic)
- ProtoGensis: Memory Agent Bedrock
- Requirements analysis: Catching requirement bugs before they become code (Kiro)
- Running local models is good now
- So you wanna build a local RAG?
- Stop Sloppy Pasta: Don't paste raw LLM output at people
- Structured-Prompt-Driven Development (Martin Fowler)
- The Illustrated Transformer
- The Repo Is the Harness
- They're Made Out of Weights
- Transformer Explainer: LLM Transformer model visually explained
- Transformers from scratch (Brandon Rohrer)
- Vibe coding and agentic engineering are getting closer than I'd like
- Voxtral-4B-TTS-2603: Frontier open-weights text-to-speech model
- We solved trust for AI Agents in 1973
- Writing skills that agents can actually execute
- adaptive-chunking: Automatically select the best chunking method per document for RAG
- agent-skills: Production-grade engineering skills for AI coding agents
- caveman: Claude Code skill that cuts 65% of tokens by talking like caveman
- chonkie: Lightweight ingestion library for fast and robust RAG pipelines
- dictionary-of-ai-coding: AI coding jargon explained in plain English
- kirograph: Semantic code knowledge graph for Kiro
- kokoro-tts: CLI text-to-speech tool using the Kokoro model
- kotaemon: Open-source RAG-based tool for chatting with your documents
- llm-wiki (Karpathy gist)
- marimo-pair: Skills for AI coding agents with marimo
- microgpt.py (Karpathy's minimal GPT gist)
- microgpt: A single file 200-line pure Python GPT
- ponytail: Makes your AI agent think like the laziest senior dev in the room
- probabl-ai/skills: Data Science skills for AI agents
- querychat: Natural language exploration of tabular data powered by SQL and LLMs
- semble: Fast and accurate code search for agents
- sideseat: Unified workbench for building and debugging AI Agents
- skills: Skills for Real Engineers (Matt Pocock)
- spec-kit: Toolkit to get started with Spec-Driven Development
- strands-agents/shell: Give your agent a shell without giving it the keys to your machine
- superpowers: An agentic skills framework and software development methodology
- syllago: Content management system for AI coding tools
Machine Learning¶
- Demystifying table foundation models
- Scikit-learn Central: Packages and use cases
- timber: Ollama for classical ML, AOT compiler for XGBoost, LightGBM, scikit-learn models
Markdown¶
- Rill: A BI-as-code tool for transforming data sets into opinionated dashboards using SQL
- SDocs: Read, style, and share markdown files privately
- files.md: Private, quiet space for thinking with .md files
- mkslides: Turn markdown files into beautiful slides using Reveal.js
- quarkdown: Markdown with superpowers
Mathematics¶
- An interactive guide to the Fourier transform
- passagemath: General purpose mathematical software, modularized fork of SageMath
Misc¶
- Hacker News Trends: Search and chart any topic over time
- msgvault: Archive a lifetime of email and chat with offline search and analytics
- spicytakes.org: Multi-blog archive and analysis platform
Misc utils¶
- calgebra: Set operations for calendar intervals
- param: Declarative parameters for robust Python classes and reactive programming
Natural Language Processing (NLP)¶
- Apache Solr: Blazing-fast, open-source, multi-modal search platform built on the full-text, vector, and geospatial search capabilities of Apache Lucene
- Fast and easy Levenshtein distance using a Trie
- GLiNER2: Unified schema-based information extraction and text classification
- Tagging my blog posts with BERTopic and LLMs
- Text classification with Python 3.14's zstd module
- Topic modeling made just simple enough
- mall: Use Large Language Models to run NLP operations against your data
Networks and graphs¶
Optimization¶
- CPMpy: Constraint Programming and Modeling library in Python
- MIP formulations and linearizations (FICO Xpress)
Pandas¶
Privacy¶
- OpenAI Privacy Filter: Bidirectional token-classification model for PII detection
- Privacy for class attributes and methods
Project packaging¶
- How to build a Python library in 2026
- projspec: A project about projects
- pyOpenSci Python Package Guide
- pypi-security-best-practices: PyPI security best practices for uv and pip
- pypistats.org: PyPI downloads analytics dashboard
Python¶
SQL¶
Services¶
- AWS Lambda introduces MicroVMs: Run isolated sandboxes with full lifecycle control
- Building Production-Ready AI Agents with Amazon Bedrock AgentCore
- Learn AWS IAM
- floci: Light, fluffy, and always free AWS local emulator
- ministack: Free, open-source local AWS emulator with 55+ services
- nx-neptune: Graph analytics for your data lake, powered by Amazon Neptune Analytics
- sample-well-architected-skills-and-steering: Skills that teach AI agents the AWS Well-Architected Framework
Similarity measures¶
Software Development¶
- A software library with no code
- A sufficiently detailed spec is code
- Browse code by meaning
- How do large companies manage CI/CD at scale
- Laws of Software Engineering
- The Economics of Software Teams
- TigerStyle: Safety, performance, experience
- You're not building Netflix: Stop coding like you are
- semantic-navigator: Semantic project navigation
Streamlit¶
- Streamlit Extras: Discover, try, install and share Streamlit reusable bits
- Streamlit widget to host folder as website
- streamlit-pivot-table: Pivot table component for Streamlit
Structural Pattern Matching¶
Teaching¶
Technical writing¶
Terminal¶
- browsr: A pleasant file explorer in your terminal supporting all filesystems
- zoxide: A smarter cd command
Testing¶
- Better Python tests with inline-snapshot
- Python Big-O: Time and space complexity
- Unit testing your code's performance: Big-O scaling
- bigO: Measures empirical computational complexity of functions
- profiling-explorer: Table-based exploration tool for Python profiling data
Time Series¶
- AutoML Time Series Forecast (FLAML)
- Six approaches to time series smoothing
- Skforecast Studio
- Time Series Smoothing (Streamlit app)
- conformal-tights: Conformal prediction of coherent quantiles and intervals for scikit-learn
- pytrendy: Trend detection in Python for time series
- scipy.signal.savgol_filter (SciPy documentation)
Tools¶
- Datakit: Browser-based data analysis platform that processes multi-gigabyte files locally
- quak: A scalable data profiler
Tools¶
- Open Visualization Academy
- inkwash: Pen-and-ink with living water, single-file fluid ink drawing app
- kuva: A scientific plotting library in Rust
Tree-based methods¶
Typing¶
- Are you really expected to run five type-checkers now?
- How Well Do New Python Type Checkers Conform? A Deep Dive into Ty, Pyrefly, and Zuban
- ty: An extremely fast Python type checker and language server, written in Rust