Data Science¶
A/B Testing¶
- Art of A/B testing
- When and when not to A/B test
- Fundamentals of A/B testing
- Netflix: interpreting A/B testing results part 1
- Netflix: interpreting A/B testing results part 2
- How to Accurately Test Significance with Difference in Difference Models
- Forget Statistical Tests: A/B Testing Is All About Simulations
Anomaly detection¶
- Time series anomaly detection
- Isolation forest
- Anomaly detection with pycaret
- Equipment effectiveness with python
- Real-time time series anomaly detection
- Anomaly detection with copulas
- PyOD: Python outlier detection library
- 5 anomaly detection algorithms with sklearn implementation
- Auto anomaly detection with isolation forest
- Probabilistic forecasting of binary events using regression
- Luminol: a light weight Python library for time series anomaly detection and correlation
- Alibi Detect: an open source Python library focused on outlier, adversarial and drift detection
AutoML¶
Bayesian methods¶
- Bayesian revenue estimation
- Estimating probabilities with bayesian modeling
- Probabilistic programming and Bayesian methods for hackers
- Bayesian neural network with pyro and pytorch
- Bayesian modeling with pymc3
- Richard Gott Princeton
- How to be less wrong
- Bayesian logistic regression with pymc3
- Naive Bayes
- Math behind Bayes methods
- Mathematical explanation of Naive Bayes
- Bayesian inference with pymc3
- Bayesian Time Series linear regression
- Bayesian optimization
- Bayesian inference
- Gelman book Columbia university
- Bayesian state space model with pymc3
- Bayesian inference and MCMC variational inference
- bambi for Bayesian Model Building Interface
- How to be less wrong: a Bayesian's guide to predicting the future with limited data
- bnlearn: a library for Bayesian network learning and inference
- Kalmangrad: automated, smooth, n-th order derivatives of non-uniformly sampled time series data
Bin Packing¶
- BPP scipbook
- PuLP
- GitHub PuLP
- PuLP examples
- Wedding PuLP example
- PuLP paper
- Binpacking library
- Binpacking PuLP
Binsize¶
- Histogram
- Model risk
- Optimal number of bins Freedman Diaconis rule
- Understanding Freedman Diaconis rule
- Freedman Diaconis rule paper
Causal Inference¶
- Causal Inference handbook
- The Effect: An Introduction to Research Design and Causality
- Intro to Causal Inference course
- What is Causal Inference?
- PyWhy: ML based causal inference by Microsoft DoWhy
- CausalPy hands-on
Churn analysis¶
- Modeling customer churn when churns are not observed
- EDA empirical cumulative distribution
- Customer churn management
- Predicting customer churn for telecom
- Churn analysis via PyCaret
Classification¶
- Reliability diagrams for probability calibration
- Composite classification metrics
- SMOTE for synthetic data augmentation and unbalanced datasets
- Calculating business value of binary classification
- predict_proba probabilities calibration
- Why class balancing can be avoided
- Matthews Correlation Coefficient (MCC) and Brier score
- Precision and Recall visually explained
- The Effect of Class Imbalance on Precision-Recall Curves
- Classification metrics calibration
- binclass-tools for binary classification inspection
Clustering¶
- Tomato clustering
- Alternatives to k-means
- Clustering metrics
- Find optimal k in KNN
- K-Means
- Anatomy of K-Means
- Clustering algorithms comparison
- Intro to hierarchical clustering
- How to determine optimal clusters number
- Clustering algorithms comparison
- Hierarchical Agglomerative Clustering
- Hierarchical clustering 101
- Regional Online Learnable Fields (ROLF)
- Interpretable KMeans via Classification feature importance
- How to select optimal k for K-Means
- How many clusters? Methods comparison
- Expectation Maximization soft clustering
- Markov clustering
- Unsupervised Learning Series: exploring DBScan
- Recursive Embedding and Clustering by Spotify
Code differentiation¶
Community detection¶
Computer Vision¶
- Lane detection
- Modern computer vision with caer
- scikit-image for image processing
- FiftyOne app
- Augmentor for image augmentation
- Concept: a technique that leverages CLIP and BERTopic-based techniques to perform Concept Modeling on images
- Understanding Diffusion Models: A Unified Perspective
Correlation¶
Curse Of Dimensionality¶
Customer value¶
- Customer segmentation
- Seasonal customers via time series analysis
- Identify seasonal customers with Python
- Customer lifetime value prediction
- Business DS
- Quantiles from ML model
- DS guide to subscription businesses
- Clustering for customer segmentation
Dataset¶
- Dataset search
- Hand drawn data
- Faker
- Mimesis
- SDV
- Diffbot
- Datasette
- Hand drawn data in Jupyter with drawdata
- World Bank data API
- Footprint Network
- Italian political and electoral polls (Sondaggi politico elettorali)
- Microsoft's Bing road detections
- Kontur population dataset
- ISPRA: Open Data on hydrogeological instability
- Folktables
- The official portal for European data
- Awesome datasets
- Dataset imports from UCI ML Repository
- Eurostat Data
- MusicBrainz API
- Open-Meteo: Free Weather Forecast API for non-commercial use
- Data Commons: aggregates global, open data, uncovering insights with natural language questions
- Foursquare Places OS Data Schemas
Deep Learning¶
Dimensionality Reduction¶
Dynamic pricing¶
- Dynamic pricing for theatre
- Regression for price optimization
- Dynamic Pricing with Reinforcement Learning from Scratch: Q-Learning
Embeddings¶
- Embetter: scikit-learn compatible embeddings for computer vision and text
- An intuitive introduction to text embeddings
- The Hidden World of (Vector) Indexes
- Why cosine similarity between sentence embeddings is always positive
- What are embeddings?
Energy and Power Systems¶
- PyPSA: Python for Power System Analysis
- Prebuilt Electricity Network for PyPSA-Eur based on OpenStreetMap Data
- Modelling the High-Voltage Grid Using Open Data for Europe and Beyond
Ensemble models¶
Ethics¶
Features engineering and selection¶
- Boruta
- Guide to feature extraction
- Feature selection don'ts
- Features normally distributed
- How and Why
- Feast: an open source feature store for machine learning
- Shapicant: a feature selection package based on SHAP and target permutation, for pandas and Spark
Football analytics¶
- Paper
- Predict Euro 2020 winner
- Poisson regression for football match results prediction
- Predicting FIFA World Cup 2022 winner
Function learning¶
Game Theory¶
Gaussian Mixture Models¶
Gaussian Processes¶
Genetic algorithm¶
Geo science¶
- Transportation DS
- Spatial autocorrelation
- Geospatial data declustering
- EDA of spatial data and spatial autocorrelation
- GPS trajectory clustering
- Geospatial indexing with quadkeys
- Travel time estimation using quadkeys
- Geographic Data Science with Python
- Geocoding via Geoapify
- Geospatial Data Engineering: Spatial Indexing
- Proximity Analysis: a few words about spatial data processing
- Deep Dive into ESA's Sentinel API
- Geospatial Analysis and Representation for Data Science course for the master in Data Science University of Trento
- 3D Geospatial Data Integration with Python: The Ultimate Guide
- srai: Spatial Representations for Artificial Intelligence
- Earth Isn't Flat, and Neither Should Your Voronoi Diagrams Be
- Voronoi diagram in Manhattan metric
- Geospatial Indexing Explained: A Comparison of Geohash, S2, and H3
OpenStreetMap¶
Overture Maps¶
- Overture Maps data
- Overture Maps docs
- Exploring Overture data, no SQL required
- Overture Grabber
- Overture GERS: Towards Standardizing Place
- Overture Maps Explorer
Gradient methods¶
Hyperparameters Tuning¶
- optuna library for hyperparameter tuning in logistic regression
- Gaussian processes for ML models tuning
- Optuna and sklearn integration
- Hyperparameters tuning with Optuna and human-in-the-loop
- Evolutionary and genetic algorithms for parameters tuning
- Bayesian hyperparameters optimization
- mango: a parallel hyperparameter tuning library
- Mango tutorial
Information Theory¶
Kernel Methods¶
Large Language Models (LLM)¶
- ChatGPT Is An Extra-Ordinary Python Programmer
- StarChat Playground by Hugging Face
- What is ChatGPT doing and why does it work
- GPT in 60 Lines of NumPy
- privateGPT
- Pushing Prompt Engineering to the Limit
- How Foundation Model Providers Comply with the Draft EU AI Act
- A Gentle Introduction to LLM APIs
- All You Need to Know to Build Your First LLM App
- Mastering Prompt Engineering
- How to Run LLMs Locally
- LangChain: Building applications with LLMs through composability
- DeclarAI: turning Python code into production-ready LLM tasks
- Open Source LLMs To Power A LLM Application
- Large language models, explained with a minimum of math and jargon
- Inside GPT: Understanding the text generation
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- Understand how BERT constructs state-of-the-art embeddings
- codellama
- NLP tasks via LLM
- From encoding to embeddings
- Large Language Models: Sentence-BERT
- Methods For Improving Your Large Language Model
- Vector Databases and How to Use Them to Augment LLM
- Large Language Models: RoBERTa, a Robustly Optimized BERT Approach
- DeepEval: Unit Testing for LLMs
- Attention Sinks in LLMs for endless fluency
- Generative AI exists because of the transformer: this is how it works
- OpenLLM Leaderboard
- All you need to know to Develop using Large Language Models
- LMQL: a programming language for large language models
- GPT-Engineer
- Chatbot Arena: Benchmarking LLMs in the Wild
- magentic: easily integrate Large Language Models into your Python code
- Hard Truths About Generative AI for Technology Leaders
- AlphaCodium: From Prompt Engineering to Flow Engineering
- Cheshire-Cat: Production ready AI assistant framework
- OLMo: a State-of-the-Art, Truly Open LLM and Framework
- Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting
- Cohere For AI Launches Aya: an LLM Covering More Than 100 Languages
- A non-exhaustive but essential list of key papers that underpins text-to-video Deep Generative model like SORA
- Do large language models understand the world?
- A Visual Guide to Mamba and State Space Models
- Gemma: a new family of open models
- DSPy: the framework for programming - not prompting! - foundation models
- Text Embeddings: Comprehensive Guide
- Developers with AI assistants need to follow the pair programming model
- LLM Evaluation
- A programming framework for agentic AI
- Gemma 2 optimized for your local machine
- GraphRAG: a modular graph-based Retrieval-Augmented Generation (RAG) system
- Explaining generative language models to (almost) anyone
- Auditing the Ask Astro LLM Q&A app
- The Rise of the LLM OS: From AIOS to MemGPT and beyond
- A Visual Guide to Quantization
- Unsloth: Finetune Llama 3.1, Mistral, Phi and Gemma
- Open WebUI: user-friendly WebUI for LLMs
- LangDrive: train LLMs on private data
- llmware: unified framework for building enterprise RAG pipelines with small, specialized models
- giskard: Open-Source Evaluation & Testing for LLMs and ML models
- talkd/dialog: RAG LLM Ops App for easy deployment and testing
- LLM sampling
- AI models collapse when trained on recursively generated data
- Trace: AutoDiff for AI Systems and LLM Agents
- Will We Run Out of Data? Limits of LLM Scaling Based on Human-Generated Data
- LitGPT: 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale
- How to build a basic LLM GPT model from Scratch in Python
- guidance: a guidance language for controlling large language models
- "Attention, Please!": A Visual Guide To The Attention Mechanism
- How LLMs Work, Explained Without Math
- litellm: Python SDK, proxy server to call LLM APIs using the OpenAI format
- guardrails: adding guardrails to large language models
- Burr: build applications that make decisions (chatbots, agents, simulations). Monitor, trace, persist, and execute on your own infrastructure
- ell: a language model programming library
- Model2Vec: Distill a Small Fast Model from any Sentence Transformer
- Beyond Traditional Testing: Addressing the Challenges of Non-Deterministic Software
- JIT Implementation: A Python Library That Implements Your Code at Runtime
- Open Source Frameworks for Building Generative AI Applications
- ChainLit: Build Conversational AI in minutes
- A RAG from scratch to query the scikit-learn documentation
- Introduction to Large Language Models
- DataChain: AI-data warehouse to enrich, transform and analyze unstructured data
- Simplemind: Python client for AI providers
- Official code repo for the O'Reilly Book "Hands-On Large Language Models"
- Docling: parse documents and export them to the desired format with ease and speed
- Posting: the modern API client that lives in your terminal
- Large Chainsaw Model
Machine Learning¶
Machine Learning Tooling: GitHub space with ranked lists of awesome Python libraries, updated weekly.
- GUI for ML workflow and pipeline discovery
- ML prototypes
- Designing intelligence
- AI, ML and DL
- Game theory for ML interpretation
- pycaret
- Hybrid rule based ML
- pycaret-2.0
- QLattice
- Applied ML use cases
- Google ML glossary
- 130 ML Tricks And Resources Carefully Curated
- Geomstats: a Python package for computations, statistics, machine learning and deep learning on manifolds
- LitServe: an easy-to-use, flexible serving engine for AI models built on FastAPI
- Causality in ML Models: Introducing Monotonic Constraints
Model evaluation¶
- Plot learning curve
- Validation sets
- ML Tool
- Validate an ML model
- Overfitting and underfitting
- Cross validation
- Validation curve
- MAPIE for confidence prediction intervals estimation
- Why You Should Never Use Cross-Validation
Model monitoring¶
- Static threshold vs anomalies and changepoints detection
- Different retrain strategies for ML models
- An end-to-end implementation of a prediction flow for kids who can't MLOps good
- Giskard: scan AI models to detect risks of biases, performance issues and errors
- MLflow
- Model drift
- Evidently for model monitoring
- Weights and Biases
- Sacred
- Omniboard as a Sacred frontend
- MLflow 101
- deepchecks
- MLNotify for training completion notification
- NannyML for post-deployment model performance monitoring
MLOps¶
- What is MLOps
- MLOps maturity checklist
- Why data makes MLOps different
- ML model deployment strategies
- MLOps lifecycles
- A curated (awesome!) list of open source libraries to deploy, monitor, version, scale and secure production machine learning
- The Full Stack 7-steps MLOps framework
- CD for ML
- Our MLOps story: Production-Grade Machine Learning for Twelve Brands
- No, You Don't Need MLOps
Marketing Analytics¶
- Beginner guide to Marketing Analytics
- Discrete-Time Markov Chains: Identifying Winning Customer Journeys in a Cashback Campaign
- Methods for Modelling Customer Lifetime Value: The Good Stuff and the Gotchas
Markov Chains¶
- Attribution model
- Progressions
- Market simulator with Markov chains
- Markov chains
- Hidden Markov models
- Markov chain process and HMM
- Markov chain as text generation model
- Beatles lyrics generation via Markov Chains
- Markov chains for time series forecasting
MCMC¶
- Animations with MCMC
- Monte Carlo tree search
- Monte Carlo in PBP
- MCMC for cryptography and optimization
- pyro
- Metropolis-Hastings from scratch in Python
- Monte Carlo methods
- Random sampling via Python decorator
- Intro to Monte Carlo methods
- Simulating data with PyMC
- Mastering Monte Carlo: How to Simulate Your Way to Better Machine Learning Models
- Chaospy: a numerical toolbox for performing uncertainty quantification using polynomial chaos expansions and advanced Monte Carlo methods
- PyMC-Marketing: Bayesian Marketing Mix Modeling (MMM) & Customer Lifetime Value (CLV)
Nearest Neighbors¶
Neural Networks¶
- Neural networks and deep learning
- Make NN paint to understand how they work
- Intro to neural networks
- NN manifolds topology
- Visualizing optimization trajectory in neural networks
- Deep dream convolutional networks full code
- Neural networks as ensembles of simpler models
- Neural networks as functions composition
- N-Students learning framework
- A short history of Neural Networks
- Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
- AI Canon
Natural Language Processing (NLP)¶
- EDA and visualization of text data
- text2emotion to detect emotions from textual data
- Quantify information in statements with entropy from Information Theory
- NLTK
- Chatbot using rasa
- rasa: open source ML framework to automate text- and voice-based conversations
- WordNet for a lexical taxonomy of English words
- Clean text
- Gramformer for text correction
- Styleformer for text styling
- Data jobs description analyzed with scattertext
- Scattertext
- texthero
- yarl for URL processing
- Ecco for pattern visualization in text data
- Data QA to label data
- Text summarization
- Text similarity with Levenshtein distance
- Autocorrect for multilanguage spelling correction
- Neattext for cleaning textual data and text preprocessing
- Texthero tutorial
- Microsoft presidio for NER (Named Entity Recognition) and data anonymization
- SEER model for information extraction based on user-specified examples
- Textnets: text analysis with networks
- Universal romanizer tool
- Text summarization
- Sentence embedding
- Semantic search with txtai
- Arabica and Cappuccino for text EDA
- Simple spelling check in Python
- A guide to computational linguistics and conversational AI
- diff-match-patch: robust algorithms to perform the operations required for synchronizing plain text
- PolyFuzz: fuzzy string matching and string grouping
- Outlines: a library for neural text generation
- sense2vec: query vectors for multi-word phrases based on part-of-speech tags and entity labels
- NLP Course
Topic modeling¶
- Concept modeling to link text and images
- Topic coherence measures
- Intro to topic modeling with Latent Dirichlet Allocation (LDA)
- Topic modeling strategies comparison
- Hands-on topic modeling via LDA
- Advanced Topic Modeling with BERTopic
- Topic Modeling with Llama 2
- cluestar: visualisation tools to get started with text classification tasks
- Practical Guide to Topic Modeling with Latent Dirichlet Allocation (LDA)
- BERTrend: Neural Topic Modeling for Emerging Trends Detection
Objects tracking¶
OCR¶
- Open source OCR tools
- Extract text written in different languages with easyocr
- Apple Vision wrapper for text extraction, scalar representation and clustering using K-means
Optimization¶
- Lagrange multipliers
- Openopt
- OR-Tools
- Solving Sudoku via AI
- Arbitrage strategy with linear programming
- Lagrange multiplier demystified
- Python constraint
- Surrogate optimization
- Particle Swarm Optimization (PSO)
- Traveling Salesman Problem (TSP) heuristic
- Guide to dynamic programming
- Goal programming
- Artificial Bee Colony algorithm
- Animate particle swarm optimization
- ErlangC queue optimization with pyworkforce
- Optimization heuristics
- How Amazon learned to cut its cardboard waste with pioneering web-based PackOpt tool
- Route optimization with Python
- The Vehicle Routing Problem: Exact and Heuristic Solutions
- List of optimization packages in Python
- A Comprehensive Guide to Modeling Techniques in Mixed-Integer Linear Programming
Scholarpedia¶
Pattern mining¶
Physics¶
Predictive Maintenance¶
- Understanding Predictive Maintenance: Data Acquisition and Signal Denoising
- Understanding Predictive Maintenance: Unit Roots and Stationarity
- Understanding Predictive Maintenance: Wave Data and Feature Engineering (Part 1)
Probability & Statistics¶
- Chi-square test
- Probability and statistics for DS
- Statistical significance
- Hypothesis test
- How to determine significance for hypothesis testing
- Visualize hypothesis testing
- Causal vs statistical inference
- Phi_k correlation coefficient
- PP score
- Generate random variables
- pingouin better than statsmodels
- pingouin library for statistical tests
- Guide to confidence intervals
- Stop using p = 0.05
- What p-value stands for
- Optimal sample size
- Experiment design
- Rule of three: calculating probability of events not yet occurred
- Hypotheses testing with scipy
- Adaptive p-value
- Probability distributions Q&A - part 1
- Probability distributions Q&A - part 2
- Correlation visually explained
- 3 t-tests for data scientists
- Stats gist list: guide to jargon by Cassie Kozyrkov
- Algorithmic approach to statistical testing
- Kolmogorov-Smirnov test to check how data are distributed
- Empirical cumulative distribution: advantages over histogram for EDA
- Hypothesis Testing Explained (How I Wish It Was Explained to Me)
Regression¶
- Generalized linear models
- Generalized linear regression with scikit-learn
- Ordinary least squares regression
- Adaptive LASSO
- What happens when you break the assumptions of linear regression
- Statistics supporting linear models
- Geodesic regression
- Symbolic regression
- Ridge regression from scratch
- Regularization in regression
- Deming regression
- Interpreting linear regression sum-up from statsmodels
- Logistic regression 101
- Complete guide to regression analysis
- Constrained logistic regression
- Robust regression
- Polynomial regression with scikit-learn
Reinforcement Learning¶
- Playing Blackjack with RL
- Math behind reinforcement learning
- How RL works
- Create a custom RL environment
- Start learning RL
- Dive into RL
- Policy gradient algorithms
- RL fundamentals
- Markov decision process
- Bellman equation and dynamic programming
- Reinforcement Learning series
- Multi-agent particle swarm
- The K-armed bandit problem
- Python packages to experiment with Reinforcement Learning
- Reinforcement Learning algorithms explained
- Training an Agent to Master a Simple Game Through Self-Play
- Training an Agent to Master Tic-Tac-Toe Through Self-Play
- Stable-Baselines3: the Swiss Army Knife of Applied RL
Resampling¶
Revenue science¶
Scada data analysis¶
Similarity measures¶
Simulated Annealing¶
SQL¶
- PugSQL
- Window functions in SparkSQL
- Advanced SQL queries in pandas
- SQL window functions
- 5 SQL common queries
- SQL window functions
- Intermediate SQL queries
- 6 lesser known queries
- 10 SQL tips
- Settings for NLS in SQL Developer
- Advanced SQL concepts
- SQL advanced functions: qualify, arrays and more
- SQL CASE
- Python built-in database: SQLite
- Advanced SQL for Data Scientists: cube, array, window and math functions
- Lost at SQL: the SQL learning game
- Window Functions: A Must Know for Data Engineers and Data Scientists
- How to Low-Pass Filter in Google BigQuery
- Harlequin: the SQL IDE for your terminal
- SQLModel: a library for interacting with SQL databases from Python code, with Python objects
- Sampling with SQL
Streaming/Online Learning¶
Structural Equation Modeling (SEM)¶
SVM¶
Synthetic data¶
Time Series¶
- Nested cross validation
- TS and feature selection
- Out-of-time validation
- scikit-learn prediction intervals
- Forecast visualization
- TS transfer learning
- Detecting stationarity
- End to end project
- Analysis and forecasting
- Seasonal ARIMA
- Forecasting models overview
- Causality inference
- Giotto time
- Time Series Analysis
- Time series in Python
- Matrix profile
- Brownian motion in Python
- Forecast energy consumption with neural networks and xgboost
- TS forecasting
- Statistical tests and ARIMA
- Dynamic time warping
- Whale identification TS processing
- Pattern mining with stumpy
- Statistical tests for trend
- How to synchronize time series
- Time series libraries
- Kats by Facebook
- Merlion by Salesforce
- Darts
- Avoid data leakage in time series
- Orbit by Uber for Bayesian time series forecasting
- Time Series terminology
- Time Series forecasting cheatsheet
- Poisson Hidden Markov Model for Time Series regression
- PyCaret AutoML for Time Series
- Seasonal adjustment of daily time series
- TSA basics
- Univariate time series forecasting with Neural Networks
- sktime as sklearn TSA interface
- Changepoints detection with e-divisive
- Time series data visualization
- statsforecast for lightning fast forecasting
- Time features encoding: cyclic vs dummy vs numeric
- Scalecast
- Hierarchical forecast reconciliation
- Deep Learning for time series forecasting
- Interpreting ACF and PACF plots for time series
- Python Automatic Forecasting
- Forecasting with tree-based algorithms
- FEDOT: an AutoML approach to time series forecasting
- Time series forecasting with Transformers
- Conformal prediction interval with scikit-learn, MAPIE and TSPIRAL
- mlforecast: scalable machine learning based time series forecasting
- Time Series Forecasting with Scikit-learn
- Time Series for Climate Change: Forecasting Energy Demand
- Skforecast: a Python library that eases using scikit-learn regressors as single and multi-step forecasters
- Time series complexity analysis using entropy
- functime is a powerful Python library for production-ready global forecasting and time-series feature extraction on large panel datasets
- Feature Engineering for Time Series Regression
- TimeGPT: The First Foundation Model for Time Series Forecasting
- Group time-series split
- Feature Engineering for Time Series
- TSMixer: The Latest Forecasting Model by Google
- tsfresh: Time Series Feature extraction based on scalable hypothesis tests
- pytimetk: time series easier, faster, more fun
- autogluon: AutoML for Image, Text, Time Series, and Tabular Data
- AutoGluon-TimeSeries: Every Time Series Forecasting Model In One Library
- Time Series Forecasting with TiDE
- Temporian: an open-source Python library for preprocessing and feature engineering temporal data for machine learning applications
Prophet¶
- Is Facebook's "Prophet" the Time-Series Messiah, or Just a Very Naughty Boy?
- LSTM and Prophet
- Prophet forecasting
- AutoArima Prophet adapter in statsforecast
- Fixing Prophet forecasting issue
Greykite¶
Tree-based methods¶
- Entropy in decision trees
- Intuition behind Shannon entropy
- Explaining feature importance
- catboost for gradient boosting decision trees
- catboost docs
- Understanding decision trees
- Random Forest interpretability
- catboost for model interpretation
- Visualize bagging effect on bias and variance
- How to draw decision trees
- Decision trees code
- Why you should learn catboost
- Intuition behind xgboost
- Tree boosted mixed models
- Random Forest in ML
- Multiple imputation with Random Forest
- Ensemble learning
- Decision tree and overfitting
- From boosting to gradient boosting
- Decision trees and lookahead strategy
- AdaBoost mathematical approach
- How to visualize Decision trees
- Random Forest vs Gradient Boosting
- Why bagging works
- Gradient boosted trees explained
- Maths and viz of Gradient Boosting
- Intuitive explanation of entropy
Weather data¶
- Prediction of severe thunderstorm events with ensemble deep learning and radar data
- Pirate weather API
- Meteostat Python library
- GraphCast: AI model for faster and more accurate global weather forecasting
- GraphCast: Learning skillful medium-range global weather forecasting
- NeuralGCM: a Python library for building hybrid ML/physics atmospheric models for weather and climate simulation
- Aurora: a Foundation Model of the Atmosphere
XAI¶
- Interpretable ML
- Interpretable ML with Python
- SHAP decision plot
- Making sense of Shapley values
- SHAP values and kernelexplainer
- Additive feature importances
- SHAP overview
- Explainer dashboard
- Shapash model explaining webapp
- Black box vs glass box models
- Interpretable Machine Learning book
- InterpretML
- Permutation feature importance
- Interpretation of Isolation Forest with shap
- Eli5
- SHAP vs ACV
- FastTreeSHAP: speed up SHAP values computation for tree-based models
- Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
- PiML (Python Interpretable Machine Learning) toolbox for model development & diagnostics