Machine Learning System Design
About two years ago, I started researching and reading about applications of machine learning and deep learning techniques beyond the textbooks. Initially it was just enthusiasm, but it soon became a habit. I started with blog posts and engineering blogs published by big tech companies to better understand what actually happens behind the scenes and how the basic ideas fit into software engineering components such as web servers, databases, and load balancers.
In this post, I have gathered a growing list of such materials. A great deal of this effort had already been done in other posts, which I cite at the end. I tried to organize these materials in a logical order that is easier to follow for somebody who already has some knowledge of the ML/DL field and wants to know more about the implementation and deployment of these concepts.
I have also prepared a similar categorization of state-of-the-art (SOTA) papers for popular applications. Hopefully, I can keep it updated.
Last update: January 2022
Table of Contents
Part-1: Systems and Concepts by Companies
Part-2: Concepts Reviews and Surveys
- Recommendation
- Deep Learning
- Natural Language Processing
- Computer Vision
- Vision and Language
- Reinforcement Learning
- Graph
- Embeddings
- Meta-learning and Few-shot Learning
- Others
Part-3: Concepts In Practice
- Data Quality
- Data Engineering
- Data Discovery
- Feature Stores
- Classification
- Regression
- Forecasting
- Recommendation
- Search & Ranking
- Embeddings
- Natural Language Processing
- Sequence Modelling
- Computer Vision
- Reinforcement Learning
- Anomaly Detection
- Graph
- Optimization
- Information Extraction
- Weak Supervision
- Generation
- Audio
- Validation and A/B Testing
- Model Management
- Efficiency
- Ethics
- Infra
- MLOps Platforms
- Practices
- Team Structure
- Fails
Part-1 Machine Learning System Design Patterns in Tech Companies
Google Machine Learning Courses
- Google's fast-paced, practical introduction to machine learning
- Recommendation systems
General information regarding candidate generation (content-based and collaborative filtering), retrieval, scoring, and re-ranking
- Google Research Publications | Data Mining and Modeling
- Google Research Publications | Machine Intelligence
- TensorFlow embedding projector
- Jeff Dean On Large-Scale Deep Learning At Google
Facebook (Meta) Machine Learning Contents
- Field Guide to Machine Learning video series
- Fighting Abuse @Scale
- Preventing abuse using unsupervised learning
- Community Standards report
- Scalable data classification for security and privacy
- Unicorn: A System for Searching the Social Graph
- Hive - A Petabyte Scale Data Warehouse using Hadoop
- Nemo: Data discovery at Facebook
- Embedding-based Retrieval in Facebook Search (Paper)
- How machine learning powers Facebook's News Feed ranking algorithm
- Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective (Paper)
- Neural Code Search: ML-based code search using natural language queries
- AI advances to better detect hate speech
- Leveraging online social interactions for enhancing integrity at Facebook
- Powered by AI: Advancing product understanding and building new shopping experiences
- Here's how we're using AI to help detect misinformation
- GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce
Instagram
Twitter Engineering
Uber
- Forecasting at Uber: An Introduction
- Applying Customer Feedback: How NLP & Deep Learning Improve Uber's Maps
- Food Discovery with Uber Eats: Building a Query Understanding Engine
- Food Discovery with Uber Eats: Recommending for the Marketplace
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations
Netflix
LinkedIn
Airbnb
- Using Machine Learning to Predict Value of Homes On Airbnb
- Listing Embeddings in Search Ranking
- Learning Market Dynamics for Optimal Pricing
- Categorizing Listing Photos at Airbnb
- Applying Deep Learning To Airbnb Search
- Discovering and Classifying In-app Message Intent at Airbnb
- Machine Learning-Powered Search Ranking of Airbnb Experiences
Spotify
Part-2: Machine Learning Design Concepts | Reviews and Surveys
Recommendation
- Algorithms: Recommender systems survey (2013)
- Algorithms: Deep Learning based Recommender System: A Survey and New Perspectives (2019)
- Algorithms: Are We Really Making Progress? An Analysis of Neural Recommendation Approaches (2019)
- Serendipity: A Survey of Serendipity in Recommender Systems (2016)
- Diversity: Diversity in Recommender Systems - A survey (2017)
- Explanations: A Survey of Explanations in Recommender Systems (2007)
Deep Learning
- Architecture: A State-of-the-Art Survey on Deep Learning Theory and Architectures (2019)
- Knowledge distillation: Knowledge Distillation: A Survey (2021)
- Model compression: Compression of Deep Learning Models for Text: A Survey (2020)
- Transfer learning: A Survey on Deep Transfer Learning (2018)
- Neural architecture search: A Comprehensive Survey of Neural Architecture Search (2021)
- Neural architecture search: Neural Architecture Search: A Survey (2019)
Natural Language Processing
- Deep Learning: Recent Trends in Deep Learning Based Natural Language Processing (2018)
- Classification: Deep Learning Based Text Classification: A Comprehensive Review (2021)
- Generation: Survey of the SOTA in Natural Language Generation: Core tasks, applications and evaluation (2018)
- Generation: Neural Language Generation: Formulation, Methods, and Evaluation (2020)
- Transfer learning: Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer (2020)
- Transformers: Efficient Transformers: A Survey (2020)
- Metrics: Beyond Accuracy: Behavioral Testing of NLP Models with CheckList (2020)
- Metrics: Evaluation of Text Generation: A Survey (2020)
Computer Vision
- Object detection: Object Detection in 20 Years (2019)
- Adversarial attacks: Threat of Adversarial Attacks on Deep Learning in Computer Vision (2018)
- Autonomous vehicles: Computer Vision for Autonomous Vehicles: Problems, Datasets and SOTA (2021)
- Image Captioning: A Comprehensive Survey of Deep Learning for Image Captioning (2018)
Vision and Language
- Trends: Trends in Integration of Vision and Language Research: Tasks, Datasets, and Methods (2021)
- Trends: Multimodal Research in Vision and Language: Current and Emerging Trends (2020)
Reinforcement Learning
- Algorithms: A Brief Survey of Deep Reinforcement Learning (2017)
- Transfer learning: Transfer Learning for Reinforcement Learning Domains (2009)
- Economics: Review of Deep Reinforcement Learning Methods and Applications in Economics (2020)
- Discovery: Deep Reinforcement Learning for Search, Recommendation, and Online Advertising (2018)
Graph
- Survey: A Comprehensive Survey on Graph Neural Networks (2019)
- Survey: A Practical Guide to Graph Neural Networks (2020)
- Fraud detection: A systematic literature review of graph-based anomaly detection approaches (2020)
- Knowledge graphs: A Comprehensive Introduction to Knowledge Graphs (2021)
Embeddings
- Text: From Word to Sense Embeddings: A Survey on Vector Representations of Meaning (2018)
- Text: Diachronic Word Embeddings and Semantic Shifts (2018)
- Text: Word Embeddings: A Survey (2019)
- Text: A Reproducible Survey on Word Embeddings and Ontology-based Methods for Word Similarity (2019)
- Graph: A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications (2017)
Meta-learning and Few-shot Learning
- NLP: Meta-learning for Few-shot Natural Language Processing: A Survey (2020)
- Domain Agnostic: Learning from Few Samples: A Survey (2020)
- Neural Networks: Meta-Learning in Neural Networks: A Survey (2020)
- Domain Agnostic: A Comprehensive Overview and Survey of Recent Advances in Meta-Learning (2020)
- Domain Agnostic: Baby steps towards few-shot learning with multiple semantics (2020)
- Domain Agnostic: Meta-Learning: A Survey (2018)
- Domain Agnostic: A Perspective View And Survey Of Meta-learning (2002)
Others
- Transfer learning: A Survey on Transfer Learning (2009)
Part-3: Machine Learning Design Concepts | By Topics
Data Quality
- Reliable and Scalable Data Ingestion at Airbnb
Airbnb
2016
- Monitoring Data Quality at Scale with Statistical Modeling
Uber
2017
- Data Management Challenges in Production Machine Learning (Paper)
Google
2017
- Automating Large-Scale Data Quality Verification (Paper)
Amazon
2018
- Meet Hodor - Gojek's Upstream Data Quality Tool
Gojek
2019
- An Approach to Data Quality for Netflix Personalization Systems
Netflix
2020
- Improving Accuracy By Certainty Estimation of Human Decisions, Labels, and Raters (Paper)
Facebook
2020
Data Engineering
- Zipline: Airbnb's Machine Learning Data Management Platform
Airbnb
2018
- Sputnik: Airbnb's Apache Spark Framework for Data Engineering
Airbnb
2020
- Unbundling Data Science Workflows with Metaflow and AWS Step Functions
Netflix
2020
- How DoorDash is Scaling its Data Platform to Delight Customers and Meet Growing Demand
DoorDash
2020
- Revolutionizing Money Movements at Scale with Strong Data Consistency
Uber
2020
- Zipline - A Declarative Feature Engineering Framework
Airbnb
2020
- Automating Data Protection at Scale, Part 1 (Part 2)
Airbnb
2021
- Real-time Data Infrastructure at Uber
Uber
2021
- Introducing Fabricator: A Declarative Feature Engineering Framework
DoorDash
2022
- Functions & DAGs: introducing Hamilton, a microframework for dataframe generation
Stitch Fix
2021
Data Discovery
- Apache Atlas: Data Governance and Metadata Framework for Hadoop (Code)
Apache
- Collect, Aggregate, and Visualize a Data Ecosystem's Metadata (Code)
WeWork
- Discovery and Consumption of Analytics Data at Twitter
Twitter
2016
- Democratizing Data at Airbnb
Airbnb
2017
- Databook: Turning Big Data into Knowledge with Metadata at Uber
Uber
2018
- Metacat: Making Big Data Discoverable and Meaningful at Netflix (Code)
Netflix
2018
- Amundsen - Lyft's Data Discovery & Metadata Engine
Lyft
2019
- Open Sourcing Amundsen: A Data Discovery And Metadata Platform (Code)
Lyft
2019
- DataHub: A Generalized Metadata Search & Discovery Tool (Code)
LinkedIn
2019
- Amundsen: One Year Later
Lyft
2020
- Using Amundsen to Support User Privacy via Metadata Collection at Square
Square
2020
- Turning Metadata Into Insights with Databook
Uber
2020
- DataHub: Popular Metadata Architectures Explained
LinkedIn
2020
- How We Improved Data Discovery for Data Scientists at Spotify
Spotify
2020
- How We're Solving Data Discovery Challenges at Shopify
Shopify
2020
- Nemo: Data discovery at Facebook
Facebook
2020
- Exploring Data @ Netflix (Code)
Netflix
2021
Feature Stores
- Distributed Time Travel for Feature Generation
Netflix
2016
- Building the Activity Graph, Part 2 (Feature Storage Section)
LinkedIn
2017
- Fact Store at Scale for Netflix Recommendations
Netflix
2018
- Zipline: Airbnb's Machine Learning Data Management Platform
Airbnb
2018
- Introducing Feast: An Open Source Feature Store for Machine Learning (Code)
Gojek
2019
- Michelangelo Palette: A Feature Engineering Platform at Uber
Uber
2019
- The Architecture That Powers Twitter's Feature Store
Twitter
2019
- Accelerating Machine Learning with the Feature Store Service
Condé Nast
2019
- Feast: Bridging ML Models and Data
Gojek
2020
- Building a Scalable ML Feature Store with Redis, Binary Serialization, and Compression
DoorDash
2020
- Rapid Experimentation Through Standardization: Typed AI features for LinkedIn's Feed
LinkedIn
2020
- Building a Feature Store
Monzo Bank
2020
- Butterfree: A Spark-based Framework for Feature Store Building (Code)
QuintoAndar
2020
- Building Riviera: A Declarative Real-Time Feature Engineering Framework
DoorDash
2021
- Optimal Feature Discovery: Better, Leaner Machine Learning Models Through Information Theory
Uber
2021
- ML Feature Serving Infrastructure at Lyft
Lyft
2021
Classification
- Prediction of Advertiser Churn for Google AdWords (Paper)
Google
2010
- High-Precision Phrase-Based Document Classification on a Modern Scale (Paper)
LinkedIn
2011
- Chimera: Large-scale Classification using Machine Learning, Rules, and Crowdsourcing (Paper)
Walmart
2014
- Large-scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks (Paper)
NAVER
2016
- Learning to Diagnose with LSTM Recurrent Neural Networks (Paper)
Google
2017
- Discovering and Classifying In-app Message Intent at Airbnb
Airbnb
2019
- Teaching Machines to Triage Firefox Bugs
Mozilla
2019
- Categorizing Products at Scale
Shopify
2020
- How We Built the Good First Issues Feature
GitHub
2020
- Testing Firefox More Efficiently with Machine Learning
Mozilla
2020
- Using ML to Subtype Patients Receiving Digital Mental Health Interventions (Paper)
Microsoft
2020
- Scalable Data Classification for Security and Privacy (Paper)
Facebook
2020
- Uncovering Online Delivery Menu Best Practices with Machine Learning
DoorDash
2020
- Using a Human-in-the-Loop to Overcome the Cold Start Problem in Menu Item Tagging
DoorDash
2020
- Deep Learning: Product Categorization and Shelving
Walmart
2021
- Large-scale Item Categorization for e-Commerce (Paper)
DianPing, eBay
2021
Regression
- Using Machine Learning to Predict Value of Homes On Airbnb
Airbnb
2017
- Using Machine Learning to Predict the Value of Ad Requests
Twitter
2020
- Open-Sourcing Riskquant, a Library for Quantifying Risk (Code)
Netflix
2020
- Solving for Unobserved Data in a Regression Model Using a Simple Data Adjustment
DoorDash
2020
Forecasting
- Engineering Extreme Event Forecasting at Uber with RNN
Uber
2017
- Forecasting at Uber: An Introduction
Uber
2018
- Transforming Financial Forecasting with Data Science and Machine Learning at Uber
Uber
2018
- Under the Hood of Gojek's Automated Forecasting Tool
Gojek
2019
- BusTr: Predicting Bus Travel Times from Real-Time Traffic (Paper, Video)
Google
2020
- Retraining Machine Learning Models in the Wake of COVID-19
DoorDash
2020
- Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow (Paper, Code)
Atlassian
2020
- Introducing Orbit, An Open Source Package for Time Series Inference and Forecasting (Paper, Video, Code)
Uber
2021
- Managing Supply and Demand Balance Through Machine Learning
DoorDash
2021
- Greykite: A flexible, intuitive, and fast forecasting library
LinkedIn
2021
Recommendation
- Amazon.com Recommendations: Item-to-Item Collaborative Filtering (Paper)
Amazon
2003
- Netflix Recommendations: Beyond the 5 stars (Part 1) (Part 2)
Netflix
2012
- How Music Recommendation Works - And Doesn't Work
Spotify
2012
- Learning to Rank Recommendations with the k-Order Statistic Loss (Paper)
Google
2013
- Recommending Music on Spotify with Deep Learning
Spotify
2014
- Learning a Personalized Homepage
Netflix
2015
- Session-based Recommendations with Recurrent Neural Networks (Paper)
Telefonica
2016
- Deep Neural Networks for YouTube Recommendations
YouTube
2016
- E-commerce in Your Inbox: Product Recommendations at Scale (Paper)
Yahoo
2016
- To Be Continued: Helping you find shows to continue watching on Netflix
Netflix
2016
- Personalized Recommendations in LinkedIn Learning
LinkedIn
2016
- Personalized Channel Recommendations in Slack
Slack
2016
- Recommending Complementary Products in E-Commerce Push Notifications (Paper)
Alibaba
2017
- Artwork Personalization at Netflix
Netflix
2017
- A Meta-Learning Perspective on Cold-Start Recommendations for Items (Paper)
Twitter
2017
- Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time (Paper)
Pinterest
2017
- How 20th Century Fox uses ML to predict a movie audience (Paper)
20th Century Fox
2018
- Calibrated Recommendations (Paper)
Netflix
2018
- Food Discovery with Uber Eats: Recommending for the Marketplace
Uber
2018
- Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits (Paper)
Spotify
2018
- Behavior Sequence Transformer for E-commerce Recommendation in Alibaba (Paper)
Alibaba
2019
- SDM: Sequential Deep Matching Model for Online Large-scale Recommender System (Paper)
Alibaba
2019
- Multi-Interest Network with Dynamic Routing for Recommendation at Tmall (Paper)
Alibaba
2019
- Personalized Recommendations for Experiences Using Deep Learning
TripAdvisor
2019
- Powered by AI: Instagram's Explore recommender system
Facebook
2019
- Marginal Posterior Sampling for Slate Bandits (Paper)
Netflix
2019
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations
Uber
2019
- Music recommendation at Spotify
Spotify
2019
- Using Machine Learning to Predict what File you Need Next (Part 1)
Dropbox
2019
- Using Machine Learning to Predict what File you Need Next (Part 2)
Dropbox
2019
- Learning to be Relevant: Evolution of a Course Recommendation System
LinkedIn
2019
- Temporal-Contextual Recommendation in Real-Time (Paper)
Amazon
2020
- P-Companion: A Framework for Diversified Complementary Product Recommendation (Paper)
Amazon
2020
- Deep Interest with Hierarchical Attention Network for Click-Through Rate Prediction (Paper)
Alibaba
2020
- TPG-DNN: A Method for User Intent Prediction with Multi-task Learning (Paper)
Alibaba
2020
- PURS: Personalized Unexpected Recommender System for Improving User Satisfaction (Paper)
Alibaba
2020
- Controllable Multi-Interest Framework for Recommendation (Paper)
Alibaba
2020
- MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction (Paper)
Alibaba
2020
- ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation (Paper)
Alibaba
2020
- For Your Ears Only: Personalizing Spotify Home with Machine Learning
Spotify
2020
- Reach for the Top: How Spotify Built Shortcuts in Just Six Months
Spotify
2020
- Contextual and Sequential User Embeddings for Large-Scale Music Recommendation (Paper)
Spotify
2020
- The Evolution of Kit: Automating Marketing Using Machine Learning
Shopify
2020
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1)
LinkedIn
2020
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2)
LinkedIn
2020
- Building a Heterogeneous Social Network Recommendation System
LinkedIn
2020
- How TikTok recommends videos #ForYou
ByteDance
2020
- Zero-Shot Heterogeneous Transfer Learning from RecSys to Cold-Start Search Retrieval (Paper)
Google
2020
- Improved Deep & Cross Network for Feature Cross Learning in Web-scale LTR Systems (Paper)
Google
2020
- Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations (Paper)
Google
2020
- Future Data Helps Training: Modeling Future Contexts for Session-based Recommendation (Paper)
Tencent
2020
- A Case Study of Session-based Recommendations in the Home-improvement Domain (Paper)
Home Depot
2020
- Balancing Relevance and Discovery to Inspire Customers in the IKEA App (Paper)
Ikea
2020
- How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads
Pinterest
2020
- Multi-task Learning for Related Products Recommendations at Pinterest
Pinterest
2020
- Improving the Quality of Recommended Pins with Lightweight Ranking
Pinterest
2020
- Personalized Cuisine Filter Based on Customer Preference and Local Popularity
DoorDash
2020
- How We Built a Matchmaking Algorithm to Cross-Sell Products
Gojek
2020
- Lessons Learned Addressing Dataset Bias in Model-Based Candidate Generation (Paper)
Twitter
2021
- Self-supervised Learning for Large-scale Item Recommendations (Paper)
Google
2021
- Deep Retrieval: End-to-End Learnable Structure Model for Large-Scale Recommendations (Paper)
ByteDance
2021
- Using AI to Help Health Experts Address the COVID-19 Pandemic
Facebook
2021
- Advertiser Recommendation Systems at Pinterest
Pinterest
2021
- On YouTube's Recommendation System
YouTube
2021
Search & Ranking
- Amazon Search: The Joy of Ranking Products (Paper, Video, Code)
Amazon
2016
- How Lazada Ranks Products to Improve Customer Experience and Conversion
Lazada
2016
- Ranking Relevance in Yahoo Search (Paper)
Yahoo
2016
- Learning to Rank Personalized Search Results in Professional Networks (Paper)
LinkedIn
2016
- Using Deep Learning at Scale in Twitter's Timelines
Twitter
2017
- An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy (Paper)
Etsy
2017
- Powering Search & Recommendations at DoorDash
DoorDash
2017
- Applying Deep Learning To Airbnb Search (Paper)
Airbnb
2018
- In-session Personalization for Talent Search (Paper)
LinkedIn
2018
- Talent Search and Recommendation Systems at LinkedIn (Paper)
LinkedIn
2018
- Food Discovery with Uber Eats: Building a Query Understanding Engine
Uber
2018
- Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search (Paper)
Alibaba
2018
- Reinforcement Learning to Rank in E-Commerce Search Engine (Paper)
Alibaba
2018
- Semantic Product Search (Paper)
Amazon
2019
- Machine Learning-Powered Search Ranking of Airbnb Experiences
Airbnb
2019
- Entity Personalized Talent Search Models with Tree Interaction Features (Paper)
LinkedIn
2019
- The AI Behind LinkedIn Recruiter Search and recommendation systems
LinkedIn
2019
- Learning Hiring Preferences: The AI Behind LinkedIn Jobs
LinkedIn
2019
- The Secret Sauce Behind Search Personalisation
Gojek
2019
- Neural Code Search: ML-based Code Search Using Natural Language Queries
Facebook
2019
- Aggregating Search Results from Heterogeneous Sources via Reinforcement Learning (Paper)
Alibaba
2019
- Cross-domain Attention Network with Wasserstein Regularizers for E-commerce Search
Alibaba
2019
- Understanding Searches Better Than Ever Before (Paper)
Google
2019
- How We Used Semantic Search to Make Our Search 10x Smarter
Tokopedia
2019
- Query2vec: Search query expansion with query embeddings
GrubHub
2019
- MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu's Sponsored Search
Baidu
2019
- Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? (Paper)
Amazon
2020
- Managing Diversity in Airbnb Search (Paper)
Airbnb
2020
- Improving Deep Learning for Airbnb Search (Paper)
Airbnb
2020
- Quality Matches Via Personalized AI for Hirer and Seeker Preferences
LinkedIn
2020
- Understanding Dwell Time to Improve LinkedIn Feed Ranking
LinkedIn
2020
- Ads Allocation in Feed via Constrained Optimization (Paper, Video)
LinkedIn
2020
- AI at Scale in Bing
Microsoft
2020
- Query Understanding Engine in Traveloka Universal Search
Traveloka
2020
- Bayesian Product Ranking at Wayfair
Wayfair
2020
- COLD: Towards the Next Generation of Pre-Ranking System (Paper)
Alibaba
2020
- Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video)
Pinterest
2020
- Driving Shopping Upsells from Pinterest Search
Pinterest
2020
- GDMix: A Deep Ranking Personalization Framework (Code)
LinkedIn
2020
- Bringing Personalized Search to Etsy
Etsy
2020
- Building a Better Search Engine for Semantic Scholar
Allen Institute for AI
2020
- Query Understanding for Natural Language Enterprise Search (Paper)
Salesforce
2020
- Things Not Strings: Understanding Search Intent with Better Recall
DoorDash
2020
- Query Understanding for Surfacing Under-served Music Content (Paper)
Spotify
2020
- Embedding-based Retrieval in Facebook Search (Paper)
Facebook
2020
- Towards Personalized and Semantic Retrieval for E-commerce Search via Embedding Learning (Paper)
JD
2020
- QUEEN: Neural query rewriting in e-commerce (Paper)
Amazon
2021
- Using Learning-to-rank to Precisely Locate Where to Deliver Packages (Paper)
Amazon
2021
- Seasonal relevance in e-commerce search (Paper)
Amazon
2021
- Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper)
Alibaba
2021
- How We Built A Context-Specific Bidding System for Etsy Ads
Etsy
2021
- Pre-trained Language Model based Ranking in Baidu Search (Paper)
Baidu
2021
- Stitching together spaces for query-based recommendations
Stitch Fix
2021
- Deep Natural Language Processing for LinkedIn Search Systems (Paper)
LinkedIn
2021
- Siamese BERT-based Model for Web Search Relevance Ranking Evaluated on a New Czech Dataset (Paper, Code)
Seznam
2021
Embeddings
- Vector Representation Of Items, Customer And Cart To Build A Recommendation System (Paper)
Sears
2017
- Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (Paper)
Alibaba
2018
- Embeddings@Twitter
Twitter
2018
- Listing Embeddings in Search Ranking (Paper)
Airbnb
2018
- Understanding Latent Style
Stitch Fix
2018
- Towards Deep and Representation Learning for Talent Search at LinkedIn (Paper)
LinkedIn
2018
- Personalized Store Feed with Vector Embeddings
DoorDash
2018
- Should we Embed? A Study on Performance of Embeddings for Real-Time Recommendations (Paper)
Moshbit
2019
- Machine Learning for a Better Developer Experience
Netflix
2020
- Announcing ScaNN: Efficient Vector Similarity Search (Paper, Code)
Google
2020
- Embedding-based Retrieval at Scribd
Scribd
2021
Natural Language Processing
- Abusive Language Detection in Online User Content (Paper)
Yahoo
2016
- Smart Reply: Automated Response Suggestion for Email (Paper)
Google
2016
- Building Smart Replies for Member Messages
LinkedIn
2017
- How Natural Language Processing Helps LinkedIn Members Get Support Easily
LinkedIn
2019
- Gmail Smart Compose: Real-Time Assisted Writing (Paper)
Google
2019
- Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting (Paper)
Amazon
2019
- Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want
Stitch Fix
2019
- DeText: A deep NLP Framework for Intelligent Text Understanding (Code)
LinkedIn
2020
- SmartReply for YouTube Creators
Google
2020
- Using Neural Networks to Find Answers in Tables (Paper)
Google
2020
- A Scalable Approach to Reducing Gender Bias in Google Translate
Google
2020
- Assistive AI Makes Replying Easier
Microsoft
2020
- AI Advances to Better Detect Hate Speech
Facebook
2020
- A State-of-the-Art Open Source Chatbot (Paper)
Facebook
2020
- A Highly Efficient, Real-Time Text-to-Speech System Deployed on CPUs
Facebook
2020
- Deep Learning to Translate Between Programming Languages (Paper, Code)
Facebook
2020
- Deploying Lifelong Open-Domain Dialogue Learning (Paper)
Facebook
2020
- Introducing Dynabench: Rethinking the way we benchmark AI
Facebook
2020
- How Gojek Uses NLP to Name Pickup Locations at Scale
Gojek
2020
- The State-of-the-art Open-Domain Chatbot in Chinese and English (Paper)
Baidu
2020
- PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization (Paper, Code)
Google
2020
- Photon: A Robust Cross-Domain Text-to-SQL System (Paper) (Demo)
Salesforce
2020
- GeDi: A Powerful New Method for Controlling Language Models (Paper, Code)
Salesforce
2020
- Applying Topic Modeling to Improve Call Center Operations
RICOH
2020
- WIDeText: A Multimodal Deep Learning Framework
Airbnb
2020
- Dynaboard: Moving Beyond Accuracy to Holistic Model Evaluation in NLP (Code)
Facebook
2021
- How we reduced our text similarity runtime by 99.96%
Microsoft
2021
- Textless NLP: Generating expressive speech from raw audio (Part 1) (Part 2) (Part 3) (Code and Pretrained Models)
Facebook
2021
Sequence Modelling
- Doctor AI: Predicting Clinical Events via Recurrent Neural Networks (Paper)
Sutter Health
2015
- Deep Learning for Understanding Consumer Histories (Paper)
Zalando
2016
- Using Recurrent Neural Network Models for Early Detection of Heart Failure Onset (Paper)
Sutter Health
2016
- Continual Prediction of Notification Attendance with Classical and Deep Networks (Paper)
Telefonica
2017
- Deep Learning for Electronic Health Records (Paper)
Google
2018
- Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction (Paper)
Alibaba
2019
- Search-based User Interest Modeling with Sequential Behavior Data for CTR Prediction (Paper)
Alibaba
2020
- How Duolingo uses AI in every part of its app
Duolingo
2020
- Leveraging Online Social Interactions For Enhancing Integrity at Facebook (Paper, Video)
Facebook
2020
- Using deep learning to detect abusive sequences of member activity (Video)
LinkedIn
2021
Computer Vision
- Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning
Dropbox
2017
- Categorizing Listing Photos at Airbnb
Airbnb
2018
- Amenity Detection and Beyond - New Frontiers of Computer Vision at Airbnb
Airbnb
2019
- How we Improved Computer Vision Metrics by More Than 5% Only by Cleaning Labelling Errors
Deepomatic
- Making machines recognize and transcribe conversations in meetings using audio and video
Microsoft
2019
- Powered by AI: Advancing product understanding and building new shopping experiences
Facebook
2020
- A Neural Weather Model for Eight-Hour Precipitation Forecasting (Paper)
Google
2020
- Machine Learning-based Damage Assessment for Disaster Relief (Paper)
Google
2020
- RepNet: Counting Repetitions in Videos (Paper)
Google
2020
- Converting Text to Images for Product Discovery (Paper)
Amazon
2020
- How Disney Uses PyTorch for Animated Character Recognition
Disney
2020
- Image Captioning as an Assistive Technology (Video)
IBM
2020
- AI for AG: Production machine learning for agriculture
Blue River
2020
- AI for Full-Self Driving at Tesla
Tesla
2020
- On-device Supermarket Product Recognition
Google
2020
- Using Machine Learning to Detect Deficient Coverage in Colonoscopy Screenings (Paper)
Google
2020
- Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video)
Pinterest
2020
- Developing Real-Time, Automatic Sign Language Detection for Video Conferencing (Paper)
Google
2020
- Vision-based Price Suggestion for Online Second-hand Items (Paper)
Alibaba
2020
- New AI Research to Help Predict COVID-19 Resource Needs From X-rays (Paper, Model)
Facebook
2021
- An Efficient Training Approach for Very Large Scale Face Recognition (Paper)
Alibaba
2021
- Identifying Document Types at Scribd
Scribd
2021
- Semi-Supervised Visual Representation Learning for Fashion Compatibility (Paper)
Walmart
2021
Reinforcement Learning
- Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Paper)
Alibaba
2018
- Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising (Paper)
Alibaba
2018
- Reinforcement Learning for On-Demand Logistics
DoorDash
2018
- Reinforcement Learning to Rank in E-Commerce Search Engine (Paper)
Alibaba
2018
- Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning (Paper)
Alibaba
2019
- Productionizing Deep Reinforcement Learning with Spark and MLflow
Zynga
2020
- Deep Reinforcement Learning in Production (Part 1) (Part 2)
Zynga
2020
- Building AI Trading Systems
Denny Britz
2020
Anomaly Detection
- Detecting Performance Anomalies in External Firmware Deployments
Netflix
2019
- Detecting and Preventing Abuse on LinkedIn using Isolation Forests (Code)
LinkedIn
2019
- Deep Anomaly Detection with Spark and Tensorflow (Hopsworks Video)
Swedbank, Hopsworks
2019
- Preventing Abuse Using Unsupervised Learning
LinkedIn
2020
- The Technology Behind Fighting Harassment on LinkedIn
LinkedIn
2020
- Uncovering Insurance Fraud Conspiracy with Network Learning (Paper)
Ant Financial
2020
- How Does Spam Protection Work on Stack Exchange?
Stack Exchange
2020
- Auto Content Moderation in C2C e-Commerce
Mercari
2020
- Blocking Slack Invite Spam With Machine Learning
Slack
2020
- Cloudflare Bot Management: Machine Learning and More
Cloudflare
2020
- Anomalies in Oil Temperature Variations in a Tunnel Boring Machine
SENER
2020
- Using Anomaly Detection to Monitor Low-Risk Bank Customers
Rabobank
2020
- Fighting fraud with Triplet Loss
OLX Group
2020
- Facebook is Now Using AI to Sort Content for Quicker Moderation (Alternative)
Facebook
2020
- How AI is getting better at detecting hate speech Part 1, Part 2, Part 3, Part 4
Facebook
2020
Graph
- Building The LinkedIn Knowledge Graph
LinkedIn
2016
- Scaling Knowledge Access and Retrieval at Airbnb
Airbnb
2018
- Graph Convolutional Neural Networks for Web-Scale Recommender Systems (Paper)
Pinterest
2018
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations
Uber
2019
- AliGraph: A Comprehensive Graph Neural Network Platform (Paper)
Alibaba
2019
- Contextualizing Airbnb by Building Knowledge Graph
Airbnb
2019
- Retail Graph - Walmart's Product Knowledge Graph
Walmart
2020
- Traffic Prediction with Advanced Graph Neural Networks
DeepMind
2020
- SimClusters: Community-Based Representations for Recommendations (Paper, Video)
Twitter
2020
- Metapaths guided Neighbors aggregated Network for Heterogeneous Graph Reasoning (Paper)
Alibaba
2021
- Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper)
Alibaba
2021
- JEL: Applying End-to-End Neural Entity Linking in JPMorgan Chase (Paper)
JPMorgan Chase
2021
Optimization
- Matchmaking in Lyft Line (Part 1) (Part 2) (Part 3)
Lyft
2016
- The Data and Science behind GrabShare Carpooling (Part 1)
Grab
2017
- How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats
Uber
2018
- Next-Generation Optimization for Dasher Dispatch at DoorDash
DoorDash
2020
- Optimization of Passengers Waiting Time in Elevators Using Machine Learning
Thyssen Krupp AG
2020
- Think Out of The Package: Recommending Package Types for E-commerce Shipments (Paper)
Amazon
2020
- Optimizing DoorDash's Marketing Spend with Machine Learning
DoorDash
2020
Information Extraction
- Unsupervised Extraction of Attributes and Their Values from Product Description (Paper)
Rakuten
2013
- Using Machine Learning to Index Text from Billions of Images
Dropbox
2018
- Extracting Structured Data from Templatic Documents (Paper)
Google
2020
- AutoKnow: self-driving knowledge collection for products of thousands of types (Paper, Video)
Amazon
2020
- One-shot Text Labeling using Attention and Belief Propagation for Information Extraction (Paper)
Alibaba
2020
- Information Extraction from Receipts with Graph Convolutional Networks
Nanonets
2021
Weak Supervision
- Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale (Paper)
Google
2019
- Osprey: Weak Supervision of Imbalanced Extraction Problems without Code (Paper)
Intel
2019
- Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper)
Apple
2019
- Bootstrapping Conversational Agents with Weak Supervision (Paper)
IBM
2019
Generation
- Better Language Models and Their Implications (Paper)
OpenAI
2019
- Image GPT (Paper, Code)
OpenAI
2019
- Language Models are Few-Shot Learners (Paper) (GPT-3 Blog post)
OpenAI
2020
- Deep Learned Super Resolution for Feature Film Production (Paper)
Pixar
2020
- Unit Test Case Generation with Transformers
Microsoft
2021
Audio
- Improving On-Device Speech Recognition with VoiceFilter-Lite (Paper)
Google
2020
- The Machine Learning Behind Hum to Search
Google
2020
Validation and A/B Testing
- Overlapping Experiment Infrastructure: More, Better, Faster Experimentation (Paper)
Google
2010
- The Reusable Holdout: Preserving Validity in Adaptive Data Analysis (Paper)
Google
2015
- Twitter Experimentation: Technical Overview
Twitter
2015
- It's All A/Bout Testing: The Netflix Experimentation Platform
Netflix
2016
- Building Pinterest's A/B Testing Platform
Pinterest
2016
- Experimenting to Solve Cramming
Twitter
2017
- Building an Intelligent Experimentation Platform with Uber Engineering
Uber
2017
- Scaling Airbnb's Experimentation Platform
Airbnb
2017
- Meet Wasabi, an Open Source A/B Testing Platform (Code)
Intuit
2017
- Analyzing Experiment Outcomes: Beyond Average Treatment Effects
Uber
2018
- Under the Hood of Uber's Experimentation Platform
Uber
2018
- Constrained Bayesian Optimization with Noisy Experiments (Paper)
Facebook
2018
- Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab
Grab
2018
- Modeling Conversion Rates and Saving Millions Using Kaplan-Meier and Gamma Distributions (Code)
Better
2019
- Detecting Interference: An A/B Test of A/B Tests
LinkedIn
2019
- Announcing a New Framework for Designing Optimal Experiments with Pyro (Paper) (Paper)
Uber
2020
- Enabling 10x More Experiments with Traveloka Experiment Platform
Traveloka
2020
- Large Scale Experimentation at Stitch Fix (Paper)
Stitch Fix
2020
- Multi-Armed Bandits and the Stitch Fix Experimentation Platform
Stitch Fix
2020
- Experimentation with Resource Constraints
Stitch Fix
2020
- Computational Causal Inference at Netflix (Paper)
Netflix
2020
- Key Challenges with Quasi Experiments at Netflix
Netflix
2020
- Making the LinkedIn experimentation engine 20x faster
LinkedIn
2020
- Our Evolution Towards T-REX: The Prehistory of Experimentation Infrastructure at LinkedIn
LinkedIn
2020
- How to Use Quasi-experiments and Counterfactuals to Build Great Products
Shopify
2020
- Improving Experimental Power through Control Using Predictions as Covariate
DoorDash
2020
- Supporting Rapid Product Iteration with an Experimentation Analysis Platform
DoorDash
2020
- Improving Online Experiment Capacity by 4X with Parallelization and Increased Sensitivity
DoorDash
2020
- Leveraging Causal Modeling to Get More Value from Flat Experiment Results
DoorDash
2020
- Iterating Real-time Assignment Algorithms Through Experimentation
DoorDash
2020
- Spotify's New Experimentation Platform (Part 1) (Part 2)
Spotify
2020
- Interpreting A/B Test Results: False Positives and Statistical Significance
Netflix
2021
- Interpreting A/B Test Results: False Negatives and Power
Netflix
2021
- Running Experiments with Google Adwords for Campaign Optimization
DoorDash
2021
- The 4 Principles DoorDash Used to Increase Its Logistics Experiment Capacity by 1000%
DoorDash
2021
- Experimentation Platform at Zalando: Part 1 - Evolution
Zalando
2021
- Designing Experimentation Guardrails
Airbnb
2021
- Network Experimentation at Scale (Paper)
Facebook
2021
- Universal Holdout Groups at Disney Streaming
Disney
2021
Model Management
- Operationalizing Machine Learning - Managing Provenance from Raw Data to Predictions
Comcast
2018
- Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper)
Apple
2019
- Runway - Model Lifecycle Management at Netflix
Netflix
2020
- Managing ML Models @ Scale - Intuit's ML Platform
Intuit
2020
- ML Model Monitoring - 9 Tips From the Trenches
Nubank
2021
Efficiency
- GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce (Paper)
Facebook
2020
- How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs
Roblox
2020
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks (Paper)
Uber
2021
Ethics
- Building Inclusive Products Through A/B Testing (Paper)
LinkedIn
2020
- LiFT: A Scalable Framework for Measuring Fairness in ML Applications (Paper)
LinkedIn
2020
Infra
- Reengineering Facebook AI's Deep Learning Platforms for Interoperability
Facebook
2020
- Elastic Distributed Training with XGBoost on Ray
Uber
2021
MLOps Platforms
- Meet Michelangelo: Uber's Machine Learning Platform
Uber
2017
- Operationalizing Machine Learning - Managing Provenance from Raw Data to Predictions
Comcast
2018
- Big Data Machine Learning Platform at Pinterest
Pinterest
2019
- Core Modeling at Instagram
Instagram
2019
- Open-Sourcing Metaflow - a Human-Centric Framework for Data Science
Netflix
2019
- Managing ML Models @ Scale - Intuit's ML Platform
Intuit
2020
- Real-time Machine Learning Inference Platform at Zomato
Zomato
2020
- Introducing Flyte: Cloud Native Machine Learning and Data Processing Platform
Lyft
2020
- Building Flexible Ensemble ML Models with a Computational Graph
DoorDash
2021
- LyftLearn: ML Model Training Infrastructure built on Kubernetes
Lyft
2021
- "You Don't Need a Bigger Boat": A Full Data Pipeline Built with Open-Source Tools (Paper)
Coveo
2021
- MLOps at GreenSteam: Shipping Machine Learning
GreenSteam
2021
- Evolving Reddit's ML Model Deployment and Serving Architecture
Reddit
2021
Practices
- Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper)
Yoshua Bengio
2012
- Machine Learning: The High Interest Credit Card of Technical Debt (Paper) (Paper)
Google
2014
- Rules of Machine Learning: Best Practices for ML Engineering
Google
2018
- On Challenges in Machine Learning Model Management
Amazon
2018
- Machine Learning in Production: The Booking.com Approach
Booking
2019
- 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper)
Booking
2019
- Successes and Challenges in Adopting Machine Learning at Scale at a Global Bank
Rabobank
2019
- Challenges in Deploying Machine Learning: a Survey of Case Studies (Paper)
Cambridge
2020
- Reengineering Facebook AI's Deep Learning Platforms for Interoperability
Facebook
2020
- The problem with AI developer tools for enterprises
Databricks
2020
- Continuous Integration and Deployment for Machine Learning Online Serving and Models
Uber
2021
- Tuning Model Performance
Uber
2021
- Maintaining Machine Learning Model Accuracy Through Monitoring
DoorDash
2021
- Building Scalable and Performant Marketing ML Systems at Wayfair
Wayfair
2021
- Our approach to building transparent and explainable AI systems
LinkedIn
2021
- 5 Steps for Building Machine Learning Models for Business
Shopify
2021
Team Structure
- Engineers Shouldn't Write ETL: A Guide to Building a High Functioning Data Science Department
Stitch Fix
2016
- Building The Analytics Team At Wish
Wish
2018
- Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist
Stitch Fix
2019
- Cultivating Algorithms: How We Grow Data Science at Stitch Fix
Stitch Fix
- Analytics at Netflix: Who We Are and What We Do
Netflix
2020
- Building a Data Team at a Mid-stage Startup: A Short Story
Erikbern
2021
Fails
- When It Comes to Gorillas, Google Photos Remains Blind
Google
2018
- 160k+ High School Students Will Graduate Only If a Model Allows Them to
International Baccalaureate
2020
- An Algorithm That "Predicts" Criminality Based on a Face Sparks a Furor
Harrisburg University
2020
- It's Hard to Generate Neural Text From GPT-3 About Muslims
OpenAI
2020
- A British AI Tool to Predict Violent Crime Is Too Flawed to Use
United Kingdom
2020
- More in awful-ai
References
[1] Machine Learning System Design
[2] Machine Learning Surveys (forked from eugeneyan/ml-surveys)
[3] Applied Machine Learning (forked from eugeneyan/applied-ml)