You will learn about data exploration, tracking experiment parameters and metrics, comparing experiments, and more...
Detecting questions about Machine Learning
In this tutorial, we'll create a model to predict whether a question on the Cross Validated Stack Exchange concerns Machine Learning or not.
This kind of prediction can be useful if, for example, we want to recommend that a user add the machine-learning tag to their question, which can make it more likely that they will get an answer.
This task is simple and clean enough for a tutorial but leaves room for experimentation with feature engineering, data enrichment, and model selection.
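To make the prediction target concrete, here is a minimal sketch of how a binary label could be derived from a question's tags. It assumes tags arrive as a single string such as `<regression><machine-learning>`; the function name `is_ml_question` and the sample tag strings are hypothetical and used only for illustration, not taken from the tutorial's code.

```python
def is_ml_question(tags: str, target: str = "machine-learning") -> bool:
    """Return True if `target` is one of the tags in a "<a><b>"-style string.

    Assumption: tags are wrapped in angle brackets and concatenated,
    e.g. "<regression><machine-learning>".
    """
    # Drop the outer brackets, then split on the "><" between adjacent tags.
    tag_list = tags.strip("<>").split("><")
    # Compare whole tags so "machine-learning-ops" would not match.
    return target in tag_list


# Hypothetical tag strings, for illustration only.
samples = [
    "<regression><machine-learning>",
    "<hypothesis-testing><p-value>",
    "",
]
labels = [is_ml_question(tags) for tags in samples]
```

A model trained for this task would try to predict such a label from the question's title and body alone, so that the tag can be suggested before the user adds it.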
The tutorial is divided into several "levels", each of which demonstrates another workflow improvement. It's designed so that you learn something useful at each level, even if you find the next one less to your liking and choose to stop early.
The levels are:
- Data Exploration - Getting the data and trying to understand it, otherwise known as doing exploratory data analysis.
- Setup - Creating a DAGsHub account and project.
- Data Versioning - Using DVC to keep track of data and model versions.
- Experimentation - Logging hyperparameters and metrics to DAGsHub to keep track of and compare different experiments.
Delicious statistics 😋 (source: Cross Validated)
Too slow for you?
Here is a link to the complete code repo. You can browse it or use the code as you wish.
The tutorial guides you, step by step, through creating this repo.