Get started with Label Studio and DagsHub
This section walks you through using Label Studio and DagsHub Annotations while following the recommended Git flow, so you gain hands-on experience by following along. We'll use the "Where's Elon" project to annotate images of Elon Musk.
We assume you already have a project on DagsHub, with versioned data ready to be annotated. If you don't have a project ready, you can fork the example project instead.
Step 1: Create a Label Studio workspace
Navigate to the Annotations tab in your DagsHub repository and create a new workspace.
Note: This process can take 2-3 minutes as DagsHub spins up the Label Studio machine behind the scenes.
Create Label Studio workspace
Step 2: Create a Label Studio project
In the new Annotation Project menu, choose the tip of a remote branch to associate the project with. This branch marks the project's starting point and makes all the files hosted on DagsHub Storage under the selected branch available for labeling. To work in an isolated environment, we'll create a new branch for the labeling project. The default project name is based on the annotator who created it, but you can change it.
Create a Label Studio project
Step 3: Choose the files to annotate
When launching the project for the first time, you'll need to choose the files to annotate (AKA tasks). You can choose a specific file or an entire directory by checking the box next to its name.
Note: You can annotate files hosted on both Git and DVC remotes. As a rule of thumb:
"If you can see the file, you can annotate it." (S. Lousky)
Choose the files to annotate
Step 4: Configure Label Studio
You can configure Label Studio's labeling interface using one of its many templates. If you need a custom UI, you can create it using basic HTML.
Note: If you choose to work with a template, you'll need to set the project's labels manually.
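For orientation, here is a minimal sketch of what a labeling configuration for bounding boxes might look like, generated with Python's standard library. The `View`, `Image`, `RectangleLabels`, and `Label` tags are standard Label Studio tags; the helper name `build_bbox_config`, the `Elon` label, and the assumption that each task exposes its file under an `image` field are illustrative and may not match your project.

```python
import xml.etree.ElementTree as ET


def build_bbox_config(labels, image_field="image"):
    """Return a Label Studio labeling config (as a string) for bounding boxes.

    `image_field` is an assumed task field name; adjust it to your project.
    """
    view = ET.Element("View")
    # The Image tag renders the task; "$image" reads the task's "image" field.
    ET.SubElement(view, "Image", name="image", value=f"${image_field}")
    # RectangleLabels is bound to the Image tag through its toName attribute.
    boxes = ET.SubElement(view, "RectangleLabels", name="label", toName="image")
    for label in labels:
        ET.SubElement(boxes, "Label", value=label)
    return ET.tostring(view, encoding="unicode")


# Hypothetical label set for the "Where's Elon" example.
config = build_bbox_config(["Elon"])
print(config)
```

The printed markup can be pasted into the labeling interface's code editor and extended with more `Label` entries as needed.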
Configure Label Studio
Step 5: Annotate the data
That's it: you can now start annotating your data. There's no need to move the data to a different platform, change its structure, or synchronize anything. Work through the tasks and save the annotations to DagsHub's database.
Annotate the data
Step 6: Commit changes to Git
At any point in time, you can version the state of the project using Git and commit the changes back to the branch you chose in step 2, or create a new branch and commit to it. The commit will include the special .labelstudio directory, which is used to manage changes to the labels. You can also add an annotations file in one of the commonly used formats (JSON, COCO, CSV, TSV, etc.) to the commit, which can be used to train ML models.
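To illustrate how a committed annotations file might feed a training pipeline, the sketch below flattens a JSON export into CSV rows, one per bounding box. It assumes the common Label Studio JSON layout (a list of tasks with `data`, `annotations`, and `result` entries, and box coordinates stored as percentages of the image size); the `image` field name, the sample task, and the helper names are hypothetical, so check them against your own export.

```python
import csv
import io

FIELDS = ["image", "label", "x_min", "y_min", "x_max", "y_max"]


def export_to_rows(tasks):
    """Flatten rectangle annotations into one row per labeled bounding box."""
    rows = []
    for task in tasks:
        image = task.get("data", {}).get("image", "")  # assumed field name
        for annotation in task.get("annotations", []):
            for region in annotation.get("result", []):
                if region.get("type") != "rectanglelabels":
                    continue  # skip other annotation types
                value = region["value"]
                img_w = region["original_width"]
                img_h = region["original_height"]
                for label in value["rectanglelabels"]:
                    rows.append({
                        "image": image,
                        "label": label,
                        # Convert percentage coordinates to pixels.
                        "x_min": value["x"] * img_w / 100,
                        "y_min": value["y"] * img_h / 100,
                        "x_max": (value["x"] + value["width"]) * img_w / 100,
                        "y_max": (value["y"] + value["height"]) * img_h / 100,
                    })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample task, for illustration only.
sample_tasks = [{
    "data": {"image": "data/elon_001.jpg"},
    "annotations": [{
        "result": [{
            "type": "rectanglelabels",
            "original_width": 800,
            "original_height": 600,
            "value": {"x": 25, "y": 50, "width": 25, "height": 10,
                      "rectanglelabels": ["Elon"]},
        }],
    }],
}]

csv_text = rows_to_csv(export_to_rows(sample_tasks))
print(csv_text)
```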
Commit changes to Git
Using Git, you can now iterate over steps 5 and 6, compare the different versions, merge the results, or roll back the changes.
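As one way to compare two committed versions of an annotations file, the sketch below reports the tasks whose number of labeled regions changed. It again assumes the common Label Studio JSON layout and an `image` field identifying each task; both sample versions are made up, and in practice you would load the two files from the commits you want to compare.

```python
def regions_per_task(tasks):
    """Map each task's image path to its total number of labeled regions."""
    return {
        task["data"]["image"]: sum(
            len(annotation.get("result", []))
            for annotation in task.get("annotations", [])
        )
        for task in tasks
    }


def diff_versions(old_tasks, new_tasks):
    """Return {image: (old_count, new_count)} for tasks whose count changed."""
    before = regions_per_task(old_tasks)
    after = regions_per_task(new_tasks)
    return {
        image: (before.get(image, 0), after.get(image, 0))
        for image in sorted(set(before) | set(after))
        if before.get(image, 0) != after.get(image, 0)
    }


# Hypothetical exports from two commits on the labeling branch.
old_version = [
    {"data": {"image": "a.jpg"}, "annotations": [{"result": [{}]}]},
    {"data": {"image": "b.jpg"}, "annotations": []},
]
new_version = [
    {"data": {"image": "a.jpg"}, "annotations": [{"result": [{}]}]},
    {"data": {"image": "b.jpg"}, "annotations": [{"result": [{}, {}]}]},
]

changes = diff_versions(old_version, new_version)
print(changes)
```

A region count is a coarse signal; it shows where work happened between commits but not whether individual boxes moved.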
Step 7: Create a pull request
When you're satisfied with the labels, meaning they're accurate and consistent, you can merge them into the main branch. With DagsHub, discussing the labels is part of the pull request, so there's no need to move to a third-party platform. The reviewer can leave comments on each label, and the entire process is logged and easy to manage. Once the review is complete, merging the labels into the project's main branch takes a single click.
Create a pull request