Pawsome Updates February '23
- Ido Nov
- 3 min read
- 3 years ago

Welcome to this month's Pawsome Updates! In this edition, we have some exciting updates to share with you. Let's dive in!
👨‍🏫 Label Studio webinar
Learn about Label Studio, an open-source tool for annotation tasks such as image, video, and text labeling. In this webinar, Yono covers how to set up Label Studio, use GitFlow for reproducible data labeling and annotation, and automate the labeling and annotation process.
🎤 Podcast
In this episode, Dean speaks with Assaf Pinhasi, ML engineering and MLOps consultant extraordinaire! Assaf was the VP R&D at Zebra Medical Vision and built the PayPal Risk organization's Big Data Platform. They dive into building ML infrastructure from scratch 10 years ago vs. today, best practices for building teams that support machine learning models in production, and the future of generative models.
🐍 PyCaret integration
PyCaret is an open-source, low-code ML library in Python that automates ML workflows. It offers a wide range of models and algorithms, so data scientists and ML practitioners can handle data processing, feature engineering, model training, hyperparameter tuning, model interpretation, and deployment with only a few lines of code.
The integration between PyCaret and DagsHub enables users to log experiments and artifacts to a remote server, diff experiments and data, and collaborate with others on machine learning projects without making any changes to their code. The integration also allows users to version raw and processed data with DVC and DDA, as well as log experiment metrics, parameters, and trained models with MLflow.
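As a rough illustration of what "without making any changes to their code" can look like, here is a minimal sketch. It assumes PyCaret 3.x, where setup() is expected to accept "dagshub" as a value for log_experiment; the helper names dagshub_setup_kwargs and run_experiment are hypothetical and exist only for this example. See the launch post for the authoritative usage.

```python
from typing import Any, Dict


def dagshub_setup_kwargs(target: str, experiment_name: str) -> Dict[str, Any]:
    """Build keyword arguments for PyCaret's setup() that select DagsHub logging."""
    return {
        "target": target,
        # Assumption: "dagshub" is an accepted logger name in PyCaret 3.x.
        "log_experiment": "dagshub",
        "experiment_name": experiment_name,
    }


def run_experiment(data, target: str, experiment_name: str):
    """Set up a classification experiment and compare models (hypothetical helper)."""
    # Imported lazily so this module can be loaded without PyCaret installed.
    from pycaret.classification import compare_models, setup

    setup(data=data, **dagshub_setup_kwargs(target, experiment_name))
    return compare_models()
```

With a pandas DataFrame df that has a label column, a call such as run_experiment(df, "label", "pawsome-demo") would be the only DagsHub-specific step. The logger may ask for your repository and credentials the first time it runs.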
Read more in the launch post: https://dagshub.com/blog/pycaret-integration/
✍️ From our blog
In his latest blog post, Jinen explores some brilliant work on the hard task of retiming high-level features within video. The method can retime or even completely remove people, even when they occlude or interact with each other. Check it out :)

Read the full blog: https://dagshub.com/blog/layered-neural-rendering/
💻 Dev Updates
We have some exciting Dev updates to announce this month:
arXiv integration
We've introduced a new integration with arXiv. If your project implements a paper from arXiv, you can link the project and the paper in the settings.

Displaying Data from External Buckets
You can now display data from external buckets in the repository. This means you no longer have to use DVC to view your data on DagsHub!

Got a large dataset on an S3 bucket? Just connect it to DagsHub using the new “Remote” menu, and it’s there :)

Displaying content of archive files
We now support displaying the contents of archive files in the repository!

Upload folders with DDA
Direct Data Access (DDA) is a magical component of the DagsHub client library that lets you stream your data from, and upload it to, any DagsHub project. Files uploaded using DDA are versioned using either Git or DVC (uploading directly to an S3 bucket is not supported yet).
And now, you can upload entire folders to DagsHub using DDA!
Try it using the following CLI command (after running pip install dagshub):
dagshub upload <repo_owner>/<repo_name> <local_folder_path> <path_in_remote>
or the following Python snippet:
from dagshub.upload import Repo
# Optional arguments: username, password, token, branch
repo = Repo("<repo_owner>", "<repo_name>")
# Upload an entire folder to the repository in one call
# Optional arguments: versioning, new_branch, commit_message
repo.upload(local_path="<local_folder_path>", remote_path="<path_in_remote>", versioning="dvc")
Thanks for reading this month's Pawsome Updates! We hope you found the latest developments in machine learning and data science interesting and informative. From Label Studio webinars and PyCaret integrations to exciting Dev updates and the latest research in retiming high-level features within video, there's something for everyone to enjoy. Keep an eye out for next month's edition for even more exciting updates!