
Pawsome Updates February '23

About DagsHub

DagsHub simplifies the process of building better models and managing unstructured data projects by consolidating data, code, experiments, and models in one place.



    Welcome to this month's Pawsome Updates! In this edition, we have some exciting updates to share with you. Let's dive in!

    👨‍🏫 Label Studio webinar

    Learn about Label Studio, an open-source tool for annotation tasks such as image, video, and text labeling. In this webinar, Yono covers how to set up Label Studio, how to use GitFlow for reproducible data labeling and annotation, and how to automate the labeling process.

    🎤 Podcast

    In this episode, Dean speaks with Assaf Pinhasi, ML engineering and MLOps consultant extraordinaire! Assaf was VP of R&D at Zebra Medical Vision and built the Big Data Platform for PayPal's Risk organization. They dove into how building ML infrastructure from scratch differs today from 10 years ago, best practices for building teams that support machine learning models in production, and the future of generative models.

    🐍 PyCaret integration

    PyCaret is an open-source ML library in Python that automates ML workflows with a low-code, intuitive, and visual approach. It offers a wide range of models and algorithms, making it a powerful tool for data scientists and ML practitioners: they can handle data preprocessing, feature engineering, model training, hyperparameter tuning, model interpretation, and deployment with only a few lines of code.

    The integration between PyCaret and DagsHub enables users to log experiments and artifacts to a remote server, diff experiments and data, and collaborate with others on machine learning projects without making any changes to their code. The integration also allows users to version raw and processed data with DVC and DDA, as well as log experiment metrics, parameters, and trained models with MLflow.

    Read more in the launch post: https://dagshub.com/blog/pycaret-integration/

    ✍️ From our blog

    In his latest blog post, Jinen explores layered neural rendering, brilliant work on the hard task of retiming high-level features in video! The method can retime people in a video, or remove them entirely, even when they are occluded or interacting with one another. Check it out :)

    Read the full blog: https://dagshub.com/blog/layered-neural-rendering/

    💻 Dev Updates

    We have some exciting Dev updates to announce this month:

    arXiv integration

    We've introduced a new integration with arXiv. If your project implements a paper from arXiv, you can now link the project to the paper in the settings.

    Displaying Data from External Buckets

    You can now display data from external buckets in the repository. This means you no longer have to use DVC to view your data on DagsHub!

    Got a large dataset on an S3 bucket? Just connect it to DagsHub using the new “Remote” menu, and it’s there :)

    Displaying content of archive files

    We now support displaying the contents of archive files in the repository!
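Listing an archive's contents without extracting it generally comes down to reading the archive's metadata: a zip file's central directory, or a tar file's member headers. As a rough illustration of that idea only (this is not how DagsHub implements the feature), here is a standard-library Python sketch; the function name `list_archive_contents` is hypothetical.

```python
import tarfile
import zipfile
from pathlib import Path


def list_archive_contents(archive_path):
    """Hypothetical helper: return (name, size_in_bytes) for each file in a zip or tar archive.

    Reads member metadata only; nothing is extracted to disk.
    """
    path = Path(archive_path)
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            # infolist() comes from the central directory at the end of the zip
            return [(info.filename, info.file_size)
                    for info in zf.infolist() if not info.is_dir()]
    if tarfile.is_tarfile(str(path)):
        # Default read mode auto-detects gzip, bz2 and xz compression
        with tarfile.open(str(path)) as tf:
            return [(member.name, member.size)
                    for member in tf.getmembers() if member.isfile()]
    raise ValueError(f"{path} is not a zip or tar archive")
```

For zip files only the central directory is read. A compressed tar has no central index, so the whole stream is decompressed to walk its headers, which is why listing a large .tar.gz takes longer than listing a zip of similar size.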

    Upload folders with DDA

    Direct Data Access (DDA) is a magical component of the DagsHub client library that lets you stream data from, and upload data to, any DagsHub project. Files uploaded with DDA are versioned using either Git or DVC (uploading directly to an S3 bucket is not supported yet).

    And now you can upload entire folders to DagsHub using DDA!

    Try it now using the following CLI command (after running pip install dagshub):

    dagshub upload <repo_owner>/<repo_name> <local_folder_path> <path_in_remote>
    

    or the following Python snippet:

    from dagshub.upload import Repo
    
    # Optional arguments: username, password, token, branch
    repo = Repo("<repo_owner>", "<repo_name>")
    
    # Upload a local folder (or a single file) to the repository in one line
    # Optional arguments: versioning, new_branch, commit_message
    repo.upload(local_path="<local_folder_path>", remote_path="<path_in_remote>", versioning="dvc")
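Before uploading a large folder, it can help to preview where each file will land. The sketch below is a hypothetical helper, not part of the dagshub client, and it assumes the upload keeps the folder's relative layout under `<path_in_remote>`; the name `plan_folder_upload` is made up for illustration.

```python
from pathlib import Path, PurePosixPath


def plan_folder_upload(local_folder, remote_prefix):
    """Hypothetical helper: pair every file under local_folder with a remote path.

    Assumes the upload keeps the folder's relative layout under remote_prefix.
    Remote paths use forward slashes regardless of the local operating system.
    """
    root = Path(local_folder)
    plan = []
    for local_file in root.rglob("*"):
        if not local_file.is_file():
            continue  # skip directories; only files are paired with remote paths
        relative = local_file.relative_to(root)
        remote_file = PurePosixPath(remote_prefix, *relative.parts)
        plan.append((str(local_file), str(remote_file)))
    # Sort by remote path so the preview is deterministic
    return sorted(plan, key=lambda pair: pair[1])
```

For example, a folder containing labels.csv and images/001.png, planned against data/raw, maps to data/raw/images/001.png and data/raw/labels.csv. Building remote paths with PurePosixPath keeps them slash-separated even when the local folder lives on Windows.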
    

    Thanks for reading this month's Pawsome Updates! We hope you found the latest developments in machine learning and data science interesting and informative. From the Label Studio webinar and the PyCaret integration to exciting Dev updates and new research on retiming people in video, there's something for everyone. Keep an eye out for next month's edition for even more exciting updates!