Genome Aggregation Database (gnomAD) Dataset for Machine Learning

Install DagsHub:

pip install dagshub

To stream this data directly from DagsHub:

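from dagshub.streaming import DagsHubFilesystem

# Create a streaming filesystem rooted at the current directory and backed by the DagsHub repository
fs = DagsHubFilesystem(".", repo_url="https://dagshub.com/DagsHub-Datasets/broad-gnomad-dataset")

# List the top-level entries of the gnomAD public bucket
fs.listdir("s3://gnomad-public-us-east-1")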
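Listings like the one above usually mix data files with indexes and documentation, so it can help to filter them. The sketch below is a minimal, hypothetical helper (the names `select_variant_files`, `list_variant_files`, and `VARIANT_SUFFIXES` are not part of DagsHub or gnomAD). It assumes that the listing function returns plain entry names, as `os.listdir` does, and that variant files carry a VCF-style suffix.

```python
from typing import Callable, Iterable, List, Sequence

# Assumption: variant call files end in one of these suffixes; adjust as needed.
VARIANT_SUFFIXES = (".vcf.bgz", ".vcf.gz", ".vcf")


def select_variant_files(
    names: Iterable[str], suffixes: Sequence[str] = VARIANT_SUFFIXES
) -> List[str]:
    """Return the entries whose names end with one of the suffixes, sorted."""
    wanted = tuple(suffixes)
    return sorted(name for name in names if str(name).endswith(wanted))


def list_variant_files(
    listdir: Callable[[str], Iterable[str]], path: str
) -> List[str]:
    """List `path` with the supplied listdir function and keep variant files.

    Passing the listing function in keeps this helper independent of any
    particular filesystem object (os.listdir, fs.listdir, a test stub, ...).
    """
    return select_variant_files(listdir(path))
```

With the `fs` object created earlier, a call would look like `list_variant_files(fs.listdir, "s3://gnomad-public-us-east-1")`. The bucket layout is not described here, so the top level may contain only directories, in which case the helper would need to be applied to a deeper path.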

Description

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators that aggregates and harmonizes exome and genome data from a wide range of large-scale human sequencing projects. The summary data provided here are released for the benefit of the wider scientific community without restriction on use. The v2 dataset (GRCh37) spans 125,748 exome sequences and 15,708 whole-genome sequences from unrelated individuals. The v3 dataset (GRCh38) spans 71,702 genomes, selected using the same criteria as v2. A gnomAD mailing list is available for sign-up.

Explore this dataset on DagsHub

Additional information

Documentation

https://gnomad.broadinstitute.org/about

Update frequency

Data from new releases are made public as soon as they are available. New releases, both minor and major, have historically been issued roughly once per year.

Managed by

gnomAD Production Team at the Broad Institute

License

MIT; terms of use

Related datasets

Allen Brain Observatory – Visual Coding AWS Public Data Set

Allen Cell Imaging Collections

Biological and Physical Sciences (BPS) Microscopy Benchmark Training Dataset

Cancer Cell Line Encyclopedia (CCLE)