When searching for a dataset, first think about who might collect the data you are interested in. For example, it might be gathered by a government agency; a nonprofit or nongovernmental organization or agency; or individual researchers. With this knowledge, you can then check the websites of likely data collectors.
Also, as you read articles and other publications that cite the data, note how the authors acquired their dataset. Additionally, you can find datasets available on the web by searching Google, open data repositories, or the USD library holdings. Finally, you can try emailing the authors or researchers to request the data.
Find statistics and consumer survey results from over 18,000 sources on over 60,000 topics. Download data sets in a choice of formats.
Explore, analyze, and share quality data. Learn more about data types, creating, and collaborating.
Dryad is a nonprofit membership organization that is committed to making data available for research and educational reuse now and into the future.
A free data repository open to all researchers from any discipline, both inside and outside of the Harvard community, where you can share, archive, cite, access, and explore research data. (Includes data from other Dataverse repositories, e.g., UCSD.)
To ensure no one is left behind through lack of access to the necessary tools and resources, Zenodo makes the sharing, curation and publication of data and software a reality for all researchers.
Figshare is a repository where users can make all of their research outputs available in a citable, shareable and discoverable manner.
Similar to how Google Scholar works, Dataset Search lets you find datasets wherever they’re hosted, whether it’s a publisher's site, a digital library, or an author's personal web page.
ICPSR maintains a data archive of more than 250,000 files of research in the social and behavioral sciences.
Data.gov: U.S. government open data
Searchable datasets maintained by Code for Africa, with support from World Bank and AWS. "All the data on openAFRICA is intended to be actionable. The data is meant to help you change the world."
Eurostat: Open data from the EU statistical office
Offers a way to search over 2,000 worldwide data repositories.
Awesome Public Datasets (GitHub): List of topic-centric public data sources gathered from blogs, answers, and user responses by OMNILab
Amazon Web Services (AWS) datasets: https://registry.opendata.aws/
Bureau of Labor Statistics: https://www.bls.gov/data/
Large Health Data Sets: https://www.ehdp.com/links/datasets.htm
UC Irvine Machine Learning Repository: http://archive.ics.uci.edu/ml/index.php
YouTube-8M Segments Dataset: https://research.google.com/youtube8m/
"Preprints are complete and public drafts of scientific documents, not yet certified by peer review. These documents ensure that the findings of the research community are widely disseminated, priorities of discoveries are established and they invite feedback and discussion to help improve the work.
Certification by peer review is the key distinction between a preprint and an accepted author manuscript or published article. Many preprints are submitted to journals for publication, and as a result, subsequent versions of the paper may also be made available after peer review. Readers of preprints should be aware that any aspect of the research, including the results and conclusions, may change as a result of peer review (see PMC Disclaimer). Authors may also revise preprints and post updated versions to the preprint server." (NIH 2024, "Preprint Pilot")
To learn more about preprints available through PubMed, see the NIH Preprint Pilot.