Founder and Editor in Chief of State of Digital Publishing. My vision is to provide digital publishing and media professionals a platform to collaborate and...
There are many different types of sites that provide a wealth of free, freemium, and paid data that can help audience developers and journalists with their reporting and storytelling. The team at State of Digital Publishing would like to recognize these sites, which we identified through manual searches and suggestions from our existing audience.
1. Kaggle
Kaggle is a site where users can learn about machine learning while writing and sharing cloud-based code. Driven largely by the enthusiasm of its sizable community, the site hosts data science competitions with cash prizes, and as a result it has accumulated massive amounts of data. Whether you’re looking for historical data from the New York Stock Exchange, an overview of candy production trends in the US, or cutting-edge code, this site is chock-full of information.
2. Wikipedia
It’s impossible to be on the Internet for long without running into a Wikipedia article. With articles that range from fully sourced and referenced historical biographies to timelines of the near and far future, Wikipedia has clearly cemented its status as the web’s free encyclopedia. Between each article’s general overview of its subject and the many books and online references it cites, Wikipedia is a writer’s best friend in many respects.
3. Common Crawl
As the name suggests, Common Crawl “crawls” the web and stores the data it collects in an open repository that anyone can access. Projects such as identifying virtual patent markers and compiling comprehensive lists of websites that offer RSS feeds give a small sample of how powerful this resource is. If you want to compare data across websites, this is an accessible tool for producing original information.
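If you’d rather query Common Crawl than download whole archives, its URL index is a good place to start. Below is a minimal Python sketch, assuming the CDX-style index at index.commoncrawl.org and the data.commoncrawl.org file host; the crawl label `CC-MAIN-2023-50` is only an example, so pick a current one from the index page. It builds a query URL, parses the newline-delimited JSON the index returns, and turns one record into the byte range you’d request to pull that single capture out of a WARC file. The sample record is made up.

```python
import json
from urllib.parse import urlencode

# Example crawl label (an assumption); available labels are listed on index.commoncrawl.org.
CRAWL_ID = "CC-MAIN-2023-50"


def cdx_query_url(url_pattern: str, crawl_id: str = CRAWL_ID) -> str:
    """Build an index query that asks for one JSON record per matching capture."""
    params = urlencode({"url": url_pattern, "output": "json"})
    return f"https://index.commoncrawl.org/{crawl_id}-index?{params}"


def parse_cdx_lines(body: str) -> list[dict]:
    """The index answers with newline-delimited JSON; parse each non-empty line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def warc_range(record: dict) -> tuple[str, dict]:
    """Return the archive file URL plus an HTTP Range header covering one capture."""
    start = int(record["offset"])
    end = start + int(record["length"]) - 1  # Range end is inclusive
    url = f"https://data.commoncrawl.org/{record['filename']}"
    return url, {"Range": f"bytes={start}-{end}"}


# Hypothetical index record, shaped like the fields the index returns.
sample = (
    '{"url": "https://example.com/", "timestamp": "20231201000000", '
    '"filename": "crawl-data/example.warc.gz", "offset": "100", "length": "50"}\n'
)
for rec in parse_cdx_lines(sample):
    print(rec["url"], warc_range(rec))
```

Requesting only that byte range means downloading a few kilobytes instead of a multi-gigabyte archive.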
4. EDRM
EDRM, short for Electronic Discovery Reference Model, is a site for legal professionals dedicated to realizing the potential of e-discovery and to the rules and expectations surrounding information governance. EDRM members also work together to develop collaborative standards, software, and educational tools that further the community’s goals. If you want to learn how technology has changed, and continues to change, the procedural and administrative aspects of legal practice, this is the site to visit.
5. Mahout
The Mahout site is home to the software of the same name, an environment for quickly building scalable, high-performing machine learning applications. For researchers who want to compile and manipulate their own datasets or try their hand at machine learning applications, this software is especially useful, and the site’s resources can help users become proficient with it.
6. The Lemur Project
The Lemur Project is a database that supports research in information retrieval and human language technologies. With roughly 1 billion web pages in 10 languages, collected between January and February 2009, the sheer volume of material makes it an excellent resource for researchers. Combined with the support available on the site, this gives anyone interested in technology and human languages plenty to work with.
7. Project Gutenberg
Project Gutenberg is a library of public domain novels, papers, and other works. The site’s collection of more than 54,000 eBooks ranges from well-known works by the likes of Shakespeare, Mark Twain, and Jane Austen to lesser-known works by less famous names such as Henri Bergson and Samuel Butler. Whether you’re grabbing a classic novel for the sake of being well-read or researching how people experienced life in the 19th century, Project Gutenberg is an excellent resource.
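Project Gutenberg’s plain-text files wrap each book in header and license text, which skews word counts or text analysis if left in. Here’s a small Python sketch that trims it, assuming the usual `*** START OF THE PROJECT GUTENBERG EBOOK` and `*** END OF ...` marker lines; the wording of these markers has varied over the years (older files say “THIS” instead of “THE”), so treat the patterns as a starting point rather than a guarantee.

```python
import re

# Marker patterns are an assumption based on common Gutenberg files; both
# "THE" and "THIS" variants, and "EBOOK" or "E-BOOK", are accepted.
START_RE = re.compile(
    r"^\*\*\* ?START OF (THE|THIS) PROJECT GUTENBERG E-?BOOK.*$", re.IGNORECASE | re.MULTILINE
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (THE|THIS) PROJECT GUTENBERG E-?BOOK.*$", re.IGNORECASE | re.MULTILINE
)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Keep only the text between the start and end markers, if they are present."""
    start = START_RE.search(text)
    end = END_RE.search(text)
    begin = start.end() if start else 0
    finish = end.start() if end and end.start() >= begin else len(text)
    return text[begin:finish].strip()


sample = (
    "Produced by volunteers\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "Call me Ishmael.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "License text follows\n"
)
print(strip_gutenberg_boilerplate(sample))
```

If neither marker is found, the function simply returns the whole text, trimmed of surrounding whitespace.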
8. Million Song Dataset
This site houses a full dataset containing the audio features and metadata of approximately 1 million popular songs. In addition to the primary Million Song Dataset, the community has contributed a number of related datasets covering cover songs, genre labels, and lyrics, among others. Music historians, hobbyists, and researchers will be able to sort the data with relative ease. This may very well be the most extensive dataset on the subject anywhere on the Internet.
9. Amazon
Everyone knows Amazon as a digital retailer, but did you know that Amazon also hosts free public datasets that anyone can analyze without having to download or store them on their own devices? With data ranging from weather, space environment, and meteorological information to imagery for developing computer vision algorithms, there’s no shortage of options for those who want a more convenient way to analyze massive amounts of data.
10. Open Government Canada
To promote transparency, encourage citizen engagement, and foster dialogue, the Government of Canada offers extensive data as part of its Open Government initiative. On this site you can find datasets on government-related issues such as the capacity of homeless shelters in Canada, as well as regional figures on the participation of Anglophones and Francophones in the public sector. With access to datasets of this nature, there’s no need to rely on other people’s statistics.
11. Data Catalogs
Data Catalogs, now Data Portals, offers users a convenient site for browsing open data portals from all over the world. Because the portals are assessed and curated by various levels of government, a number of NGOs, and even the World Bank, the data available for analysis is of high quality. Users can browse the portals or contribute new ones. For researchers, the variety of subject matter makes this site an especially convenient place to begin a search for information.
12. Data.gov.uk
Data.gov.uk lets individuals find and access data published by government departments, local authorities, agencies, and other public bodies. Here researchers can find information on the economic climate for small businesses, trade, imports, industry, and exports, or even research payments over £25,000 made by government departments. Since the site explicitly states that the data can be used for research, the information here may even spark new ideas as researchers work through it.
13. Data.gov
Data.gov is where the US government publishes open data for the public in the form of datasets. On top of the raw data, the site offers a number of tools for making data visualizations and building web and mobile applications. Make no mistake: the collection is immense, with more than 197,000 datasets covering everything from credit card complaints to federal student loan program data. This site offers plenty of opportunities for innovation and comprehensive analysis.
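Much of the US government’s open data catalog can also be searched programmatically. The sketch below assumes that catalog.data.gov exposes the standard CKAN Action API (`package_search`) and that each resource carries a `format` field; the helper names are our own, and the sample response in the test is made up. It builds a search URL and pulls out direct links to CSV files.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: catalog.data.gov runs CKAN and exposes its Action API here.
CKAN_SEARCH = "https://catalog.data.gov/api/3/action/package_search"


def search_url(query: str, rows: int = 10) -> str:
    """Full-text dataset search; `rows` caps how many results come back."""
    return f"{CKAN_SEARCH}?{urlencode({'q': query, 'rows': rows})}"


def fetch_json(url: str) -> dict:
    """Network call; not executed by the examples here."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def csv_links(response: dict) -> list[tuple[str, str]]:
    """Return (dataset title, resource URL) pairs for resources labeled CSV."""
    links = []
    for dataset in response["result"]["results"]:
        for resource in dataset.get("resources", []):
            if (resource.get("format") or "").upper() == "CSV":
                links.append((dataset["title"], resource["url"]))
    return links


# Usage (needs network access):
# for title, url in csv_links(fetch_json(search_url("student loans", rows=5))):
#     print(title, url)
```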
14. DataSF
DataSF offers hundreds of datasets on the City and County of San Francisco. Interested in seeing what local and regional lobbyists have been pushing for? Need crime statistics? Browse the Showcase tab to see what people have accomplished with the data, or use the form to make a contribution. Built on open data and offering an academy, a blog, and a number of other tools, this site is driven in large part by collaboration and community, which makes it an asset for researchers.
15. DataFerrett
DataFerrett differs from many of the sites on this list in that it isn’t so much a repository or directory as a tool for analyzing and extracting data from local, state, and federal sources. Users can create customized, comprehensive spreadsheets and then turn the same information into a map or a graph without downloading or enabling any other software. Organizing massive data inputs and turning them into something easy to read has never been easier.
16. Inforum
Through the University of Maryland, Inforum makes US economic data available to the public. Many US government agencies have contributed to the site, which now holds thousands of “economic time series,” as it calls them, containing numbers on industrial production, price indices, labor statistics, and business indicators. The data is freely available and can be accessed from an ordinary laptop or desktop. Researchers who want a good look at raw economic data will find a valuable resource in Inforum.
17. Europeana
According to the site’s own numbers, Europeana’s collections hold more than 50 million records. Its curated datasets help researchers find what they’re looking for in less time, with categories such as 3D models, Italian World War I maps, and even a collection of more than 20,000 historic photos from Lithuanian museums, among others. Whether for general historical searches or as a starting point for exploring Europeana’s massive records, this is an excellent resource to have.
18. The Guardian
On top of its non-stop coverage of breaking news and events, the Guardian has an entire section devoted to data-driven pieces. These range from serious topics, like how effective housing policies are at reducing homelessness, to more light-hearted subjects, like which countries have the most Nobel Prize winners. Journalists and researchers will find no shortage of information for their own projects here, and a quick search turns up data on a remarkably wide range of subjects.
19. Gene Expression Omnibus
Hosted by the National Center for Biotechnology Information, the Gene Expression Omnibus (GEO) is a repository of “public functional genomics data” compliant with MIAME (Minimum Information About a Microarray Experiment) standards. The site accepts both array- and sequence-based data and provides the tools needed to find and download it. Anyone studying genomes or looking for information on the subject will find a wealth of data here.
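GEO records can also be found through NCBI’s Entrez E-utilities. A minimal Python sketch, assuming the `esearch` and `esummary` endpoints, JSON output via `retmode=json`, and `gds` as the Entrez database name for GEO; the helper names are hypothetical, and nothing here touches the network unless you call `fetch_json` yourself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, retmax: int = 20) -> str:
    """Search GEO through Entrez; db='gds' covers GEO DataSets and Series records."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def esummary_url(ids: list[str]) -> str:
    """Request short summaries (titles, accessions) for IDs returned by a search."""
    params = {"db": "gds", "id": ",".join(ids), "retmode": "json"}
    return f"{EUTILS}/esummary.fcgi?{urlencode(params)}"


def ids_from_search(response: dict) -> tuple[int, list[str]]:
    """Return the total hit count and the IDs on this page of results."""
    result = response["esearchresult"]
    return int(result["count"]), result["idlist"]


def fetch_json(url: str) -> dict:
    """Network call; NCBI asks heavy users to throttle requests and use an API key."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

A typical flow is search first, then pass the returned IDs to `esummary_url` to see which series are worth downloading.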
20. Center for Spatial Data Science
The University of Chicago, long recognized for its contributions to innovation and progress in the social sciences, is exploring a new frontier through its Center for Spatial Data Science (CSDS), which focuses on spatial analysis and technology. The CSDS’s work applies to virtually any field in which location shapes the problem, and fields like environmental economics, public health, and criminology have all benefited. The center’s commitment to open source software and to distributing its information makes the data it provides even more accessible.
21. Konect
Built on data collected by the University of Koblenz-Landau’s Institute of Web Science and Technologies, KONECT (the Koblenz Network Collection) offers research in network science and related fields. The project runs its own network analysis software to crunch the numbers and produce plots, then hosts the results directly on the website. With more than 200 datasets to choose from, this is a resource worth exploring.
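KONECT datasets are distributed as plain edge lists, so basic network statistics take only a few lines. This sketch assumes the common KONECT layout (metadata lines starting with `%`, then whitespace-separated `source target` pairs, optionally followed by weight and timestamp columns) and computes each node’s degree; the function names and the tiny sample network are invented for illustration.

```python
from collections import Counter


def read_konect_edges(text: str) -> list[tuple[int, int]]:
    """Parse a KONECT-style edge list, ignoring '%' metadata lines and extra columns."""
    edges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue
        source, target = line.split()[:2]
        edges.append((int(source), int(target)))
    return edges


def degrees(edges: list[tuple[int, int]], directed: bool = False) -> Counter:
    """Count degree per node; for directed graphs this is the out-degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        if not directed:
            deg[v] += 1
    return deg


# Hypothetical three-node triangle in KONECT-style layout.
sample = "% sym unweighted\n% 3 3 3\n1 2\n2 3\n1 3\n"
print(degrees(read_konect_edges(sample)).most_common(1))
```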
22. MLdata
MLdata is a repository of datasets intended for machine learning. These range from a compilation of human facial expressions to more scientific topics like predicting how molecules will bond. Entries are split into categories offering raw data, tutorials in the materials and methods section, and learning tasks and challenges, so researchers can sift through the repository for datasets that interest them.
23. NASDAQ
NASDAQ is a world-famous stock exchange whose website has long been an excellent resource for journalists and researchers seeking data from the world of finance and business. Here you’ll find information on IPOs, historical price data, and breaking financial news, which together make the site a go-to online destination for financial data. NASDAQ also offers paid data options for those who want to do deeper analysis. This is a highly respected and well-established resource.
24. NASA
Ever since the moon landing, just about everybody has heard of this government agency and its forays into outer space. Of particular interest to journalists, however, is that NASA is also a valuable source of data through its Space Science Data Coordinated Archive. Here, researchers can find space science mission data in categories such as astrophysics, image resources, and heliophysics, among others. The site also offers numerous white papers to accompany the new data being submitted.
25. Socrata
Socrata takes available government data and puts it into a format that makes it easier for people to analyze, click through, and find the information they’re looking for. Designed with non-technical users such as public policy wonks, researchers, entrepreneurs, and concerned citizens in mind, Socrata uses the cloud to compile data from a variety of sources. For journalists trying to understand the effectiveness of different policies, this is a useful platform.
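Socrata-powered portals let you filter data before downloading it through the SODA API, using SoQL query parameters such as `$select`, `$where`, `$order`, and `$limit`. The sketch below only builds the query URL; the domain `data.sfgov.org` and the dataset ID `abcd-1234` are placeholders, so substitute a real portal and the four-by-four ID shown on a dataset’s page. The helper name `soda_url` is our own.

```python
from typing import Optional
from urllib.parse import urlencode


def soda_url(
    domain: str,
    dataset_id: str,
    *,
    select: Optional[str] = None,
    where: Optional[str] = None,
    order: Optional[str] = None,
    limit: int = 1000,
) -> str:
    """Build a SODA JSON query; only the SoQL clauses you pass are included."""
    params = {"$limit": limit}
    for key, value in (("$select", select), ("$where", where), ("$order", order)):
        if value:
            params[key] = value
    # urlencode percent-encodes '$' as %24, which the server decodes normally.
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


# Placeholder domain and dataset ID, for illustration only.
print(soda_url("data.sfgov.org", "abcd-1234", where="year = 2017", limit=5))
```

Filtering on the server keeps downloads small, which matters for portals whose datasets run to millions of rows.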
26. Quandl
Quandl offers primarily economic and financial data formatted with the needs of investment professionals in mind. Drawing on more than 500 sources from credible organizations like CLS Group, the UN, central banks, and Zacks, among others, it is well suited to researchers and journalists who want to see the big picture at a glance. Thanks to the site’s Excel add-in, accessing the data directly has never been easier.
27. Carnegie Mellon University
Carnegie Mellon University has a well-deserved reputation as an excellent academic institution. What many people don’t know is that Carnegie Mellon’s StatLib is a useful resource for journalists in search of data. This dataset archive includes data such as the salaries of North American Major League Baseball players in 1986, as well as data designed for evaluating the accuracy of statistical software. In exchange for acknowledgment, these datasets are available for public use.
28. UCI
The UC Irvine Machine Learning Repository, often referred to as UCI, stores a wealth of interesting data that journalists can use. Home to 394 datasets as of this writing, the site also has the advantage of an easy-to-search interface. Some of the more popular datasets cover “Human Activity Recognition Using Smartphones,” wine, and bank marketing, among other subjects. In exchange for using all of this data, the site merely asks for a citation.
29. UCR
If you’re a journalist looking into the development of machine learning, the UCR Time Series Classification/Clustering page makes for excellent reading. The site provides a helpful briefing document with the background information you need. Along with an overview of what the data contains, the site lets you download the data directly. Just remember to use the citation format the site requests if you use these datasets.
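To get a feel for how the UCR data is used, here is a sketch of the classic baseline reported for the archive: one-nearest-neighbor classification with Euclidean distance. It assumes each line holds an integer class label followed by the series values (tab- or comma-separated) and that all series in a file have the same length; the function names and toy data are ours.

```python
import math
import re


def parse_ucr(text: str) -> list[tuple[int, list[float]]]:
    """Each line: class label first, then the series values."""
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        fields = re.split(r"[,\s]+", line.strip())
        rows.append((int(float(fields[0])), [float(x) for x in fields[1:]]))
    return rows


def classify_1nn(train, query):
    """Label of the training series closest to `query` by Euclidean distance."""
    label, _ = min(train, key=lambda item: math.dist(item[1], query))
    return label


def accuracy(train, test):
    """Fraction of test series whose nearest training neighbor has the same label."""
    hits = sum(classify_1nn(train, values) == label for label, values in test)
    return hits / len(test)


# Toy data: class 1 hovers near zero, class 2 near five.
train = parse_ucr("1\t0\t0\t0\n2\t5\t5\t5")
test = parse_ucr("1\t0.5\t0\t0\n2\t4\t5\t6")
print(accuracy(train, test))
```

Swapping `math.dist` for dynamic time warping is the usual next step, since it tolerates series that are shifted in time.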
30. US Census
Need statistics on population wealth? Want to know the exact gender breakdown of a particular field? The US Census Bureau’s site has all of this data and more available for public viewing. Sort the data by year or region, and you’ll quickly find statistics that most people don’t realize the census collects. The numbers are available in Excel and Microsoft Word formats, which makes the data even more accessible for journalists.
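The Census Bureau also offers a Data API for pulling numbers directly instead of through spreadsheets. A minimal Python sketch, assuming the American Community Survey 5-year endpoint (`/data/{year}/acs/acs5`) and the variable `B01001_001E` (total population); the API returns a JSON array whose first row is a header, which the helper turns into records. The sample rows below are placeholders, not real figures.

```python
from urllib.parse import urlencode


def acs5_url(year: int, variables: list[str], geography: str = "state:*", api_key=None) -> str:
    """Build a Census Data API query for ACS 5-year estimates, always including NAME."""
    params = {"get": ",".join(["NAME", *variables]), "for": geography}
    if api_key:  # a free key raises request limits
        params["key"] = api_key
    return f"https://api.census.gov/data/{year}/acs/acs5?{urlencode(params)}"


def rows_to_records(payload: list[list[str]]) -> list[dict]:
    """The first row is the header; zip it with every following row."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]


# Placeholder response in the API's array-of-arrays shape.
sample = [
    ["NAME", "B01001_001E", "state"],
    ["Example State A", "100", "01"],
    ["Example State B", "250", "02"],
]
largest_first = sorted(rows_to_records(sample), key=lambda r: int(r["B01001_001E"]), reverse=True)
print([r["NAME"] for r in largest_first])
```

Values come back as strings, so convert them before sorting or doing arithmetic.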
31. Wolfram Alpha
Wolfram Alpha is a computational engine: users enter the data or question they’re interested in and receive a calculated answer. The engine handles statistics and data analysis, chemistry, dates and times, and even words and linguistics, among other things. For users looking for new ways of handling data, it’s especially useful because it can produce new calculations at the press of a button. Journalists in particular stand to gain a lot by using it as a supplementary resource.
32. Yelp
It turns out that Yelp is about more than restaurant and business reviews. The user-driven review site also maintains a dataset that gives researchers access to reviews, user data, and business information for “personal, educational, and academic purposes.” By the company’s count, the dataset includes 4.7 million reviews of 156,000 businesses across 12 metropolitan areas. With numbers like those, the trends researchers could discover in this data might be a pleasant surprise.
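The Yelp dataset ships as newline-delimited JSON, which you can stream one record at a time instead of loading millions of reviews into memory. This sketch assumes review records carry `business_id` and `stars` fields and averages the stars per business; the file name in the usage comment is also an assumption, so check your download.

```python
import json
from collections import defaultdict
from statistics import mean


def average_stars(lines) -> dict[str, float]:
    """Average the 'stars' field per 'business_id' across review JSON lines."""
    stars = defaultdict(list)
    for line in lines:
        if line.strip():
            review = json.loads(line)
            stars[review["business_id"]].append(review["stars"])
    return {business: mean(values) for business, values in stars.items()}


# Usage with the downloaded file (file name assumed; adjust to your copy):
# with open("yelp_academic_dataset_review.json", encoding="utf-8") as f:
#     averages = average_stars(f)
```

Because the function accepts any iterable of lines, an open file handle works directly and memory use stays proportional to the number of businesses, not reviews.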
33. Data World
Want a list of removed Facebook pages? How does sorting US economic data by county sound? Data World is a site that lets people share, host, collaborate on, and keep track of data. It even includes a section for journalists explaining why Data World is useful to the profession, highlighting features that range from data hosting to a streamlined FOIA predictor, as well as pages designed to help with organizing. All in all, it’s a solid mix of data and data hosting.
34. CIA World Factbook
Published by the CIA, the World Factbook provides information on the societies, history, militaries, and economies of 267 countries and other world entities, along with maps, flags, and a world time zone map. The site offers a thorough, in-depth look at its subject matter that goes well beyond the basics. In short, this is a data source that belongs in every journalist’s arsenal.
35. HealthData.gov
Managed by the US Department of Health & Human Services, HealthData.gov gives the public access to “high value health data” in hopes of capturing the attention of entrepreneurs, policy makers, and researchers. In product and service development, at least, people have already put this data to productive use. Journalists who want to be on the cutting edge of health data, or who are vetting a statement from a health-care official, can use this site to find answers.
36. UNICEF
This site lends instant credibility to the journalists who use its information. The statistics UNICEF covers relate to health and human rights issues such as education, maternal health, child poverty, water and sanitation, and child disability, among many other categories. It’s useful for researchers because it’s up to date and backed by one of the best-known organizations on the planet. Journalists can’t go wrong citing this data source.
37. World Health Organization
The World Health Organization is an international organization that gathers health statistics and information from around the world. Beyond the information on its homepage, the site offers data through the Global Health Observatory, including information on the steps countries are taking toward universal health care and on health research and development, among other categories. Journalists will find plenty of information here on outbreaks, health emergencies, and healthcare coverage from an international perspective.
38. Google Public Data
With Google Public Data, journalists can rely on Google in more ways than one. The search engine juggernaut makes more than 100 public datasets available for analysis. Subject matter ranges from the extremely serious, such as World Development Indicators and Human Development Indicators, to the merely interesting, such as data on the most dangerous roads in Europe. All a researcher has to do is run a search and see what Google Public Data has.
39. Gapminder
Gapminder offers data on a number of local and national indicators, along with links and information on all of its data providers. Researchers can use the site to look up information such as the age at which women first marry, alcohol consumption, and causes of death in children. For journalists writing with an international slant or doing comparative analysis, this is an excellent resource, and it’s a useful source of data regardless.
40. Google Trends
Google Trends gives researchers insight into what people are searching for right now. Researchers can compare current data with past trends, and can also use the tool to make estimates ahead of events such as the holiday season to anticipate future search activity. Google Trends offers graphs, hot topics, and plenty of opportunities to uncover the news before it’s officially news.
41. Google Finance
Google Finance offers a quick and easy way to research a company that investors have been raving about. You can filter technical indicators and review the latest news about the company in one simple, straightforward window, and sort the information further from there. It’s also free. For journalists who want to research the finances of a publicly traded company, Google Finance offers an intuitive interface for accessing this information. Unfortunately, Google has recently discontinued some core features, such as its portfolio tool.
42. DBpedia
Anyone who has ever wished for a more powerful way to search Wikipedia has reason to be excited about DBpedia. Powered by a committed community, this project extracts structured content from Wikipedia so that users can run more sophisticated queries against it. With the English version boasting 4.58 million entries with classifications and associated categories, the site is well on its way to offering comprehensive coverage based on the information in Wikipedia. Journalists can’t go wrong with this data source.
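DBpedia’s real power is its SPARQL endpoint. The Python sketch below assumes the public endpoint at dbpedia.org/sparql accepts `query` and `format` parameters, uses the `dbo:birthPlace` and `dbo:birthDate` ontology properties, and reads the W3C SPARQL JSON results format; it builds a query URL for people born in Chicago and flattens the JSON bindings into plain dictionaries. Helper names are our own.

```python
from urllib.parse import urlencode

ENDPOINT = "https://dbpedia.org/sparql"

# Example query: people whose DBpedia birth place is Chicago, with birth dates.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?person ?birthDate WHERE {
  ?person dbo:birthPlace dbr:Chicago ;
          dbo:birthDate ?birthDate .
}
LIMIT 10
"""


def sparql_url(query: str) -> str:
    """GET URL asking the endpoint for SPARQL JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{ENDPOINT}?{urlencode(params)}"


def bindings_to_rows(results: dict) -> list[dict]:
    """Flatten {'head': {'vars': ...}, 'results': {'bindings': ...}} into plain dicts."""
    variables = results["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in results["results"]["bindings"]
    ]
```

Variables left unbound (for example, from an `OPTIONAL` clause) are simply omitted from that row.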
43. Pew Research
For many, Pew Research is in the upper echelon where surveys, reports, and research data are concerned. The site covers topics ranging from political opinions to social trends and developments in various workplace industries. Pew Research also has a search function that makes it easier than ever to access information. Journalists who want up-to-date statistics and findings from a trusted, reputable source can’t go wrong with Pew Research.
44. Broad Institute
For journalists who want the latest information related to cancer, the Broad Institute’s datasets could be the perfect place to look. They also cover additional subjects such as bioinformatics and computational biology, brain cancer, and molecular pattern discovery. In short, this site gives journalists a leg up in finding in-depth cancer data to build stories around.
45. UNdata
UNdata offers information on countries around the world, including technical, social, and economic indicators for each. For journalists working on human interest stories, or on stories that would benefit from supporting statistics and data, UNdata is an ideal choice. The accuracy of the data and the UN’s reputation make this a data source journalists can count on while doing research.
46. Google Scholar
Imagine if, instead of scrolling through websites, you could pull up search results containing nothing but peer-reviewed papers and academic materials. Google Scholar makes it possible to find journal articles, white papers, and publications by the world’s leading scholars. As is usually the case with Google’s products, Scholar is as intuitive as it gets: users merely need to enter a keyword to get the ball rolling. Searching for academic papers has never been so straightforward.
47. Reddit
Known as “the front page of the Internet,” Reddit is one of the most popular websites on the Internet. Besides being a good gauge of what’s happening online, the site has a subreddit, or subforum, devoted to datasets. Users can request datasets, post resources, and discuss working with data in formats like JSON. Researchers stand to gain a lot from perusing this data source.
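Reddit also serves most listings as JSON when you append `.json` to the URL, which is handy for keeping an eye on new dataset requests. A minimal sketch, assuming that behavior, the `t` (time window) and `limit` query parameters, and the listing structure in which posts have kind `t3`; Reddit rate-limits anonymous clients, so keep requests sparse and set a descriptive User-Agent (the one below is made up).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def listing_url(subreddit: str, sort: str = "top", period: str = "month", limit: int = 25) -> str:
    """JSON version of a subreddit listing, e.g. the top posts of the past month."""
    return f"https://www.reddit.com/r/{subreddit}/{sort}.json?{urlencode({'t': period, 'limit': limit})}"


def fetch_listing(url: str) -> dict:
    """Network call; the User-Agent string is a hypothetical example."""
    req = Request(url, headers={"User-Agent": "dataset-research-sketch/0.1"})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


def posts(listing: dict) -> list[tuple[str, str, int]]:
    """Pull (title, link, score) out of each post ('t3') in a listing."""
    return [
        (child["data"]["title"], child["data"]["url"], child["data"]["score"])
        for child in listing["data"]["children"]
        if child.get("kind") == "t3"
    ]


# Usage (needs network access):
# for title, link, score in posts(fetch_listing(listing_url("datasets"))):
#     print(score, title, link)
```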
48. Datamarket
Qlik DataMarket makes it possible to collect and handle data from external sources. Users can draw on several datasets and cross-reference them against the data they already have to gain broader context. Better yet, although parts of the platform are paid depending on the subject matter, Qlik DataMarket also has a free option. Journalists can explore the data to their heart’s content.
49. HubSpot
HubSpot has long been a thought leader in business marketing. From a research standpoint, the site will tell you what’s going on in the industry and what people in marketing are talking about right now. Journalists can use it to learn more about the latest trends, which makes HubSpot a great resource for researchers.
50. Bureau of Justice Statistics
Perhaps unsurprisingly, the Bureau of Justice Statistics keeps a ton of statistics. On the Bureau’s website you can find numbers on arrests, inmate deaths, capital punishment executions, law enforcement, and jail censuses. The criminal justice system is a subject of constant fascination for both the public and the people involved with it, which is what makes the Bureau’s statistics so useful for journalists researching it.
51. Uniform Crime Report
The Uniform Crime Report is a collection of statistics on property crime and violent crime gathered by the FBI. Law enforcement agencies throughout the US have been reporting this data since 1930, and published findings are searchable dating back to 1958. Journalists looking to explore the crime data can use the UCR data tool to access the information available on this site.
52. Uniform Crime Reporting
Uniform Crime Reporting is the result of a program conceived by the International Association of Chiefs of Police in 1929. The numbers the FBI gathers here are published four times a year. On top of the UCR program’s own information, the site includes reports on hate crime statistics, Law Enforcement Officers Killed and Assaulted (LEOKA), and data from the National Incident-Based Reporting System.
53. NACJD
NACJD, or the National Archive of Criminal Justice Data, draws information from sources such as the Uniform Crime Reports (UCR) and the National Crime Victimization Survey (NCVS), then stores and distributes the statistics. Curated, stored, and maintained for maximum accessibility, the data comes in several forms, including experimental, qualitative, and longitudinal. Ultimately, this gives journalists and other researchers another way to visualize and access criminal justice statistics.
54. First Databank
First Databank deals in drug data. The site seeks to promote more efficient, data-driven decision-making in pharmaceuticals, helping doctors and clinicians think about pharmaceutical drugs in a different way through First Databank’s innovative use of technology. From a professional standpoint, the site is especially useful because its data can help teams adjust as new information comes in. At the very least, it’s a useful resource for journalists covering the pharmaceutical space.
55. FDA
The FDA, or Food and Drug Administration, is the agency responsible for protecting public health through the supervision and approval of drugs, food products, supplements, vaccines, and cosmetics, among other consumer products. As a resource, the FDA makes datasets available for the public to peruse, along with technical data for people comfortable working with spreadsheets and analyzing the information. This is definitely a useful resource for journalists.
56. Drugbase
Ever wondered exactly how much the country pays in the wake of a drug epidemic? Heard rumors of people consuming drugs differently than before? Drugbase offers a database chock-full of statistics on drug trends and usage in the United States. There are infographics as well as publications on topics like the comorbidity of addiction and mental illness, or facts on drugged (not drunk) driving. This resource provides enough information to spot trends and make comparisons against past data.
57. UNODC
UNODC, the United Nations Office on Drugs and Crime, has a website devoted to furthering its goal of helping member states adopt stronger standards of research, data collection, and forensics. On this site, researchers can find numerous statistics and publications covering subjects like data collection, trend analysis, and research programs. It’s a resource full of information on a variety of forensic topics, as well as the science behind them.
58. Drug War Facts
Drug War Facts offers extensive discussion of the war on drugs and the consequences of the policy. This includes statistics comparing the cost of treatment with the cost of relying on law enforcement, drug control spending estimates, and a slew of information on just about every topic related to the war on drugs. For many people, this is the most comprehensive site on the web on the subject.
59. National Center for Education Statistics
The National Center for Education Statistics, often referred to as NCES, is the place to go for any and all education-related statistics. The site has statistics on the state of student lending, projections of education trends, and datasets and comparison tools for more in-depth analysis. Journalists can use this resource to uncover trends, verify public statements, review NCES publications, and find new stories in the data.
60. World Bank
The World Bank hosts numerous statistics and data compiled by its Development Data Group, covering the financial sector as well as macroeconomics. It’s possible to sort through the data using hashtags. Users can choose from a variety of indicators and select a country to review different measures of development progress. As such, this resource benefits anyone looking into the financial or economic state of member countries.
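World Bank indicators can also be pulled through its public API. This sketch assumes the v2 endpoint, which returns a two-element JSON array (paging metadata, then records with `date` and `value` fields), and annual indicators such as `SP.POP.TOTL` (total population); years with no data come back as `null` and are dropped. The helper names and the test data are our own.

```python
from urllib.parse import urlencode


def indicator_url(country: str, indicator: str, start: int, end: int, per_page: int = 100) -> str:
    """World Bank API v2 query for one country and indicator over a year range."""
    params = urlencode({"format": "json", "date": f"{start}:{end}", "per_page": per_page})
    return f"https://api.worldbank.org/v2/country/{country}/indicator/{indicator}?{params}"


def to_series(payload: list) -> list[tuple[int, float]]:
    """Turn [metadata, records] into sorted (year, value) pairs, skipping nulls.

    Assumes annual data, where 'date' is a plain year such as "2020".
    """
    records = payload[1] if len(payload) > 1 and payload[1] else []
    return sorted((int(r["date"]), r["value"]) for r in records if r["value"] is not None)


# Usage: fetch indicator_url("CAN", "SP.POP.TOTL", 2010, 2020) with any HTTP client,
# parse the JSON body, and pass it to to_series.
```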
61. Bureau of Labor Statistics
The Bureau of Labor Statistics is a journalist’s go-to source for numbers and statistics as they relate to current working conditions, what’s happening in the labor market, as well as how prices change and affect the US economy. With the agency’s statistical work dating back to 1884, there’s no shortage of economic data there for researchers to peruse. The site stores the information in a user-friendly interface and constantly updates the data that’s available for searching. This is a data source worth exploring.
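BLS data is available through its Public Data API as well. The sketch below assumes version 2 of the API, which takes a JSON POST body with `seriesid`, `startyear`, and `endyear` (plus an optional `registrationkey`); `LNS14000000` is used as an example series ID, believed to be the headline unemployment rate, so verify it before relying on it. Monthly periods are coded `M01` through `M12`, and `M13` is an annual average, which the parser skips. Helper names and test data are made up.

```python
import json
from urllib.request import Request, urlopen

BLS_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"


def bls_request(series_ids, start_year, end_year, registration_key=None) -> Request:
    """Build a POST request for the BLS Public Data API (version 2)."""
    body = {"seriesid": list(series_ids), "startyear": str(start_year), "endyear": str(end_year)}
    if registration_key:  # free keys from BLS raise request limits
        body["registrationkey"] = registration_key
    return Request(
        BLS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-type": "application/json"},
    )


def fetch(req: Request) -> dict:
    """Network call; not executed by the examples here."""
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


def monthly_points(response: dict) -> dict[str, list[tuple[int, int, float]]]:
    """Map each series ID to sorted (year, month, value) tuples, skipping 'M13'."""
    out = {}
    for series in response["Results"]["series"]:
        points = [
            (int(d["year"]), int(d["period"][1:]), float(d["value"]))
            for d in series["data"]
            if d["period"].startswith("M") and d["period"] != "M13"
        ]
        out[series["seriesID"]] = sorted(points)
    return out


# Usage (needs network access):
# points = monthly_points(fetch(bls_request(["LNS14000000"], 2022, 2023)))
```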
62. The Numbers
Blockbuster releases get a lot of media attention, but it’s hard to tell how well a release has actually done without numbers. Enter The Numbers. This website offers research and data for the film and entertainment industry. Researchers can explore revenue estimates, expectations for upcoming releases, and other investment data via OpusData’s SQL-based search engine. That makes The Numbers the first place for researchers to visit for reliable movie statistics.
63. Film Forever
Film Forever is a site researchers can visit for market intelligence and data on the movie industry in the United Kingdom. Here users can find weekly box office numbers for the top 15 UK releases, audience research, reports, case studies, and the organization’s flagship Statistical Yearbook. The site also has a calendar that tells visitors when the next statistics will be released. Film Forever’s niche focus makes it a particularly worthwhile data source.
64. IFPI
IFPI is a site that prides itself on having a finger on the pulse of the worldwide recording industry. Users will find published reports full of insights into recorded music, national and global sales data, and reports on the business side of the industry that show how companies are investing in music. Together, these reports help researchers stay up to date on what’s happening in the music industry.
65. Statista
Statista is a search engine like Google, except that instead of webpages it returns data and statistics. With a single push of a button, users can get immediate access to over one million statistics and facts. Users will find infographics and statistics on China, the food industry, and consumer markets, and dossiers and industry reports are also available for a fee. Whether looking for information on the economy, social media, or the Big Mac, this is the place to do it.
66. EPA
The EPA, short for the United States Environmental Protection Agency, is the government agency responsible for protecting people and the environment by enforcing laws passed by Congress. On the EPA’s website, users can look through a number of datasets on topics ranging from agriculture to subjects as narrow as annual toxic chemical releases and waste management methods. This site is an excellent choice for journalists who want access to raw environmental data.
67. Centers for Disease Control and Prevention
This website from the Centers for Disease Control and Prevention bills itself as a “one-stop shop for environmental public health data”. At this site, researchers will find references to and lists of nationally funded data systems that track and store information related to environmental public health concerns. With a focus on programs that operate at a national level and data available through direct download, this is a resource that can be counted on for current, reliable information.
68. National Centers for Environmental Information
Established through the merger of three previously independent agencies, the National Centers for Environmental Information is the place to go for high-quality environmental data. Its holdings range from ocean data to ice records from millions of years ago, so if an issue involves the environment, chances are this website has information on it. The agency’s commitment to accuracy in its stewardship of one of the largest archives of its kind also makes it one of the few sites online that holds, updates, and maintains this type of data.
69. National Weather Service
The National Oceanic and Atmospheric Administration’s National Weather Service will tell researchers everything they need to know about the weather. The site offers data searches covering categories such as warnings and forecasts, climate, geographical forecasts, and more. It also includes an intuitive, easy-to-follow map with clickable tabs for different results. Whether reviewing what happened locally or finding the forecast for a city in another state, researchers can find information quickly.
70. Wunderground
Wunderground is a resource dedicated to making weather information available to everyone around the world, including areas that don’t receive much coverage. Wunderground explicitly states that it has taken steps to ensure an excellent user experience across multiple digital platforms. This means the site is accessible on mobile as well as on PC, making it an ideal resource for journalists on the go.
71. Weatherbase
Weatherbase provides current conditions, averages, climate information, and travel conditions for over 40,000 cities around the world through a simple search bar. A companion site linked from Weatherbase offers additional travel information such as currency converters, coordinates, and country information, among other fun facts. Weatherbase can also be used to pick vacation spots purely on the basis of what the weather will be like. Happy searching!
72. Energy Atlas
Published by the International Energy Agency, the Energy Atlas lets researchers see the world through energy statistics. Designed from the outset as a complementary data source, the site features an animated Sankey diagram of energy balances as well as several databases that accompany the publications on the International Energy Agency’s statistics page. Researchers will find this site and its companion sites extremely useful when studying how countries and cities use energy.
73. Bureau of Economic Analysis
The Bureau of Economic Analysis, or BEA, publishes a broad range of useful information that allows researchers to keep their proverbial finger on the pulse of the nation’s economy. The site provides US economic accounts covering consumer spending, GDP, and fixed assets, among other data. Researchers can search by region or industry, with international, national, and regional options. The interactive data page is a good place to start exploring the bureau’s data.
74. National Bureau of Economic Research
The website of the National Bureau of Economic Research, or NBER, is a data source that approaches economics from an analytical standpoint. It hosts data on a wide range of economic topics, with entries such as the Index of African Governance, the Official Business Cycle, the Experimental Coincident, Leading and Recession Indexes, and the Macro History Database. NBER offers official datasets compiled and published under its own name as well as indexes compiled by other publishers.
75. US Securities and Exchange Commission
The United States Securities and Exchange Commission is an agency that acts as a watchdog of sorts, promoting transparency, fairness, and efficiency in the markets. The site has a financial statement dataset covering January 2009 to October 2017, updated every quarter. Researchers can rely on this site to stay on top of the latest filings and what they reveal about companies and the state of their finances.
76. IMF
The International Monetary Fund, also known as the IMF, is a well-established organization in the international economic and financial sector. On its website, researchers can find a host of data on those subjects. Users can search datasets by indicator and country and browse charts and maps while doing research. Popular datasets include Direction of Trade Statistics, Primary Commodity Prices, Financial Soundness Indicators, surveys, and International Financial Statistics, among other valuable information.
77. Atlas of Economic Complexity
Originally developed at Harvard, the Atlas is an online tool that allows people to visualize and interact with a country’s trade situation. The Atlas then uses that information to suggest products the country could potentially manufacture to improve its economy. The tool is used by policymakers, businesspeople, investors, and engaged members of the public who want a better understanding of a given country’s economic climate. Questions of trade and national economies have never been more accessible.
78. Doing Business
Doing Business is the result of an effort to evaluate business regulations objectively. The site examines nearly 200 economies and numerous cities, measuring economic indicators and ranking the ease of doing business. It lets users examine the effects of different types of business regulation across countries and hosts reports as well as extensive qualitative data. It also makes it possible to compare results over time.
79. Comtrade
Originally a project of the United Kingdom’s Department for Business, Energy and Industrial Strategy in conjunction with the Department for International Trade, Comtrade is an excellent resource. Drawing on data from the United Nations’ Comtrade database, the site provides an interactive chart that users can use to search, compare, and analyze the exact value of trade in goods between countries. Just select a reporting country, choose a partner country, and refine the remaining selections as needed.
80. Global Financial Data
Global Financial Data doesn’t just compile standard financial data; it offers financial information dating from the 1200s to the present. This information is drawn from a variety of sources, including books, archived materials, academic journals, and news periodicals. The site also has datasets built using chain linking, a statistical method for joining successive segments of a series into one continuous series. The result, from the user’s perspective, is a resource unlike any other on the Internet because of its exclusive data.
81. Visualizing Economics
Visualizing Economics is less a data discovery resource than a service that designs information graphics and interactive dashboards. It also performs analysis and design with the express purpose of making economic data easier to understand. Through this site, journalists have a genuine opportunity to work with a professional who has years of experience translating economic data into something accessible to the general public.
82. EU Open Data Portal
The EU Open Data Portal was set up following a decision by the European Commission. On this site, EU institutions offer data for public viewing and use free of charge and without copyright restrictions. Datasets include the CORDIS reference data, the transparency register, and even a full list of the people, entities, and groups the EU has imposed financial sanctions on. The data is also available in a number of digital formats.
83. Open Data Network
The Open Data Network is a site that allows users to look up data by region and city. Its clear, intuitive homepage lets researchers search by data category, by city, and even by sample question. Pages reached through either the data categories or the sample questions also include convenient links to further datasets. The organization of the data alone makes the Open Data Network a site well worth exploring.
84. Land Matrix
The Land Matrix is a site that offers an online database of land deals, with the aim of promoting transparency around land acquisitions. Essentially, this tool can be used to visualize and make sense of the various land deals. The data is continually revised and adjusted to improve its accuracy. To date, the Land Matrix has information on over 1,000 deals, making it a source worth exploring for researchers.
85. United Nations Development Programme
The United Nations Development Programme hosts a lot of useful data on human development around the world for the public to explore. Many of these datasets span 1990 to 2015, and the indexes include full tables such as trends in the Human Development Index, the Gender Inequality Index, and the life-course gender gap. Researchers can search the data directly through the search bar or browse the chart by country.
86. OECD
The OECD, short for the Organisation for Economic Co-operation and Development, has a site focused on helping governments fight poverty and foster prosperity through economic stability and growth. On this site, researchers will find peer-reviewed materials, publications, and standards, along with arguments in favor of setting standards. The OECD also hosts a Factbook, a solid economic reference tool, along with a number of surveys and economic outlook projections.
87. U.S. Department of Health & Human Services
The US Department of Health & Human Services operates a site that provides information on the President’s Council on Fitness, Sports, and Nutrition. Its facts and data were compiled with the assistance of experts in related fields, such as chefs and athletes. The site also has a host of statistics: researchers can find facts on the physical activity of children, the muscle-strengthening habits of adults, the dieting habits of the public, and obesity, among numerous other topics.
88. Partners in Information Access for the Public Health Workforce
Partners in Information Access for the Public Health Workforce is a collaboration among public health organizations, US government agencies, and health science libraries. Topic pages on this site cover subjects such as grants and funding, health promotion and health education, and literature and guidelines. The Public Health Topics section also offers data on subjects such as bioterrorism, public health genomics, and dental public health, to name a few.
89. United Health Foundation
For the last three decades, the United Health Foundation has been publishing health rankings as a means of measuring public health. The site hosts numerous reports and publications, including reports on the health of those who have served, senior reports, women’s and children’s health, annual reports, and even briefs on important topics in the field. An interactive map lets users explore by region, and a search bar helps researchers who are looking for something more specific.
90. Medicare
In the United States, Medicare is the primary source of health insurance and access to medical treatment for many people. Along with its services, Medicare also offers data on standards and quality of treatment across facilities and hospitals via its comparison chart. It’s the official dataset used by the Hospital Compare website, and the data can be downloaded into Excel for easier analysis.
91. Surveillance, Epidemiology, and End Results (SEER)
Surveillance, Epidemiology, and End Results, also known as SEER, has a site that’s especially useful as a source of cancer statistics. It hosts statistical summaries of cancer figures that can be sorted by cancer site, race and ethnicity, age, sex, and even data type. The site also hosts publications, datasets, and software that researchers can use for deeper analysis.
92. Amnesty International
Amnesty International is an organization that has long advocated for human rights and justice around the world. As part of its annual report, it also publishes a lot of data on the status of human rights worldwide, including information on specific atrocities and crimes against humanity at different points in time. Researchers can use this information to compare different years and see how countries have progressed or regressed on human rights.
93. Human Rights Data Analysis Group
Since its founding 25 years ago, the Human Rights Data Analysis Group has been applying scientific principles to the study of human rights violations in countries around the world. The site lists the group’s publications by year, including pieces in reputable media outlets such as the Washington Post and formal publications through Macmillan. Alongside these publications going back years, the site also describes projects the group has carried out all over the world. For a more technical look at human rights violations, this is a great place to search.
94. International Relations & Human Rights Data
This site hosts databases compiled by numerous reputable organizations, universities, and even government agencies. Examples include the Manifesto Project, the Minorities at Risk Project, the Comparative Welfare States Dataset, and the Armed Conflict Database. Some projects, like the Polity IV Project, go back to the 1800s. Meanwhile, data from the Stockholm International Peace Research Institute (SIPRI) measures arms transfers, international military spending, and security trends. The best way to appreciate the data is to head to the site and explore.
95. Uppsala Conflict Data Program
The Uppsala Conflict Data Program, often referred to as UCDP, is run by Uppsala University’s Department of Peace and Conflict Research and hosts a massive database called the UCDP Conflict Encyclopedia. The site lets users click through and explore data the program has already disaggregated. The data can be browsed on the website and also downloaded for further manipulation and analysis. This is a resource that can be counted on for quality information distributed in an accessible manner.
96. United States Department of Labor
The United States Department of Labor hosts a lot of economic data, including statistics on employment and unemployment. These include databases on mass layoffs, employment projections, job openings and labor turnover, national employment, and even international labor comparisons. The Department of Labor keeps this information up to date and accurate, making the site a reputable, government-backed resource for research.
97. Small Business Administration
The Small Business Administration has long been a proven resource for entrepreneurs and aspiring entrepreneurs. This site hosts a ton of statistics on employment as well as information that helps researchers do market research and competitive analysis. Here researchers can find numbers, statistics, and tools for uncovering additional data. For small business statistics from an employer and business perspective, this is an excellent resource that journalists can turn to at any time.
98. Crowdpac
Crowdpac is a platform that allows political candidates to fundraise and organize. Built around the observation that a number of congressional candidates run essentially unopposed each election, the site lets engaged citizens organize support. With articles on political issues such as gerrymandering, civil rights, and national security, this site offers an excellent opportunity to find out what’s happening in grassroots politics.
99. Gallup
This site is home to the famed Gallup polls. Gallup specializes in analytics that help organizational decision-makers solve problems through a data-based approach. Furthermore, the advice Gallup offers is often useful for driving solutions. Gallup is widely recognized as a gold standard in data and advanced analytics. Just browse the site to explore reports on everything from the state of the global workplace to US productivity.
100. Berkeley Library
Berkeley Library hosts a full compilation of statistics and data for political science research on its site. This page offers a wealth of links that give researchers access to a number of datasets, as well as the capacity to build their own. Among these are the Historical Statistics of the United States (HSUS), Millennial Edition, the Data Planet, ProQuest Statistical Insight, and the Inter-university Consortium for Political and Social Research. There are several hours’ worth of data to get through.
101. RAND State Statistics
For those who don’t know, the RAND Corporation is an organization that specializes in research on public policy challenges. With clients and a portfolio spanning all levels of government, the corporation is a source of quality research for decision-making. Its US branch hosts a set of statistical databases on its website, where researchers can find information on K-12 education, health, business, and economics, among other categories relevant to the public good.
102. Roper Center for Public Opinion Research
Run by Cornell University, the Roper Center for Public Opinion Research specializes in collecting, distributing, and preserving public opinion data. For example, researchers can access data on US elections as well as an archive of over 23,000 datasets. Whether journalists are looking for public reaction to politics or to a recent health scare, this site is very likely to have information.
103. Transportation Gov
Powered and operated by the Bureau of Transportation Statistics, this site has data spanning a broad range of transportation-related subjects. Resources include reports on energy, passenger safety, system performance, transportation economics, infrastructure, and freight transport. Users can even sort and access the data by location and geography. This site lets researchers find out just about everything they might want to know about transportation-related topics.
104. Travel Trade
Travel Trade is a site that hosts data on US citizen departures from 1996 to 2016, as of this writing. The stated goal of this resource is to help interested members of the general public understand how international tourism has operated over the years. The data is available both for download and for online viewing, which makes it accessible, and researchers can easily use it to find trends and make comparisons.
105. Skift
Skift is a site that focuses on providing intelligence and data to the travel industry. The company offers research, conferences, and informative newsletters for subscribers and researchers to choose from. Skift examines topics that people in the travel sector want to know about, such as where people are increasingly traveling, which new markets are emerging, and travel technology.
106. Geoba.se
Geoba.se is the perfect site for people who want the facts and nothing but the facts about a city or location. Using the search engine on the homepage, coordinates, travel information, weather, and even local webcam footage are just a few keystrokes away. The site also has a page of world rankings that can be narrowed down by region and country. In short, this resource provides pure data and statistics.
107. US Travel
US Travel is a site operated and maintained by the US Department of State. Its stated mission is to protect the lives of US citizens traveling abroad. As such, it hosts statistics, information, and reports on topics such as US passports, US visas, intercountry adoptions, deaths overseas, and international parental child abductions. The information can be used when planning trips, and because the statistics span 1996 to 2016, it can also be used to identify long-term trends.
108. UK Data Service
Financed by the Economic and Social Research Council, the UK Data Service publishes a broad range of data. Its collection includes everything from business data to cross-national surveys, surveys sponsored by the UK government, and even UK census data. The website was designed with the needs of students and researchers in mind, and it also offers guides, resources, and tutorials that help researchers understand and use its tools quickly.
109. Data.gov.au
Run and published by the Australian government, Data.gov.au offers easy access to, and search of, open data. The site explicitly points out that government data can be used to develop tools and applications that benefit Australians. Beyond the open datasets, there’s also unpublished data that can be accessed for a fee. For researchers who want to perform even deeper analysis, the site also offers a Data Toolkit.
110. Twitter
Everybody knows Twitter for its fast-paced conversations, short messages, and its status in popular culture as a hub for breaking news. What a lot of people don’t know, however, is that Twitter also has developer tools that make it easier to filter and discover information. These tools even allow researchers to view trends and filter by geography. Whether reading up on trending hashtags or exploring the developer tools, Twitter is a resource journalists have been using for quite some time.
111. Instagram
Instagram isn’t purely for liking cute cat pics and adorable baby photos. Or at least, it doesn’t have to be. The app has a surprisingly sophisticated set of developer tools that make it easier to research its audience. In addition, hashtags, the clues revealed by the photos people post, and the individuals who get tagged in them can be a treasure trove of information. Instagram is a useful way to uncover what’s trending in different sectors.
112. Foursquare
For research where location matters, Foursquare is a useful data source because of the massive database of information it has compiled. On the surface, it offers a city guide that provides recommendations based on input from its community. Foursquare also has developer tools that provide additional access to information through its Places database. Journalists can use this to learn more about specific locations and about the people who use the app.
113. New York Times
Considered by many to be an esteemed member of the Fourth Estate, the New York Times is a paper very few journalists haven’t heard of. What’s often overlooked, however, is that the New York Times is also a data source through its APIs. Researchers can retrieve articles dating back to 1851 by month, search articles, and even find book reviews. The APIs also allow lookups based on views, shares, and emails, and even provide access to comments.
114. AP
The Associated Press has a permanent place in popular culture as a source of timely and accurate news. Thanks to its developer tools, it’s also a useful source of data for journalists. As of this writing, researchers can use these tools to create their own edits while downloading pictures and videos. The available content appears to depend on the researcher’s plan, but the Associated Press API nonetheless allows users to take the research process to another level.
115. FiveThirtyEight
Journalists may already be familiar with Nate Silver, FiveThirtyEight, and its statistical models, known for sometimes unexpected but usually correct predictions. FiveThirtyEight has a GitHub account that hosts datasets as well as code used over the course of the site’s history. The datasets feature amusing subjects like bad drivers, the Avengers, and a survey on flying etiquette. At the same time, there are also files that address more serious matters like airline safety and hate crimes.
116. IMDb
IMDb is considered by many to be the most comprehensive site on the web with respect to the film and acting industry. If there’s a movie coming out and people want to know who’s acting in it or to see the general reaction of the movie-going public, chances are they’re going to land on this site at some point during their search. IMDb also hosts a number of datasets that are refreshed every day and are available for commercial and non-commercial use.
117. KAPSARC
KAPSARC is a data portal that hosts a total of 923 datasets with specific information on energy. These sets are divided into a few general themes: energy use, energy supply, and other relevant factors such as policies, demographics, the environment, trade, water, and economic information. For researchers interested in how energy is used across different industries and sectors, KAPSARC is one of the most comprehensive energy data sources on the web.
118. Asset Macro
Asset Macro is a site that provides historical financial data and macroeconomic indicators. This data covers more than 75,000 stocks, currencies, commodities, and bonds from around the world. The site also has more than 120,000 macroeconomic indicators that users can use to explore the financial data of different countries. In addition to all of this financial market data, the site discusses investment strategies. The sheer volume of information makes this source stand out.
119. US Government Web Services and XML Data Sources
The US Government Web Services and XML Data Sources are hosted on a site called USGovXML.com. Here, users can browse through the different XML data sources and web services that the US government has provided. Cataloging them in one place keeps those sources transparent and accessible to the public. Researchers who monitor this index regularly may find a story in the data if there’s a sudden change to the XML data.
120. Figshare
Figshare is a site that hosts over 5,000 pieces of scientific content available for academic research and citation. On top of that content, the site is designed to offer researchers a single location for compiling, uploading, storing, and managing their research. Mathematics, health sciences, engineering, chemistry, biological sciences, and social sciences are all listed as featured categories. This site is a great source for journalists in search of more academic resources to cite.
121. LinkedData
LinkedData is a site dedicated to finding new ways to connect Internet data that wasn’t linked before. Here, users will find tutorials, guides, and datasets that will get the story going. The datasets all focus on getting involved with the linked data community, and apart from the linked data shopping list, most are categorized as dereferenceable URIs, either with or without complementary RDF. For anyone who wants to learn more about this community, this site is a must-see.
122. The Web Miner
The Web Miner is the perfect place for researchers who want to collect as much generic data as they can with its program. The site hosts example databases such as US restaurants, SWIFT codes from banks around the world, US gas stations, American tourist attractions, and Google Play apps, among other massive lists. If nothing else, it can help journalists sift through massive amounts of data in significantly less time.
123. Data Hub
Data Hub prides itself on being a place where users can find and publish data as quickly and efficiently as possible. The site itself hosts a number of data sets. The House Price Index (Case-Shiller), the monthly price of gold, and the Current Trends in Atmospheric Carbon Dioxide are the three most popular. In addition to the data, the site also hosts a number of tutorials that users can go through in order to learn more about navigating the various types of data available.
124. Enigma Public
On its site, Enigma Public dubs itself "the broadest collection of public data" available on the web. The datasets fall into four broad categories: FOIA, Essentials, Newsworthy, and Under the Radar. Examples of the data on this site include White House employee salaries and Active Federal Firearm Licenses. After creating a free account, users can access any of these categories.
125. Yahoo
Most web users are familiar with the name Yahoo thanks to Yahoo! News and Yahoo! Finance, among the company's many online properties. Of interest to researchers and journalists is the fact that Yahoo also hosts a vast number of datasets, including Yahoo! Music User Ratings of Songs with Artist, Album, and Genre Meta Information, v. 1.0 and Yahoo! Movies User Ratings and Descriptive Content Information, v. 1.0, to name two. Journalists in search of new statistics can't go wrong with this source.
126. 1000 Genomes
1000 Genomes is home to a project of the same name that ran from 2008 to 2015. The purpose of the project was to find every genetic variant that occurs in at least 1% of the populations being studied. Along with the publications that came out of this project, there are also massive datasets, including separate databases of variant calls, raw sequence files, and sample availability. This data can be either browsed or downloaded.
127. CBOE
CBOE is a futures exchange that focuses primarily on volatility futures. In particular, the site features plenty of material on futures tied to its trademarked Volatility Index. The site hosts market data of all sorts, including historical data, daily market statistics, and VX Futures Daily Settlement Prices. For journalists seeking quality market data, CBOE provides that information in a format that's easy to follow and understand.
128. St. Louis Fed
The Federal Reserve Bank of St. Louis is one of the most important financial institutions in its region. On the website, researchers can peruse working papers, economic data, publications, and information services directly. In other words, there's no shortage of information on the St. Louis Fed's current and past policy thinking, and enough material to evaluate how effective the bank has been. For business, finance, and economic journalists, this is a top-notch resource.
129. OANDA
OANDA is a popular online trading platform that deals primarily in foreign exchange and CFDs. On top of the many features designed to attract online traders, OANDA also hosts a large amount of historical exchange-rate data, including historical rates through the currency converter on the site. Along with this data, the site also offers information on investment strategies, plus news and market analysis. An account isn't even necessary to access most of this data.
130. ABS
The Australian Bureau of Statistics, or ABS, not unlike its American counterpart, offers objective data, economic information, and research on a broad range of topics relevant to the country. Directly on the site, researchers can look up statistical data on business indicators, health care, housing, finance, international trade, mental health, as well as price indexes and inflation. Journalists can search for older surveys and information and sort the results by region.
131. London Database
Originally conceived and operated by the Greater London Authority, the London Database is an effort to make the city's data more accessible to the public. The end goal is to give people access to this information while encouraging them to use it for free in whatever way they want. On this website, users can search data by topics such as Arts & Culture, Crime and Community Safety, Education, and Health. Journalists interested in this type of data can get it directly from the local government.
132. Stats NZ
The government of New Zealand hosts a ton of statistics and data for researchers to dig into and analyze on this site. This information can be sifted through using the search bar at the top, by filtering for location and region, as well as by topic. Some of the topics include economic indicators, health, income and work, industry sectors, environment, and business. Between the additional news sources and releases highlighting various findings and statistics, journalists will uncover all sorts of New Zealand-specific statistics through this site.
133. Australian Government Bureau of Meteorology
Operated by the Government of Australia, the Australian Bureau of Meteorology's website features weather information for the various cities and regions of Australia. According to the site, the agency was established to help Australians cope with their climate through a combination of warnings and advice. Here researchers will find seasonal outlooks, water storage data, rainfall forecasts, climate variability information, and seasonal streamflow forecasts. This site offers accurate and reputable coverage of Australian weather.
134. GroupLens
This site is on the web courtesy of GroupLens at the University of Minnesota's Department of Computer Science and Engineering. The site offers publications as well as datasets for research purposes. It hosts about six datasets, including Book-Crossing, MovieLens, and HetRec 2011. In short, this is a useful resource for journalists who want to better understand how to use the data provided.
135. KD Nuggets
KD Nuggets is a site that focuses primarily on data science, business analytics, machine learning, and data mining. One page on the site lists datasets that people use to explore data mining and big data, with links to resources like Bioassay Data, Asset Macro, DataMarket, Causality Workbench, Data Ferrett, and Datamob. This is a fantastic resource for journalists who prefer having all the information on one page.
137. Microsoft
Everybody who’s used a PC or a laptop has probably heard of Microsoft at least in passing. Interestingly enough, on top of PCs, laptops, and software, Microsoft also hosts a lot of research and publications. This includes breakthroughs such as the company’s quest to create literate machines as well as cloud-based data science. There’s also additional information on tools Microsoft is developing like Visual Studio Code Tools and the developments in AI that they represent.
138. RDataMining
Exactly as it says on the tin, RDataMining is a resource on R and data mining. The site provides numerous examples and documents that give an in-depth perspective on data mining in general and data mining with R in particular. There are also links to training courses, such as the short course offered by the University of Canberra. In addition, the site links to free datasets and presentations, including datasets covering airplanes, airlines, and routes, as well as to sites like GeoDa.
139. Collaborative Research in Computational Neuroscience – Data sharing
Collaborative Research in Computational Neuroscience, also known as CRCN, has a number of datasets that can be accessed through its site. The datasets are categorized by brain region or research area, such as the visual cortex, the hippocampus, the motor cortex, avian, eye movements, and aplysia, to name just a few. The site also includes challenges, tools, simulations, and methods. The ability to share this data makes it an even better resource for research.
140. Protein Data Bank archive
Per its website, the Protein Data Bank archive has been a premier resource on the 3D structures of proteins, nucleic acids, and complex assemblies since 1971. Formed with the explicit mission of keeping this information in the public domain, it lets researchers view validation reports and data dictionaries online. Data growth and usage statistics are also available, both for web-based sorting and analysis and for download. Best of all, the site is always adding new information.
141. The PubChem Project
PubChem was designed as an official project to inform the public about what small molecules do from a biological standpoint. The site is built around three linked databases: PubChem Compound, PubChem Substance, and PubChem BioAssay. In addition, the site makes it possible to search for similarities between different proteins. For researchers taking their data analysis to the next level, the site also offers free code and tips.
142. Coremine Medical
Coremine Medical is an invaluable resource for anyone searching for information on biology, health, and medicine. Now that the biomedical text mining capability of PubGene has been rolled into its current form, Coremine is also one of the most flexible sources of biomedical information around. This site will display links between concepts and ideas in a visually engaging, easily understood format that may not have been noticed otherwise. It’s easily one of the most comprehensive biomedical data sources available to journalists.
143. Tu Tiempo
Tu Tiempo is an incredible source of weather and climate data for every country in the world. Using this resource, it's easy to find annual, monthly, and daily averages for virtually every city and region in the world. Users can also search a database of over 115 million records of historical data. Depending on the region, it's possible to find data going as far back as 1929.
144. Complex Network Resources
This site provides access to much of the data first used in the computer-based experiments of the researchers behind it. The full list of datasets covers types of data including news graphs, biological graphs, citation graphs, collaboration graphs, engineered graphs, and semantic graphs. The page also links to a list of outside sources, such as a dataset that examined roughly 3 million US patents, and boasts an impressive compilation of complex network datasets.
145. Scopus
Scopus is a tool that allows individuals to quickly and easily find research and academic citations. The site offers an extensive database of research from around the world in fields including medicine, technology, the social sciences, and the arts and humanities. Use Scopus to capture academic sources that might otherwise be overlooked. After all, in many circles, the quality of an academic source can be almost as important as the information it provides.
146. Stanford
Stanford didn't earn its reputation as a prestigious academic institution by accident, and that excellence shows in its programming-related courses. The site also hosts a number of datasets, including social network information. There are datasets centering on social circles on Facebook, Wikipedia admin requests, Twitter social circles, and Google+. Communication networks and the Amazon Product Network also have their own datasets.
147. University of Milano
The University of Milano's Department of Information Sciences runs a web page known as the Laboratory for Web Algorithmics. This site is home to plenty of datasets that are there for the exploring. These include social network graphs, Facebook graphs, snapshots from the DELIS project, and a short list of miscellaneous data. The information can be viewed online or downloaded, making this one of the most accessible collections of its kind on the web.
148. UCI Network Data Repository
The UCI Network Data Repository is a site that’s dedicated to taking a scientific approach to the study of networks. On the resources page, researchers will find links to dataset directories selected by research organizations and groups as well as by individuals. It also has a collection of datasets that would typically be used for social media analysis. Those digging into the data will be pleased to find that these sets are also available for download.
149. CAIDA
CAIDA, or the Center for Applied Internet Data Analysis, collects a wide range of data from a number of locations, often with the assistance of other organizations and individuals. Datasets hosted on the site include AS Relationships, DDoS Attacks, and the Telescope datasets, among others. The categories include traffic, topology, security, worm summary, and traffic summary statistics. Some datasets require an access request, but many, if not most, are public.
150. Crawdad
Crawdad, or the Community Resource for Archiving Wireless Data At Dartmouth, is unique because of its focus on providing wireless data to researchers and others interested in the subject. The site offers a number of tools as well as access to numerous datasets. The listed categories include Educational Use, Bit Error Characterization, Network Diagnosis, Opportunistic Connectivity, Location-Aware Computing, and more. Researchers will appreciate this resource more the further they dive into it.
151. U.S. Energy Information Administration
Often referred to as the EIA, the US Energy Information Administration provides annual electric utility data to the public. This data covers fossil fuel stocks, fuel consumption, monthly and annual electricity generation, and environmental data, among other topics. The data is available for analysis for the years 2001 to 2017. All researchers have to do is visit the site and download it.
152. British Oceanographic Data
Funded by the Natural Environment Research Council, British Oceanographic Data is one of the most accessible sources of marine data on the Internet. Its extensive database covers currents, CTD profiles, international sea level data, and even historical bottom pressure recorder data. In addition, more datasets can be found in the Published Data Library, which offers additional access to the catalogue. This is quite possibly one of the most extensive sources of marine information available online.
153. Factual
Factual provides location data for advertising and for use on mobile platforms. Of particular interest to researchers are the developer tools, which include the Engine Mobile SDK, the professional and research applications of the Observation Graph, and the Local Validation Stack. With a website moniker that emphasizes the company's passion for taking data from around the world and finding new ways to put it in context, Factual has a clear commitment to data and to finding new and unorthodox ways to use it.
154. Global Administrative Areas
Global Administrative Areas is a geodatabase that shows where the world's various administrative areas are located. Data from this type of database is typically used in geographic information systems. The areas range from countries down to subdivisions such as provinces, counties, and departments. The good news for journalists is that all of this data is free to use for academic and other non-commercial purposes.
155. Geonames
Geonames is home to a geographical database containing millions of features and their alternative names. Offering both an export option and access through a variety of web services, the database processes approximately 150 million requests each day. Thanks to its wiki capabilities, users can edit and correct entries with relative ease. Its support for place names in multiple languages alone makes it a great resource.
156. Natural Earth Data
Natural Earth Data is a public-domain map dataset designed for use in map-making software to create state-of-the-art maps. The visuals of the final product are neat and well organized, and the data can be used immediately. The dataset includes built-in "intelligence" data along with various cultural, physical, and raster data themes. Originally made with the needs and preferences of cartographers in mind, this dataset is useful to anyone with an interest in geography.
157. OpenStreetMap
OpenStreetMap is less a website and more a collaboration between users that now provides mapping services to apps, sites, and various hardware devices. The project acquires new data when users enter information on lesser-known landmarks such as railway stations, roads, and trails. The full dataset is available free of charge and can be downloaded either in full or in part. For those opting for a partial download, it's possible to download data by region.
158. City of Chicago
The City of Chicago is the home of Michael Jordan’s championship Bulls and its own unique style of pizza, and it also has a full data portal of its own. Dataset categories span a variety of topics that include Administration & Finance, Ethics, Health & Human Services, Parks & Recreation, Public Safety, and Historic Preservation. In short, the City of Chicago’s data portal hosts virtually anything that would be of interest to researchers, policymakers, and local journalists.
159. CKAN
This CKAN-based portal is essentially the online home of the City of Glasgow's open data project. It has datasets on numerous subjects useful to entrepreneurs, policymakers, academic researchers, and app developers. Of the 360 datasets hosted here, some relate to city governance, like the house stock by tenure dataset, while others, like the cycling dataset, are of particular interest to local residents. There's all sorts of information here for journalists covering a local beat.
160. Government of India
The Government of India has a website that covers analytics and data resources as part of its own open data project. Currently, there are roughly 137,940 resources on the site, which have been viewed millions of times. The vast majority of these files are also available for download. Whether looking for numbers on the government budget or searching for datasets on health and family welfare, chances are this site will have resources to offer.
161. Stats SA
This site is full of up-to-date statistics, publications, and data gathered by the South African government. Here researchers will find information on everything from food and beverage surveys to economic indicators, employment statistics, population numbers, and important health statistics. It's possible to search the numbers by city, theme, and indicator, depending on what's needed. The site hosts a lot of information on the census and also releases statistical publications, questionnaires, codes and classifications, and its pricing policy.
162. Policy Development and Research
This site is published under the umbrella of the U.S. Department of Housing and Urban Development's Office of Policy Development & Research. It regularly publishes a large number of case studies, bi-annual publications, and periodicals each year. It also offers many datasets that journalists would be interested in; Fair Market Rents, Income Limits, and Renewal Funding Inflation Factors are just a few of the sets the public can access on this site.
163. Vital Net Health Data
At Vital Net Health Data, researchers will find plenty of large health-related datasets. Rather than hosting these sets itself, the site links to sources where people can find the information. This curated list links to resources like CDC Wonder, Eurocat, and Health Data All Star, as well as the work of charitable organizations such as the North American Association of Central Cancer Registries. This is hands down one of the most comprehensive health dataset resources out there.
164. Analytic Bridge
Analytic Bridge is a resource that’s dedicated to business intelligence. Here researchers will find discussion on machine learning and AI, links to webinars and conferences, and even a job search tab. The site also hosts Data Science Central, which is the part of the site that focuses on big data. With its active and engaged community and its commitment to providing news and information, journalists with an interest in the implications of data for business stand to gain a lot from this.
165. Archive.org
Known primarily for its efforts to become an online public library, archive.org is home to numerous published works as well as a substantial dataset collection. The site offers results from the 2012 Internet Census, Dark Net Market archives from 2011 to 2015, and even a dataset of public Reddit comments. There are data dumps from MusicBrainz and a dataset that contains audio cover images. Between its publications and data, archive.org has plenty of material for journalists to go through.
166. Academic Torrents
This website describes itself as a system designed to make it easier to share and download huge datasets. Using torrent technology to simplify the distribution of data, Academic Torrents prides itself on letting researchers download everything they need quickly. The site also hosts papers, courses, and collections for viewing. A quick search will reveal that there are tons of datasets and collections available for download here.
167. Dataverse
The best way to approach Dataverse is to think of it like another type of library. Here, researchers can search for, discover, and cite data with ease while simultaneously using this site as a repository for their own information. The subject matter covered includes fields such as the social sciences, the agricultural sciences, medicine, health, and life sciences, as well as the earth and environmental sciences. Big names with publications on this site include Gallup and the US Department of Commerce, Bureau of Census, Geography Division.
168. UC DATA
Operating in conjunction with UC Berkeley’s Social Science Data Lab, UC Data is the university’s biggest and most well-known archive. This site provides offerings in the areas of statistics and social science data. On this site researchers can access the papers, reports, and working papers produced by the UC Data researchers. The raw data covers numerous research areas that include Health Care, Welfare and Social Insurance, Demographics, Voting, and Information Technology among a host of other topics.
169. Joke Camp
Joke Camp offers a full guide to finding soccer and football data and APIs for data analysis. By following the links on the page, researchers will find open source data on GitHub as well as free and commercial APIs for easier access. Since the data and code are available on a well-recognized site like GitHub, getting hold of this sort of data has never been easier.
170. Sean Lahman
Sean Lahman isn't a name people hear every day, but his site is home to one of the most comprehensive and in-depth collections of batting and pitching statistics on the Internet. With numbers covering the period from 1871 to 2016, the data goes back nearly 150 years. The data is free to access and use under the Creative Commons Attribution-ShareAlike 3.0 license and can be downloaded directly in formats including SQL and Microsoft Access. The statistics can also be downloaded via GitHub.
171. Retro Sheet
Retro Sheet is one of the most extensive sources on the Internet for baseball statistics and data. The site includes details like annual rosters and identification of umpires, players and coaches. For the years that it was relevant, the data for the all-star game was included in the event files along with a set of event files for the post-season and a small discrepancy file. Retro Sheet even has identifications for ball parks for each season. How’s that for thorough?
172. Hubway
For those who aren't familiar with the program, Hubway is the name of the bike-share system based in the Boston metropolitan area. Of course, the system didn't record or release identifying information, but Hubway nonetheless released basic information on every trip taken between July 2011 and September 2012. This included details like the start and end times of each trip and the pick-up station, to name a few categories.
173. Open Flights
Open Flights is a database that has information on more than 10,000 ferry terminals, airports, and train stations around the world. Researchers can find the Excel-compatible, .csv version through GitHub and can also download the data directly on the website as well. Using the map on the homepage, it’s possible to see which specific places are on the list and the site even goes so far as to have route information available as well. The site owners can be contacted for even more updated information.
174. MLVIS
MLVIS is a data repository that combines visual analytics with data mining in real time. This makes it possible to develop a more intuitive understanding of data even while working with huge datasets. Benchmark data for non-relational machine learning, along with data types such as attributed and heterogeneous data, are among the many features and options available through this site. For added convenience, this information can also be downloaded in a single consistent format.
175. Open Data Inception
Open Data Inception is a site that offers links to well over 2600 data portals. By making use of the search bar on top, researchers can search for portals and datasets by category and by theme. In addition, it’s also possible to use the site as a means of finding the most up-to-date version of the dataset being searched for. Take advantage of the ability to view data portals in list format or in interactive visual form and start finding the necessary data.
176. OpenDataSoft
Available in French, English, and German, OpenDataSoft is a source that offers access to 480 million records, 4 million API calls, and 9,284 datasets. Using the search bar in the middle of the homepage, researchers can enter a keyword or category and find the most appropriate dataset from there. For journalists, this is a faster way to find the datasets needed to complete the research in question. Visit the site to learn more.
177. Nationmaster
NationMaster is a source of fully compiled data from over 300 countries, organized into over 5,000 categories. The data includes figures on the percentage of deaths that have been registered, World War 2 statistics, and even information on nuclear war and testing. Researchers will also find tables, graphs, and pie charts for further visualization of the data. Put simply, so many subjects are covered that there's always something new to find in the data.
178. Followerwonk
Twitter has long been a popular social media site for breaking news and finding trending stories. Followerwonk allows users to take their Twitter usage to the next level. This includes finding Twitter users to connect with, studying current followers, and planning Twitter activity for maximum results. These days there are a lot of reporters and journalists on Twitter who are using the site for networking and getting stories out there. Followerwonk makes Twitter users more productive on the site.
179. Infochimps
Infochimps is a site that offers scalable cloud-based services for getting the most out of big data. It's useful for deploying and integrating big data technology and applications. When researchers are searching through massive amounts of data or evaluating trends in big data, this is an invaluable resource to have. There are also numerous white papers and case studies available on the site.
180. Archived national government statistics
Founded in 2006, Archive-It is a service provided by the Internet Archive. This service helps organizations and businesses create digital collections and as a result it has had opportunities to work with non-profits, colleges, universities, and governments. Researchers can search a few of the different archives on the site such as websites from the 2014 congressional candidate race, the Alabama State Archives, and the Canadian Government Information PLN Web Archive. This site is a treasure trove of information for enterprising journalists.
181. Civic Commons
Civic Commons has a page that lists the various government open data initiatives. This searchable list of resources is organized by country, city, and region, and it even mentions resources made available by intergovernmental organizations. For journalists, this site represents a faster way to find out which governments are participating in the Open Data Project. It also grants access to pieces of localized data that wouldn't necessarily come up in a simple Google search.
182. Guardian World Governments
The Guardian is a famous name in the world of journalism thanks to its reputation for breaking news. What fewer people realize is that the site has a section offering data on governments around the world. There are articles on homelessness numbers and their impact, discussions of cyber-security, and even thoughtful pieces on the role that data and statistics play in the current political and social climate. The Guardian's World Government section can jumpstart discussion and help reporters find angles for stories.
183. Open Government Data (Hub)
This site belongs to a working group of the Open Knowledge Foundation whose goal is to encourage and support the continued development of open government data. Here, users will find links to one of the most extensive lists of open data catalogues available. Among the additional goals mentioned on the site, the group also seeks to gather information on policy, best practices, and guidelines. It provides journalists with access to more and better information.
184. Government of France
This website is the online home of the Government of France's open data project. It's possible to dig into the data by searching categories such as employment, agriculture, education, and travel and tourism. The data allows for a more nuanced understanding of what the numbers actually say while also leaving room for comparisons with historical information. Journalists have every reason to be excited about going through this data.
185. SourceForge Research Data Archive
This site stores research data made available through the University of Notre Dame's collaboration with SourceForge.net. The data is offered through relational databases. The monthly data dumps also make it possible to gain a better understanding of open source software and its applications. Access must be requested in writing by email. The catch, however, is that only scholarly and academic researchers are eligible for access to the data.
186. UFO Reports
The National UFO Reporting Center maintains an online database of people's reported experiences with unidentified flying objects. Researchers can narrow their search by any of four categories: event date, UFO shape, posted date, and state. UFOs are a distinctive subject because they never fail to capture the public's imagination. If there have been any recent encounters of the third kind nearby, this is the place to find out what people have been saying.
187. WikiLeaks
Notorious for the controversies it has sparked and for what its leaks have revealed about the inner workings of governments and other powerful figures, WikiLeaks has a reputation that precedes it. Its data dumps rarely arrive quietly, and the documents it publishes have a strong reputation for authenticity. For journalists looking for stories that will instantly draw interest, WikiLeaks is a proven source. If nothing else, it makes for interesting reading.
188. The Washington Post
The paper is well known as an excellent source of breaking news and opinion pieces, but few people know that the Washington Post also provides access to the raw data often cited in its articles. On the data page, researchers can find data in categories such as education, the census, health and safety, transportation and development, historical World Cup databases, and even government and politics. Put simply, access to these numbers helps readers develop a more concrete understanding of the issues in the news.
189. Climate Data
Climate Data is a dataset that provides comprehensive information on global temperatures. In its current format, users can browse the gridded climate data and see the corresponding averages. Companion datasets cover the same information for land and ocean. The data can be downloaded or, for convenience, viewed directly on the site.
190. Protein Structure
Protein Structure is a source that examines how computer networks can be used in conjunction with biology. The page hosts a repository whose data can be accessed through the links provided. Of particular interest to the research community is how the site incorporates ideas such as model analysis and executable biology into this pursuit. For journalists, the site is well worth a look to follow progress in the field and examine the data.
191. Analyze Survey Data for Free
With the help of this site, users can take a course in analyzing survey data without paying for the privilege. Analyze Survey Data for Free has a detailed table of contents, with sections bearing titles like Maps and Art of Survey – Weighted Maintenance, Balancing Respondent Confidentiality with Variance Estimation Precision, Structural Equation Models (SEM), and Complex Survey Data. The site offers a great refresher for those who expect to handle more statistical data in the future.
192. UCLA
At the UCLA wiki site, researchers will find a number of datasets intended for demonstration purposes, with plenty of both simulated and observed data to choose from. These resources cover climate data, population data, biomedical data, neuroimaging data, US census data, election data, and economic data, among numerous other categories. Ultimately, these datasets can benefit a wide range of users.
On its site, the University of Toronto offers researchers access to what it calls the Delve Datasets. These collections were part of a larger project designed to compare learning methods, and they remain available for developing and evaluating different approaches to learning. In short, this is a solid source for researchers who want to better understand how to analyze and handle datasets.
194. Natural Resources Conservation Service
The Natural Resources Conservation Service has a site that promotes conservation while offering information on the mosses, hornworts, vascular plants, lichens, and liverworts found in the United States. The site hosts a full database of plants, complete with images and extensive accompanying information. Researchers can download the database and explore topics such as alternative crops. Essentially, this website covers a great deal of what folks need to know about plants.
195. Agricultural Research Service
As its name suggests, this agency handles research, serving as the research arm of the US Department of Agriculture. When an agricultural problem is identified, this is likely the part of the government that helps find a solution. The site hosts a number of datasets that can be accessed and downloaded directly, and journalists can also use it to follow the latest news on issues affecting agriculture.
196. Cell Image Library
This site is a public library offering resources, information, and access to images and animations of cells and cellular processes. The library was designed with the dual purpose of research and education in mind, and its content is highly relevant to discussions of public health and disease. The materials come from a combination of sources, including historical and modern publications. For a thorough explanation that simplifies com