Free Data Samples for Data Visualisation: A Guide to Accessing Public Datasets

Compelling data visualisations begin with the data itself. For individuals and organisations in the UK seeking to practise their skills, build projects, or gain insights, a wealth of free, publicly available datasets exists. These resources provide the raw material for meaningful and impactful visual representations. This guide outlines the methods for obtaining such data and highlights key sources and considerations, based exclusively on the information provided in the accompanying source materials.

Methods for Obtaining Data for Visualisation

Creating effective visualisations requires relevant and reliable data. Several established methods are available for sourcing this information, each with its own protocols and ethical considerations.

Open Data Portals

Open data portals are websites that provide free access to a wide range of datasets collected by government agencies, research organisations, and other institutions. These portals offer datasets on diverse topics, including demographics, economy, healthcare, and the environment. Examples cited include data.gov, data.world, and the World Bank's Open Data. Users can search and download datasets from these platforms, but must ensure compliance with any specified licensing or attribution requirements.

Web Scraping

Web scraping involves the extraction of data from websites using specialised tools. The source material mentions tools such as Beautiful Soup (Python) and import.io for this purpose. However, it is crucial to be mindful of a website's terms of service and to ensure that no legal or ethical boundaries are violated. Respecting website owners' policies and avoiding excessive server requests is essential.
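The courtesy checks described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete scraper: it uses only the standard library (`urllib.robotparser` and `html.parser`) in place of Beautiful Soup so that it runs without third-party packages, and the robots.txt rules, the `example-bot` user agent, the example.org URLs, and the HTML table are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

# Hypothetical page content standing in for a downloaded HTML document.
SAMPLE_HTML = """
<table>
  <tr><th>Town</th><th>Value</th></tr>
  <tr><td>Town A</td><td>100</td></tr>
  <tr><td>Town B</td><td>250</td></tr>
</table>
"""


def load_rules():
    """Parse the robots.txt text into a rule set."""
    rules = RobotFileParser()
    rules.parse(ROBOTS_TXT.splitlines())
    return rules


def allowed(url, user_agent="example-bot"):
    """Return True if robots.txt permits this user agent to fetch the URL."""
    return load_rules().can_fetch(user_agent, url)


def crawl_delay(user_agent="example-bot", default=1.0):
    """Seconds to wait between requests, falling back to a default."""
    delay = load_rules().crawl_delay(user_agent)
    return float(delay) if delay is not None else default


class TableCellParser(HTMLParser):
    """Collect the text of every <td> and <th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self.rows:
            self.rows[-1].append(data.strip())


if __name__ == "__main__":
    url = "https://example.org/data/table.html"
    if allowed(url):
        parser = TableCellParser()
        parser.feed(SAMPLE_HTML)
        print(parser.rows)
        print("wait", crawl_delay(), "seconds before the next request")
```

In a real scraper, the value returned by `crawl_delay()` would be passed to `time.sleep` between requests, and any URL for which `allowed()` returns False would be skipped. A robots.txt check does not replace reading the site's terms of service.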

Public APIs

Many organisations provide Application Programming Interfaces (APIs) that allow developers to access and retrieve data programmatically. These APIs often supply structured and up-to-date data. Popular examples given include the Twitter API, Google Maps API, and GitHub API.
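As a rough sketch of programmatic access, the following Python example requests public repository metadata from the GitHub REST API and keeps only a few fields for charting. The function names `fetch_repo` and `summarise` and the `dataviz-example` client name are illustrative choices, not part of any library. Unauthenticated requests are rate limited, so this suits occasional use only.

```python
import json
import urllib.request


def fetch_repo(owner, repo, timeout=10):
    """Request public repository metadata from the GitHub REST API."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "dataviz-example",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarise(payload):
    """Keep only the fields wanted for a simple chart."""
    return {
        "name": payload["full_name"],
        "stars": payload["stargazers_count"],
        "forks": payload["forks_count"],
    }


if __name__ == "__main__":
    # Performs a live network request when run directly.
    print(summarise(fetch_repo("python", "cpython")))
```

Separating the network call from the field selection keeps `summarise` testable against a saved response, which is useful when an API limits how often it can be called.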

Understanding Datasets for Visualisation

In the context of data visualisation, a dataset is a structured and organised collection of data that serves as the foundation for creating visual representations. It is this organised set of information that can be analysed and visualised to derive insights and communicate meaningful patterns or relationships. A common example provided is the "Iris" dataset, which consists of measurements of four features (sepal length, sepal width, petal length, and petal width) of three different species of Iris flowers.
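To make the idea of a structured dataset concrete, the sketch below loads a six-row excerpt laid out like the Iris dataset (four measurements plus a species label) and computes the mean petal length per species. The column names are an assumption, since published copies of the dataset name them differently, and six rows are far too few for any real conclusion.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# A small excerpt in the shape of the Iris dataset: one row per flower,
# four numeric features in centimetres, and a species label.
IRIS_EXCERPT = """\
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""


def mean_by_species(csv_text, feature):
    """Group rows by species and average one numeric feature."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["species"]].append(float(row[feature]))
    return {species: round(mean(values), 2) for species, values in groups.items()}


if __name__ == "__main__":
    for species, value in mean_by_species(IRIS_EXCERPT, "petal_length").items():
        print(f"{species}: {value} cm")
```

A per-group summary like this is typically the table that feeds a bar chart, while the raw rows feed a scatter plot.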

It is possible to combine multiple datasets for analysis or visualisation to provide a broader perspective and enable more comprehensive work. However, it is important to ensure compatibility and consistency between the datasets being merged.
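One way to check compatibility before merging is to normalise the join key in both datasets and report any rows that fail to match. The sketch below assumes two small, entirely hypothetical datasets keyed by a `region` code. The helper names are illustrative, and the approach (an inner join on a cleaned key) is only one of several reasonable choices.

```python
def normalise_key(value):
    """Make keys comparable by trimming whitespace and upper-casing."""
    return value.strip().upper()


def merge_on_key(left, right, key):
    """Inner-join two lists of dicts on a normalised key.

    Returns the merged rows and the left-hand keys with no match,
    so inconsistencies are reported rather than silently dropped.
    """
    right_index = {normalise_key(row[key]): row for row in right}
    merged, unmatched = [], []
    for row in left:
        k = normalise_key(row[key])
        match = right_index.get(k)
        if match is None:
            unmatched.append(k)
        else:
            merged.append({**row, **match, key: k})
    return merged, unmatched


# Hypothetical data with inconsistent key formatting.
population = [
    {"region": "ne ", "population": 100},
    {"region": "SW", "population": 200},
    {"region": "XX", "population": 50},
]
employment = [
    {"region": "NE", "employment_rate": 0.7},
    {"region": "sw", "employment_rate": 0.8},
]

if __name__ == "__main__":
    rows, missing = merge_on_key(population, employment, "region")
    print(rows)
    print("no match for:", missing)
```

Reviewing the unmatched keys before plotting helps catch cases where two sources use different codes, spellings, or time periods for the same entity.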

Key Sources for Free Datasets

A variety of organisations and platforms offer high-quality, free datasets suitable for data visualisation projects, machine learning, data analytics, and more. The selection of a dataset should depend on the specific goals, context, and domain, with careful consideration given to data quality, relevance, and ethics.

Specialised Data Platforms

Certain platforms curate datasets specifically for data visualisation practice. These cover a range of topics such as health, social impact, climate, and government. The data is often exceptionally clean and comes with context from published articles. Examples include airline safety data, historical US weather data, and data on study drug usage. Each dataset may connect to an article, providing a model for presenting findings.

Another resource, Our World in Data, provides research and data on global challenges like poverty, disease, and climate change. Its datasets come with ready-made visualisations for study and improvement, covering topics like literacy rates, economic progress, and health outcomes.

The NASA Earth and Space Data repository offers extensive free public datasets on both Earth science and space exploration, with data ranging from satellite imagery to climate measurements.

The Maven Analytics Data Playground offers free, unique, real-world datasets designed to test data visualisation and analytical thinking skills, from coffee shop sales to shark attacks.

Data for Analytics and Research

For business analysts and data analytics professionals, certain sources provide datasets that support operational insights and business intelligence work.

Quandl (Nasdaq Data Link) specialises in financial and economic datasets, offering both free and premium data covering real estate, economic indicators, and financial markets. The platform is valuable for time series analysis and financial modelling, with data available in multiple formats and accessible via API.

The Pew Research Center conducts extensive surveys on politics, social issues, and media, releasing datasets publicly for secondary analysis after an embargo period. Topics include US politics, journalism and media, internet and tech, and religion. These datasets are excellent for understanding survey methodology and social science research.

The Bureau of Labor Statistics (BLS) provides economic data including unemployment rates, inflation, wages, and productivity, with most data filterable by time and geography. This data is essential for economic analysis and understanding labour market trends.

Community and Curated Lists

Some platforms rely on user submissions, which gives them a wide scope and surfaces unique data not found elsewhere. Notable examples include complete Reddit submission history, Jeopardy questions, and NYC property tax data. Sorting by top posts of all time can help identify the most valuable contributions.

Curated lists also exist that compile free datasets for various projects, including data visualisation, machine learning, data processing, data cleaning, data analytics, government and demographic analysis, academic and research projects, and personal data analysis.

Considerations for Dataset Selection

Choosing the right dataset is crucial for creating impactful visualisations. Demographic data, such as census data and population growth, help uncover patterns and trends in population dynamics. Economic data, including GDP and employment rates, can identify economic patterns and business opportunities. Environmental data, like climate change and pollution levels, contribute to scientific research and policy formulation.

When selecting a dataset, several factors should be considered:

- Clean and well-documented data saves time. Datasets with clear column headers, data dictionaries, and minimal missing values are preferable. A messy dataset can be good practice for data cleaning, but it is not ideal for a first project, where cleaning distracts from the primary objective.
- Appropriate size and complexity matter. It is advisable to start with datasets that have enough rows to be interesting (typically 1,000+ records) but will not overwhelm the system. As confidence builds, one can scale up to larger datasets.
- Interesting questions drive engagement. The best datasets allow multiple angles to be explored and compelling stories to be told.
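The first two factors can be checked quickly before committing to a dataset. The following sketch profiles a CSV for row count and missing values per column. The sample data, the set of strings treated as missing, and the `profile` helper are assumptions for illustration, and the 1,000-row threshold simply mirrors the guideline above.

```python
import csv
import io

# Strings treated as missing values; adjust to the dataset's own conventions.
MISSING = {"", "na", "n/a", "null"}

# Hypothetical sample standing in for a downloaded CSV file.
SAMPLE_CSV = """\
shop,date,sales
A,2024-01-01,120
B,2024-01-01,
C,,N/A
"""


def profile(csv_text):
    """Count rows and missing values per column in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    missing = {column: 0 for column in columns}
    rows = 0
    for row in reader:
        rows += 1
        for column in columns:
            value = row.get(column)
            if value is None or value.strip().lower() in MISSING:
                missing[column] += 1
    return {"rows": rows, "columns": list(columns), "missing": missing}


if __name__ == "__main__":
    report = profile(SAMPLE_CSV)
    print(report)
    print("meets the 1,000-row guideline:", report["rows"] >= 1000)
```

For a file on disk, the same function can be applied to the text returned by reading the file. A high missing-value count in a column central to the intended chart is usually a reason to look for another dataset.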

Conclusion

The landscape of free data for visualisation is rich and varied, offering resources from government portals and specialised research institutes to community-driven platforms. By utilising methods such as accessing open data portals, employing web scraping (with due regard for legal and ethical guidelines), and leveraging public APIs, UK consumers, students, and professionals can find the data needed for their projects. Careful selection based on cleanliness, size, and relevance, alongside an understanding of the ethical and licensing considerations, is fundamental to creating meaningful and impactful data visualisations.

Sources

  1. Datasets for Data Visualization
  2. Free Datasets for Projects
  3. Maven Analytics Data Playground
