DATASURFING ON THE WORLD WIDE WEB - Part II

Robin H. Lock
Department of Mathematics, Computer Science, and Statistics
St. Lawrence University
Canton, NY 13617
rlock@stlawu.edu

Outline for a talk at the 2016 Joint Statistical Meetings


ABSTRACT: This is a continuation of a presentation from the last JSM in Chicago (1996). At that time we looked at web sources for students and instructors to obtain real data for use in projects and class examples. What’s changed in this regard over the past 20 years? Where are some places to go now to get easy access to useful data? What new challenges have emerged for obtaining data from the ever-expanding web?


CATEGORIES OF DATA SOURCES

  • Dataset Archives with Teaching Support
  • Pages of Data Links
  • Government Sources
  • R Packages
  • Data from Visualizations
  • More Data for Countries
  • Survey Repositories
  • Fun and Games
  • Data Scraping

  • Dataset Archives with Teaching Support

  • Journal of Statistics Education Data Archive More than 100 datasets and documentation contributed by statistics teachers for classroom use. At least 80 of these datasets are tied to longer JSE articles discussing their use in statistics classes. Jenny Baglivo has made a quick summary of some of her favorites from this collection.
  • DASL - Data and Story Library A collection of datasets and related documentation (stories) which may be searched by data subject and/or statistical technique. Thanks to Paul Velleman and DataDesk for taking over hosting of this project.
  • ICPSR Data-Driven Learning Guides 50+ topics linked to political and social research survey data.
  • TSHS Resources Portal A new collection of resources started by the ASA's Section on Teaching Statistics in the Health Sciences. A limited number of datasets at this point, but they are just getting started and have good support for using the data in class.

  • Pages of Data Links

  • Winner's Miscellaneous Datasets Lots of links (data and documentation) maintained by Larry Winner at Univ. of Florida, organized by statistical technique.
  • Awesome Public Datasets A very large list of links to public data organized by subject area (Sammy Chen). May take some digging to get to actual data.
  • Big Data: 33 Brilliant And Free Data Sources For 2016 Article by Bernard Marr in Forbes. An earlier list with 20 sources is at The Big Data Guru.

  • Government Sources

  • Data.gov "The home of the U.S. Government's open data." Searchable links to hundreds of thousands of datasets. Try searching for "College Scorecard" to get one click away from downloading a .csv file with information on almost a hundred variables for more than 7000 colleges and universities.
  • Canada Open Data Portal Similar site with searchable links for Canadian data.
  • A list of open data portals from around the world, organized by country.
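
    Once a .csv file such as the College Scorecard download has been saved, a few lines of code are enough to pull out one variable and check for missing values. The following is a minimal sketch using Python's standard csv module on a tiny inline sample. The column names (INSTNM, ADM_RATE), the missing-value codes, and all values are assumptions made for illustration; the real file's headers and codes should be checked first.

```python
import csv
import io
import statistics

# Hypothetical stand-in for a downloaded file. A real download would be
# opened with open("your_file.csv", newline="") instead of io.StringIO.
SAMPLE = """INSTNM,ADM_RATE
College A,0.50
College B,NULL
College C,0.70
"""

def numeric_column(csv_file, column, missing=("", "NULL", "NA")):
    """Return one column as a list of floats, skipping missing-value codes."""
    values = []
    for row in csv.DictReader(csv_file):
        cell = row[column].strip()
        if cell not in missing:
            values.append(float(cell))
    return values

rates = numeric_column(io.StringIO(SAMPLE), "ADM_RATE")
print(len(rates), "usable values, mean =", statistics.mean(rates))
```

    Counting the usable values before summarizing is a quick way to see how much of a variable is actually filled in.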

  • R Packages

  • Rdatasets A collection of data sets from various R packages (e.g. datasets, car, Ecdat, MASS, HistData, survival, ...) maintained by Vincent Arel-Bundock. Current list has 758 datasets from more than 30 R packages with links to the data as .csv files and documentation (without needing R). Find a link to the R script for doing this at the Rdatasets GitHub page.

    Several R packages with good data for teaching (R is required to get the data) include:

  • Mosaic A collection of data sets from the Mosaic package developed by Randall Pruim, Daniel Kaplan, and Nicholas Horton.
  • Lock5Data Datasets from the textbook "Statistics: Unlocking the Power of Data" by Lock^5 (Wiley); datasets also available at lock5stat.com.
  • Stat2Data Datasets from the textbook "Stat2: Building Models for a World of Data" by Cannon et al. (Freeman).

  • Data from Visualizations

  • Gapminder Country Data Download the country data that drive the neat interactive displays at Hans Rosling's Gapminder World.

  • More Data for Countries

  • World Bank Open Data Search by individual countries, general categories, or specific indicators.
  • CIA Factbook Lots of country-level data, but trickier to get in a downloadable format. Look for "Country Comparisons". Variables there have a "Download Data" link, but countries are ordered by that particular variable.
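
    Because each per-variable download is sorted by its own variable, the files cannot simply be pasted side by side; rows have to be matched on the country name. A minimal sketch of that matching step follows, using Python's standard csv module and two tiny inline tables. The column names, country labels, and values are all made up for illustration.

```python
import csv
import io

# Two hypothetical per-variable downloads, each sorted by its own variable.
LIFE_EXP = """name,life_expectancy
Country A,82.1
Country C,75.3
Country B,68.9
"""
POPULATION = """name,population
Country B,50000000
Country A,9000000
Country D,1200000
"""

def read_by_country(text, key="name"):
    """Map each country name to its row (a dict of column -> string)."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}

def merge_on_country(*tables, key="name"):
    """Inner join: keep only countries that appear in every table."""
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    merged = []
    for country in sorted(common):
        row = {key: country}
        for table in tables:
            row.update(table[country])
        merged.append(row)
    return merged

merged = merge_on_country(read_by_country(LIFE_EXP), read_by_country(POPULATION))
for row in merged:
    print(row)
```

    An inner join drops any country missing from one of the files (Country C and Country D here), so it is worth comparing row counts before and after merging. Different spellings of a country's name across sources are a common cause of lost rows.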

  • Survey Repositories

  • ICPSR Inter-university Consortium for Political and Social Research.

  • Fun and Games

  • Data Scraping

  • IMDb TV Episode Ratings
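
    Scraping usually means pulling a table out of a page's HTML when no download link is offered. As a rough sketch of the idea only, the example below uses Python's standard html.parser on a small made-up HTML fragment. The table layout, episode names, and ratings are hypothetical and do not reflect the markup of IMDb or any other real site.

```python
from html.parser import HTMLParser

# Made-up fragment standing in for the HTML of a downloaded page.
PAGE = """
<table>
  <tr><th>Episode</th><th>Rating</th></tr>
  <tr><td>Pilot</td><td>7.9</td></tr>
  <tr><td>Finale</td><td>9.1</td></tr>
</table>
"""

class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")      # start a new, empty cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:                 # ignore whitespace between tags
            self.rows[-1][-1] += data.strip()

parser = TableParser()
parser.feed(PAGE)
header, *body = parser.rows
ratings = {episode: float(rating) for episode, rating in body}
print(header, ratings)
```

    Real pages are rarely this tidy: nested tags, merged cells, and content loaded by scripts all need extra handling, which is one of the new challenges of getting data from the web.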