Welcome to the holiday season! Whether you celebrate the December Solstice, Boxing Day, New Year’s Day or all three, this is the season for having fun and a perfect opportunity to gather as much data as possible. Access to datasets is a key topic that comes up frequently in discussions with academic leaders. In the spirit of the season, this entire update is devoted to data. The dataset name, a brief description and the URL are provided below for 15 datasets or repositories of datasets.
To spur ideas on how data visualization impacts delivery, a few charts and diagrams are included to show how students can extract better insights from raw data once they begin using the available analytics resources, including data visualization, cloud assets integrated with big insights tools and other enterprise data analytics resources. For example, the US Census chart below is a static snapshot, but a dynamic, moving chart on births, deaths and immigration can be seen at the US Census Bureau URL provided below. Does the US have a net gain of one person every 13 seconds, every 13 minutes or every 13 hundredths of a second? With that cliff-hanger question, let’s begin (the answer is below)…
The Statista site first came to my attention on Twitter, and in the last month a faculty member shared how valuable he found it for his own work and for use with students. Below is a sample chart from the site to give an idea of how current the data is and how relevant the materials are to key interest areas. For a topic that is definitely hot in retail this time of year, let’s look at the consumer sentiment index in the United States over the last year from the Statista site:
IBM ACADEMIC INITIATIVE – UNIVERSITY OF ARKANSAS RETAIL DATASETS (available worldwide to faculty and students at accredited academic institutions)
Datasets from Dillard’s, Sam’s Club, Tyson Foods, Acxiom and others can be accessed with DB2 Connect or SPSS Modeler. The University of Arkansas provides faculty and student IDs on its system so that academics outside the University of Arkansas community can access and use its system and analytics tools to analyze the datasets, in accordance with the rules established by the enterprises providing the data (because this is real data from those enterprises, it cannot leave the University of Arkansas system).
These datasets support learning and competition across a variety of industry and business challenges.
CITY OF CHICAGO DATA PORTAL DATASETS
The City of Chicago has made over 395 datasets available on topics such as energy, crime, transportation, potholes reported, cars being towed (remember, winter snow parking rules go into effect 12/1 in Chicago) and more.
US CENSUS BUREAU
US Census data continues to be a valuable source for students, who can identify data specific to where they live and apply the analytics expertise they are learning in the classroom in an environment where they can potentially see their impact. The snapshot below is a static view of the dynamic tracker found at the above link, which can power an engaging student project using data on births, deaths and immigration. The answer to the introductory question in this update: as of the writing of this article, the United States has a net gain of one person every 13 seconds.
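To put that rate in perspective, here is a minimal Python sketch that converts "one person every 13 seconds" into daily and yearly totals. It assumes the rate stays constant and uses a 365-day year; the function name `net_gain` is only illustrative and does not come from the Census Bureau site.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def net_gain(interval_seconds: float, period_seconds: float) -> float:
    """People added over a period if one person is added every interval_seconds."""
    return period_seconds / interval_seconds


# Assumption: the 13-second rate quoted in the article holds all year.
per_day = net_gain(13, SECONDS_PER_DAY)
per_year = net_gain(13, 365 * SECONDS_PER_DAY)

print(f"{per_day:,.0f} per day, {per_year:,.0f} per year")
# prints "6,646 per day, 2,425,846 per year"
```

Students can swap in the other two candidate intervals from the opening question (13 minutes, or 0.13 seconds) to see why they would imply implausibly small or large yearly totals.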
INTERNATIONAL SOCIETY OF SERVICE INNOVATION PROFESSIONALS (ISSIP) DATASETS
While this site does require a membership, the membership is free, and members gain access to large, open datasets suited to faculty and students who are looking for big data for projects.
PUBLIC DOMAIN MEDICARE DATASETS
“A realistic set of claims data in the public domain while providing the very highest degree of protection to the Medicare beneficiaries’ protected health information.”
IBM BLUEMIX INSIGHTS FOR TWITTER
This link provides access to search Twitter content from the Twitter Decahose (a 10% random sample of Tweets) and the PowerTrack stream (100% access to Tweets). Here is an example of how students can search on their own topics:
WEATHER COMPANY DATA
With the acquisition of the Weather Company Data, IBM is making the data available for general use through the IBM Bluemix Cloud.
BLUEMIX INTERNET OF THINGS SIMULATOR
In addition to the Weather Company Data above being in the Bluemix Cloud, this is a perfect opportunity to help with another frequent request: programmable Internet of Things (IoT) devices. While funding is not available for IoT devices worldwide, the IBM Bluemix IoT Platform does provide a simulated IoT device that students can use to execute their code.
Data Science, Analytics and Big Data datasets provide resources for the academic community to gain access to data for class projects, challenges and/or an entrepreneurial idea.
US Government datasets: Consumer, Education, Ocean, Finance, Public Safety, Health and Agriculture are some of the categories of the available data, with over 192,870 datasets.
AUSTIN TEXAS OPEN DATA PORTAL
This repository has 472 datasets, with instructions for how to download them or link to them using application programming interfaces (APIs).
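Once a dataset has been downloaded from a portal like this, a first student exercise is often just counting rows by category. The sketch below is a minimal example using only the Python standard library. The column names and the three sample rows are made up for illustration and are not taken from any actual portal dataset; in practice the text would come from a downloaded CSV file.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_text: str, column: str) -> Counter:
    """Count how many rows share each value in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Hypothetical sample standing in for a downloaded open-data CSV export.
SAMPLE = """request_id,category
1,pothole
2,towing
3,pothole
"""

counts = count_by_column(SAMPLE, "category")
print(counts.most_common())
# prints "[('pothole', 2), ('towing', 1)]"
```

The same function works on any CSV export with a header row, so students can point it at whichever column interests them.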
OPEN DATA IN THE UK
Government spending, crime and justice, business and economy, health and education are some of the categories of data available. The data.gov.uk site provides a good example of how countries with strict privacy laws are still leveraging open data as a valuable resource to drive innovation in problem solving.
UNITED NATIONS ENVIRONMENTAL DATA LINKS
In working to expand the repositories beyond the US and the UK, this link provides a wealth of data from around the globe. Its data links and datasets are also offered in a wide range of languages, which is another key reason this link makes the top 15. Here is an example from the above URL:
IBM ACADEMIC INITIATIVE ONTHEHUB (for faculty and students)
If faculty or students do not have the extended faculty access for using the Bluemix Cloud Resources above, this link is for the IBM Academic Initiative OnTheHub portal to request that additional academic access. This link also provides access for using additional enterprise tools for data warehousing, data visualization, database management, data security and business process management as well as a wealth of other data management resources and more.
The list above covers some of the key data repositories that faculty, students, entrepreneurs and enterprises may find useful. As more data becomes freely available, crowdsourcing and newly discovered interdependencies are driving ideas and approaches that were not visible before. For anyone not yet convinced that data is now more freely available and is being used to solve old problems in new ways, or to find new approaches to newly unearthed problems, let me share one additional key link: http://www.noradsanta.org/
LinkedIn: Valinda Kennedy