All About the Data

Data! Data is at the core of most of our personal and professional decisions. Well-run businesses rely on data to make decisions, and in today’s world that data is key regardless of the size of the business. For faculty and students, having access to data of all different types and using it to make data-based recommendations is part of demonstrating critical thinking applied to the data available and the data located.

Today we have more data than ever before, and more of it is readily available. Approximately one year ago the update was Just Data, with links to key datasets. Two things have changed since then. First, medical datasets were very limited at that time; many more are available now. Second, it is becoming easier to see how to get to some of this key data. Most people around the world with access to the internet could watch the State of the Union address. Did you know that with that same access you can reach every State of the Union address, starting with George Washington’s? This update covers that and much more. Let’s begin…

IBM Data Science DataSets

A frequent question is which datasets IBM makes available for use. The Data Science Experience datasets are all located in one place and cover a range of topics, including Breast Cancer, SMS Spam, Forest Fires and more. This snapshot shows a sample of the datasets that anyone can use.

Screen Shot 2018-01-30 at 9.04.06 PM


IBM Institute for Business Value Data: gold or kryptonite?  An insurer’s guide to the resource of the future

The IBM Institute for Business Value studies continue to be a valuable resource for anyone wanting research data summarized around a particular topic, based on input gathered from senior leaders. An additional benefit of these studies is that they provide a global viewpoint on the topics addressed. They are also useful as a basis for discussing or learning about a topic from a more analytical perspective to drive the dialogue. Because everyone has free access to them, they benefit students, professionals, entrepreneurs and enterprises alike.

Screen Shot 2018-01-30 at 9.45.01 PM

IBM Academic Initiative Data Security Course

A big part of having access to data in an academic setting is creating an environment where the data can be tested and the findings explained in a way that leads to a specific action, or at least to being informed about the options and actions available. The IBM Academic Initiative provides courseware for faculty so they can take the data and apply it in the classroom. The snapshot below is an example of one of the Security Fundamentals courses, pre-built so faculty can leverage industry expertise to teach the topic.

The IBM Academic Initiative also provides the software students need to get the hands-on experience with industry tools they have been requesting. Because it is all free, the IBM Academic Initiative removes cost as a reason faculty and students cannot quickly move into new skill areas. Here’s the two-minute video to get started

Screen Shot 2018-01-30 at 9.54.22 PM


IBM Skills Academy Artificial Intelligence Career Path

For academic institutions wanting additional integration with faculty training, curriculum, cloud exercises, testing and badges, the IBM Skills Academy provides that next level of support. The program was designed to enable established or new faculty to bring new topics to their academic institutions in a matter of weeks. Artificial Intelligence is the most recently added Career Path. In Module 3, one of the sections is on Computer Vision:

Define what CV is

Know the history and advancement of Computer Vision

Identify some of the tools and services of Computer Vision

Understand Computer Vision components

Define the Vision pipeline

Learn about the Vision services that are available from IBM Watson

Create a service and train it to identify images

The skill duration is approximately 5 hours, including the hands-on labs (lab estimated time: 2 hours). The module provides the complete environment the student needs to complete the exercises.

Screen Shot 2018-01-31 at 1.35.13 PM

IBM Academic Initiative University of Arkansas for Worldwide Academic Community Use Retail Datasets

The IBM Academic Initiative University of Arkansas makes enterprise datasets available for academic use. These datasets give faculty and students the opportunity to work with large, real data they might otherwise not have access to. Below are two examples of companies making their data available on the site. Faculty can request account IDs for themselves and their students to access these datasets, as well as datasets from Axiom, Dillard’s Department Store, Hallfux Productions and Nielsen.

Screen Shot 2018-01-30 at 9.14.07 PM


This is one of the first healthcare-related datasets that I promoted so everyone would know it was available for all to use. The size of the datasets is massive. That is both the problem and the opportunity. Faculty and students can investigate large-scale data and determine what information they can extract from such a large volume, leading to observations and recommendations that can be quantified in value based on the potential savings or anomalies they identify in the data.


XForce Exchange Provides Security Data

As you might know, this is one of my favorite repositories of data on cyber security. Being able to access this data at any time as a guest, or to enroll for the limited free access, makes it valuable for anyone wanting to be part of a professional community and to share key data about cyber security incidents. Whether it is an old threat resurfacing or a new threat in its infancy, this exchange is well worth sharing, both with those focused on cyber security and with the average citizen of any country.

Screen Shot 2018-01-31 at 12.39.33 PM


GitHub DataSets

GitHub is a source for an amazing range of datasets available to the public. The categorization and range of topics are why it makes the top 10 list of datasets to know about. While Kaggle and others are good to have, the volume that GitHub makes available to all of us makes it important for students, faculty and enterprises to be aware of what GitHub provides. Below is a snapshot of the datasets in the healthcare category:

Screen Shot 2018-01-31 at 12.46.42 PM

Reddit Datasets

Having comparison data going back to 1790 provides an opportunity to analyze issues and how they have been shared with the American public in this keynote address for 228 years. As we dissect speeches made by those holding the highest position of American leadership, we can learn about topics, approaches and guidance from other leaders, and we can assess the US and global impact of our leaders’ messages. With Hamilton tickets continuously selling out, and with copies of the Constitution and American Government classes getting renewed interest, this dataset is a great find to share with students and all citizens to learn and lead.

State of the Union Transcripts (1790–2018)

This repo contains the transcript from every State of the Union (SOTU) address, from George Washington’s first address on Jan 8, 1790 through Donald Trump’s address on Jan 30, 2018.

The dataset also contains transcripts from addresses that aren’t technically SOTU addresses, but are instead considered “Address Before a Joint Session of the Congress” (see an explanation here).
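As one way a student might start comparing addresses across years, here is a minimal Python sketch that counts the most frequent terms in a transcript. It makes several assumptions that are not taken from the repo itself: that the transcripts have been saved locally as plain-text `.txt` files, that the folder name passed in is whatever the reader chose, and that the small stop-word list is good enough for a first look. The function names `top_terms` and `top_terms_by_file` are hypothetical and used only for illustration.

```python
import re
from collections import Counter
from pathlib import Path

# Small illustrative stop-word list (an assumption; extend it as needed).
STOP_WORDS = {
    "the", "and", "that", "our", "for", "this", "with", "have", "are",
    "which", "has", "will", "not", "their", "from", "been", "was", "but",
}


def top_terms(text, n=10):
    """Return the n most frequent words in text, ignoring stop words."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Skip stop words and very short tokens such as "of" or "is".
    counts = Counter(w for w in words if len(w) > 2 and w not in STOP_WORDS)
    return counts.most_common(n)


def top_terms_by_file(folder, n=10):
    """Map each .txt file name in folder to its most frequent terms.

    The folder layout is an assumption: one transcript per .txt file.
    """
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        results[path.name] = top_terms(path.read_text(encoding="utf-8"), n)
    return results
```

Calling `top_terms_by_file` on a local folder of transcripts would give one short list of terms per address, which could be a starting point for comparing how topics shift between presidents or eras.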


WordPress Blog:  Just Data

While this update is approximately one year old, it contains datasets that still stand out. It still gets some of the most use, so I am including it here to make it easy to find the links covered in that prior update for datasets not covered above. With so many new finds, it was time to summarize in this update the amazing new datasets being made available, as well as datasets finding renewed interest because of current issues or improved accessibility. All prior updates can be found on this blog site if you are looking for one consolidated source:

Screen Shot 2018-01-31 at 5.19.42 PM



With so much data available, faculty and students have an opportunity to pick an area of interest and see what is available. To see more of what is available, along with new programs in IBM’s academic program, check out the changes and updates on the site. Some assets above and on the site are only available for academic use; many are available for anyone to use. With the resources now available, more data is at our fingertips than ever before.

There are now two winners among individuals and enterprises who know how to find and get the most from data: the one who finds the new insight and the ones who benefit from that find (less fraud, more revenue, new cures, etc.). Many gold nuggets reside in the data. The key is finding the data and sifting through the noise to find the gold. There is indeed gold in all this data. Let’s begin…


Valinda Scarbro Kennedy

LinkedIn:  Valinda Kennedy

Twitter:  @vscarbro

