Thor or a Great Data Scientist: Who Makes the Biggest Impact?

Data science has become a hot topic across many industries. If data is the new oil, the men and women on the ground are key to pulling that valuable resource out of the depths and getting value from it. The demand shows in Glassdoor’s recent rankings, where data scientist took the top spot for 2016: “Glassdoor’s Best 25 Jobs List Ranks Data Scientist No. 1”.


This July update looks at data science from an industry perspective and covers resources for data scientists, applications of data science, datasets, and additional academic and learning assets. Also included is an excerpt from an article by Justin Megahan about his discussion with Brad Schumitsch. As a Fulbright Scholar with a PhD in machine learning from Stanford and extensive industry experience, Brad is a credible source for defining what a data scientist is and how the role differs from similar disciplines. With so much to cover, let’s begin…


Centre for Applied Insights 

Breakthrough experiments in data science:  Practical lessons for building your organization’s capability

How does a business get the most value from data science? Businesses get value by integrating the data science function into the core metrics they use to run the business. This report gives examples of practical implementations aimed at the highest return. Why are businesses so interested in data science? The report has a great page summarizing the reasons: increasing business, growing top-line revenue while keeping costs down, and reaching new customers (page 5). The report also discusses how to build the powerful teams needed to deliver these outcomes. Here is a summary visual of the capabilities required in this type of team; the report explains each circle and the role each person in the main circle plays:

Centre for Applied Insights Pic


Data Science Central

The Data Science Central newsletter delivers timely, consistent content for this fast-moving discipline. It is unique in providing a central location for experts from various industries. The site includes blogs, videos, events and job information. The site is well done, and the newsletter gives a good summary of new assets and discussions, which makes both a good resource for the latest viewpoints from current and emerging experts. Here is an example of this week’s update from my mail:






Here’s our selection for today. We will continue to post articles from highly respected data scientists in the coming weeks.

  • 7 Traps to Avoid Being Fooled by Statistical Randomness – By Kirk Borne
  • How You Can Improve Customer Experience With Fast Data Analytics – By Ronald van Loon
  • 15 Astonishing Tweetable Facts About Analytics – By Bernard Marr
  • Making data science accessible – Neural Networks – By Dan Kellett
  • Solving problems with DataScience for Internet of Things – By Ajit Jaokar
  • Need for DYNAMICAL ML: Bayesian exact recursive estimation – By PG Madhavan
  • 8 Types of Data – By Bob Hayes
  • A Statistician’s View on Big Data and Data Science – By Diego Kuonen
  • Learning R in Seven Simple Steps – By Martijn Theuwissen
  • Statistics is Dead – Long Live Data Science – By Lee Baker

Enjoy the reading!

Upcoming DSC Webinars and Resources

  • Multi-genre Advanced Analytics to Optimize your Hadoop Data Lake
  • Open Data Platform Demystified: Hadoop and Spark





Twitter:  Difference Between Stats and Data Science by Justin Megahan

This is an interesting discussion of the difference between statistics and data science. Justin Megahan does a great job of researching the readily available content, and he also works hard to find an expert who can explain why so many people think they understand what data science is, why it matters, and how statisticians and data scientists differ. He finds that credible voice in Brad Schumitsch, the Fulbright Scholar with a PhD in machine learning from Stanford mentioned earlier. While his academic background provides a solid foundation for his viewpoint, it is his business experience that makes Brad’s take on the topic rational and logical, and it gets at why so many businesses are seeking data scientists today.



MIT Technology Review:  10 Breakthrough Technologies 2016

The MIT 10 Breakthrough Technologies update is a fascinating read. It provides insight into several technologies that will be the norm in a few short years, in some cases maybe even a few short quarters. When I use voice commands in my car, I notice how often they still fail, yet they are vastly improved over my last car. The conversational interface is advancing almost daily. See my tweet from last week on my discussion, along with IBM interns, with the IBM Robot Miquel. That is why, of the 10 Breakthrough Technologies, Conversational Interfaces is the one I highlight here, since many of us already use some form of it regularly:

Conversational Interfaces

 Watson Conversation Pic
IBM Bluemix Cloud:  Watson Conversation

Sounds great, but how can we all make our apps or services talk? “How about adding a natural language interface to your application to automate interactions with your end users. Common applications include virtual agents and chat bots that can integrate and communicate on any channel or device. Train Watson Conversation service through an easy-to-use web application, designed so you can quickly build natural conversation flows between your apps and users, and deploy scalable, cost effective solutions.”

Want to leverage speech? Include the Text to Speech and Speech to Text services in the Bluemix Cloud:

In the IBM Robot Miquel Twitter reference above, the exchange with Miquel is done using these Bluemix services: Dialog, Text to Speech and Speech to Text. Now it will be even easier with the IBM Bluemix Watson Conversation service announced in July. If you want to learn more about the Watson services, consider attending World of Watson at Mandalay Bay, Las Vegas | October 24-27, 2016.


Data Sets

As data science continues to expand, data scientists and those planning to become data scientists need access to data to find the golden nugget and deliver results. In recent years, datasets have become more readily available. Here are links to datasets that can make for an interesting playground for discovery:

Medicare/Medicaid Datasets:

US Census Bureau Datasets:

World Bank Data:

University of Arkansas IBM Academic Initiative Retail Datasets (academic use only):

Quora List of Datasets Open to Public:

Insights with Twitter Data:

Weather Company Data:

Weather Data Pic


IBM Academic Initiative Extended Cloud Access for Academic

While there is clear market demand for data scientists and a historically unprecedented volume of data available, the academic community needs one more link: access to industry tools so students can get hands-on experience with the data. The IBM Academic Initiative gives faculty the resources to provide that access to the students in their classes. Faculty get no-charge access for 12 months (renewable), along with codes that give the students in their classes 6 months of access to the cloud to use the IBM resources above.


Learning Lab

Newly announced: IBM Learning Lab for developers. The content includes courses and use cases that show developers potential uses for the many cloud services available to them. With 72 courses and 35 use cases, this is a new education hub worth checking out. Here is an example of one of the 35 use cases:


Build a medical Q&A system

Create a Solr collection, convert documents, and populate the collection with sample data

  • This is part one of a three-part use case
  • Build a custom query builder optimized for natural language search
  • Create a set of algorithms to score semantic relationships between a given query and a Solr document
  • Use IBM Watson Document Conversion to format content
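
For readers curious what "scoring the relationship between a query and a document" can look like in its simplest form, here is a minimal Python sketch. It is not the Learning Lab's code and not how Solr or Watson score results; it swaps in a plain word-count cosine similarity as a stand-in, and the function names and the two sample documents are invented for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_score(query, document):
    """Return the cosine similarity of the word-count vectors of two texts."""
    q_counts = Counter(tokenize(query))
    d_counts = Counter(tokenize(document))
    # Dot product over the words the two texts share.
    shared = q_counts.keys() & d_counts.keys()
    dot = sum(q_counts[word] * d_counts[word] for word in shared)
    q_norm = math.sqrt(sum(c * c for c in q_counts.values()))
    d_norm = math.sqrt(sum(c * c for c in d_counts.values()))
    if q_norm == 0 or d_norm == 0:
        return 0.0  # an empty text has no overlap with anything
    return dot / (q_norm * d_norm)


def rank(query, documents):
    """Return (doc_id, score) pairs sorted from best to worst match."""
    scored = [(doc_id, cosine_score(query, text))
              for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample documents, invented for this sketch.
SAMPLE_DOCS = {
    "doc1": "Common symptoms of influenza include fever, cough and fatigue.",
    "doc2": "A balanced diet supports long-term heart health.",
}

if __name__ == "__main__":
    for doc_id, score in rank("What are the symptoms of influenza?", SAMPLE_DOCS):
        print(doc_id, round(score, 3))
```

A real system would go well beyond shared words (synonyms, medical terminology, learned ranking), which is the gap the use case's custom scoring algorithms are meant to fill.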


With all these assets, data and experts at the helm and ready to explore bold new worlds, the field of data science is open to anyone who wants to take on the challenge. As the wealth of available resources shows, data science is changing how individuals, entrepreneurs, enterprises and the academic community use data to learn and do more. True to the nature of data science, the resources are there, and it is their broad application that is making data science take the world by storm. If in doubt, let’s retrieve and rank the options, and you decide who makes the biggest impact: Thor or a great data scientist. By the way, there is a service for that, but as a budding data scientist you saw that one coming!


Valinda Kennedy

Twitter:  vscarbro

Blog:  vscarbro

LinkedIn:  Valinda Kennedy



