Thinking Like a Data Scientist Part I: Understanding Where To Start

By Bill Schmarzo, CTO, Dell EMC Services (aka “Dean of Big Data”), April 30, 2015

One question I frequently get is: “How do I become a data scientist?”  Wow, tough question.  There are several new books that outline the different skills, capabilities and technologies that a data scientist is going to need to learn and eventually master.  I’ve read several of these books and am impressed with the depth of the content.

Unfortunately, these books spend the vast majority of their time reviewing and/or teaching data science processes (such as CRISP-DM, the Cross Industry Standard Process for Data Mining), basic and advanced statistics, and data mining and data visualization techniques and tools.

Yes, these are very important data science skills, but they are not nearly sufficient to make our data science teams effective.  The data science teams still need help from the business users – or subject matter experts (SMEs) – to understand the decisions the business is trying to make, the hypotheses that they want to test and the predictions that they need to produce in support of those decisions and hypotheses.  In essence, to improve the overall effectiveness of our data science teams, we need to teach the business users to think like a data scientist.

So the objective of this blog (which, if successful, will make its way into my Big Data MBA curriculum for the University of San Francisco School of Management fall semester) is to define a process that helps business users to “think like a data scientist.”

Thinking Like A Data Scientist Process

The goal of the “thinking like a data scientist” process is to identify, brainstorm and/or uncover new variables that are better predictors of business performance.  But “business performance” of what?  Our key business initiative, of course.

Step 1:  Identify Key Business Initiative.  Would you expect anything different from me than starting with what’s important to the business?  So, how can you spot a key business initiative?

A key business initiative is characterized as:

  • Critical to the immediate-term performance of the organization
  • Documented (communicated either internally or publicly)
  • Cross-functional (involves more than one business function)
  • Owned/championed by a senior business executive
  • Has a measurable financial goal
  • Has a well-defined delivery timeframe (9 to 12 months)
  • Undertaken to deliver significant, compelling and/or distinguishable financial or competitive advantage

I am a big stickler about targeting business initiatives that are focused on the next 9 to 12 months.  Anything longer than 12 months can quickly devolve into a “Battlestar Galactica” or “cure world hunger” project that may have incredible business value, but little chance of success.
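To make the checklist concrete, here is a minimal Python sketch that screens a candidate initiative against the characteristics listed above. The `Initiative` record, its field names and the sample values are all hypothetical, invented for illustration. The sketch also assumes that any initiative deliverable within 12 months passes the timeframe test, which is one possible reading of the 9-to-12-month guideline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Initiative:
    """Hypothetical record describing a candidate business initiative."""
    name: str
    critical_near_term: bool   # critical to immediate-term performance
    documented: bool           # communicated internally or publicly
    business_functions: int    # number of business functions involved
    executive_owner: str       # empty string if no senior executive owns it
    has_financial_goal: bool   # a measurable financial goal exists
    delivery_months: int       # expected delivery timeframe, in months
    delivers_advantage: bool   # significant financial or competitive advantage


def unmet_criteria(initiative: Initiative) -> list[str]:
    """Return the checklist items the initiative does not satisfy."""
    checks = {
        "critical to immediate-term performance": initiative.critical_near_term,
        "documented": initiative.documented,
        "cross-functional": initiative.business_functions > 1,
        "owned by a senior executive": bool(initiative.executive_owner),
        "measurable financial goal": initiative.has_financial_goal,
        # Assumption: anything deliverable within 12 months passes,
        # including initiatives shorter than 9 months.
        "delivery within 12 months": initiative.delivery_months <= 12,
        "delivers significant advantage": initiative.delivers_advantage,
    }
    return [label for label, passed in checks.items() if not passed]


# Placeholder values for illustration only, not facts about any company.
merchandising = Initiative(
    name="Improve Merchandising Effectiveness",
    critical_near_term=True,
    documented=True,
    business_functions=3,
    executive_owner="(hypothetical sponsor)",
    has_financial_goal=True,
    delivery_months=12,
    delivers_advantage=True,
)
moonshot = Initiative(
    name="Cure world hunger",
    critical_near_term=False,
    documented=True,
    business_functions=5,
    executive_owner="",
    has_financial_goal=False,
    delivery_months=60,
    delivers_advantage=True,
)

print(unmet_criteria(merchandising))  # []
print(unmet_criteria(moonshot))
```

An empty list means the initiative is a reasonable starting point. Anything else tells the team which characteristic to question before investing analytics effort.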

For a refresher on how to identify an organization’s key business initiatives, read my blog “Big Data MBA: Reading the Annual Report for Big Data Opportunities.”  That blog outlines how to leverage publicly available information (e.g., annual reports, analyst calls, executive speeches, company blogs, SeekingAlpha.com) to uncover an organization’s key business initiatives.

For purposes of this exercise, I’m going to pretend that our client is Foot Locker, and that our target business initiative is “Improve Merchandising Effectiveness” as highlighted in their annual report (see Figure 1).

Figure 1: Identifying and Understanding Organization’s Key Business Initiatives

Step 2:  Identify Strategic Nouns.  Strategic nouns are the key business entities that either impact or are impacted by the organization’s key business initiative.  These strategic nouns are critical to our data scientist thinking process because these are the entities for which we want to uncover or gain new, actionable insights, and around which we will ultimately build our analytic profiles.  Examples of strategic nouns include customers, patients, students, employees, stores, products, medication, trucks, wind turbines, etc.

For the Foot Locker “Improve Merchandising Effectiveness” business initiative, the strategic nouns upon which we will focus are:

  • Customers
  • Products
  • Campaigns
  • Stores

Step 3:  Brainstorm Strategic Noun Questions. Probably the hardest part of the “thinking like a data scientist” process is brainstorming the different questions that you want to ask in support of the targeted business initiative.  For this step, we want the business users to brainstorm the business questions for each strategic noun from the perspectives of:

  • Descriptive Analytics:  Understanding what happened
  • Predictive Analytics:  Predicting what is likely to happen
  • Prescriptive Analytics:  Recommending what to do next

See Figure 2 for an example of the evolution from Descriptive to Predictive to Prescriptive.

Figure 2: Evolution of The Analytic Questions
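For readers who think better in code, the progression from descriptive to predictive to prescriptive can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the campaign history is invented toy data, the segment and offer names are hypothetical, and the "prediction" is simply the historical response rate reused as a naive estimate. A real data science team would replace it with a proper statistical model.

```python
from collections import defaultdict

# Hypothetical toy history: (customer segment, offer type, responded?)
HISTORY = [
    ("student", "back_to_school", True),
    ("student", "back_to_school", True),
    ("student", "bogof", False),
    ("student", "bogof", True),
    ("runner", "back_to_school", False),
    ("runner", "bogof", True),
    ("runner", "bogof", True),
    ("runner", "markdown_50", False),
]
OFFERS = ["back_to_school", "bogof", "markdown_50"]


def response_rates(history):
    """Descriptive: what happened? Response rate per (segment, offer)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [responses, offers sent]
    for segment, offer, responded in history:
        counts[(segment, offer)][0] += int(responded)
        counts[(segment, offer)][1] += 1
    return {key: hits / sent for key, (hits, sent) in counts.items()}


def predict_response(rates, segment, offer):
    """Predictive: what is likely to happen? Naive estimate from history.

    Combinations never seen in the history default to 0.0.
    """
    return rates.get((segment, offer), 0.0)


def recommend_offer(rates, segment, offers):
    """Prescriptive: what should we do next? Pick the most promising offer."""
    return max(offers, key=lambda offer: predict_response(rates, segment, offer))


rates = response_rates(HISTORY)
print(rates[("student", "bogof")])               # 0.5
print(predict_response(rates, "runner", "bogof"))  # 1.0
print(recommend_offer(rates, "runner", OFFERS))    # bogof
```

Each function builds on the one before it, which is the point of the evolution: the prescriptive recommendation is only as good as the prediction, and the prediction is only as good as the description of what actually happened.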

In our Foot Locker “Improve Merchandising Effectiveness” example, we might brainstorm the “Customer” strategic noun questions as follows:

Descriptive Analytics (Understanding what happened)

  • Which customers are most receptive to which types of merchandising campaigns?
  • What are the characteristics of customers (e.g., age, gender, customer tenure, life stage, favorite sports) who are most responsive to merchandising offers?
  • Are there certain times of year when certain customers are more responsive?

Predictive Analytics (Predicting what is likely to happen)

  • Which customers are most likely to respond to a Back to School event?
  • Which customers are most likely to respond to a BOGOF offer?
  • Which customers are most likely to respond to a 50% off in-store markdown?

Prescriptive Analytics (Recommending what to do next)

  • What personalized offers (recommendations) should I deliver to Anne Smith to get her to come into the store?
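One way to keep a brainstorming session organized is a simple worksheet with one cell per strategic noun and analytic type. The Python sketch below is only an illustration. The worksheet structure and the function names are assumptions, while the nouns and the sample questions come from the Foot Locker example above. It records a few of the “Customer” questions and then lists the cells that still need attention.

```python
ANALYTIC_TYPES = ("descriptive", "predictive", "prescriptive")
STRATEGIC_NOUNS = ("Customers", "Products", "Campaigns", "Stores")


def empty_worksheet():
    """Build one empty question list per (strategic noun, analytic type)."""
    return {
        noun: {kind: [] for kind in ANALYTIC_TYPES}
        for noun in STRATEGIC_NOUNS
    }


def add_question(worksheet, noun, kind, question):
    """File a brainstormed question; raises KeyError for an unknown noun or type."""
    worksheet[noun][kind].append(question)


def gaps(worksheet):
    """List the (noun, analytic type) cells that have no questions yet."""
    return [
        (noun, kind)
        for noun, kinds in worksheet.items()
        for kind, questions in kinds.items()
        if not questions
    ]


worksheet = empty_worksheet()
add_question(worksheet, "Customers", "descriptive",
             "Are there certain times of year when certain customers are more responsive?")
add_question(worksheet, "Customers", "predictive",
             "Which customers are most likely to respond to a BOGOF offer?")
add_question(worksheet, "Customers", "prescriptive",
             "What personalized offers should I deliver to Anne Smith?")

# Show which cells the business users still need to brainstorm.
for noun, kind in gaps(worksheet):
    print(f"Still needed: {kind} questions for {noun}")
```

With only the “Customer” row filled in, the worksheet reports nine open cells, three each for Products, Campaigns and Stores, which gives the session a clear agenda.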

Part II of the “Thinking Like a Data Scientist” blog series will conclude this process and, hopefully, help us uncover new data sources and metrics that may be better predictors of business performance.

To learn more about EMC’s unique approach to leveraging Big Data to drive business value, please check out EMC’s Big Data Vision Workshop offering.

About Bill Schmarzo


CTO, Dell EMC Services (aka “Dean of Big Data”)

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business” and “Big Data MBA: Driving Business Strategies with Data Science”, is responsible for setting the strategy and defining the Big Data service offerings and capabilities for Dell EMC Services Big Data Practice. As the CTO for the Big Data Practice, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He’s written several white papers, is an avid blogger and is a frequent speaker on the use of Big Data and data science to power the organization’s key business initiatives. He is a University of San Francisco School of Management (SOM) Executive Fellow where he teaches the “Big Data MBA” course. Bill was ranked as #4 Big Data Influencer by Onalytica.

Bill has over three decades of experience in data warehousing, BI and analytics. Bill authored Dell EMC’s Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute’s faculty as the head of the analytic applications curriculum.

Previously, Bill was the vice president of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of “actionable insights” through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-defining analytic applications.

Bill holds a master’s degree in Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science and Business Administration from Coe College.


8 thoughts on “Thinking Like a Data Scientist Part I: Understanding Where To Start”

    • Thanks Sheppard! So true! One can get so much more value out of big data by beginning with an end in mind.

  1. Great blog. I would however suggest a change in “positioning”. You’ve done an exceptional job outlining the inputs that a Data Scientist needs to effectively support a business problem.

    They also form an excellent framework for business people to understand whether a question/problem may benefit from data science support as well as prepare the business SME for engaging in a discussion. I’ve not only passed your blog to 10-15 aspiring analytics/data science people, but to my large contact list within my company.

    Anything that we can do to help communication will make all of us better consumers and producers of data.

    Thanks again for an excellent framework for productive discussions.

  2. Bob, thanks for helping spread the word. The goal of this series of blogs (as is the “Big Data MBA” class that I teach at the University of San Francisco School of Management) is to get the business users to start thinking like a data scientist. It’s not realistic, or even desirable, to try to convert business users into data scientists. But it is certainly realistic to expect that our business users can start thinking like a data scientist. When that happens and you couple the business users with data scientists, all sorts of magic happens!!

    Thanks again Bob. Spread the word!!

  3. An excellent framework for the analytic thinking process that’s grounded in business realities. Interestingly enough, this same framework could’ve been titled, “How to make a data scientist think like a businessperson”!

  4. This is a really nice blog, and it leaves us with the thought that the data we see in today’s world on a daily basis can give us innovative ideas and make us think like a data scientist. Really nice blog. Exceptionally good.

  5. Kiran, good point. I think that this “Thinking Like a Data Scientist” framework can drive closer collaboration between the Business users and the Data Scientist — it allows them to share a common approach for how to brainstorm data sources and explore analytics with the goal of delivering business value. Thanks for the feedback!