Best Practices for Analytics Profiles

By Bill Schmarzo, CTO, Dell EMC Services (aka “Dean of Big Data”), July 8, 2014

In our Big Data engagements, we talk about the importance of building detailed “profiles” of our most important entities, such as customers, products, devices, machines, employees, partners, stores, wind turbines, cars, ATMs, etc. As part of our data science process, we build a profile on each individual entity that:

1) Captures that entity’s tendencies, propensities, patterns, trends, behaviors, relationships, associations, and affiliations (plus, in the case of humans, interests and passions)

2) Compares that entity’s current state and recent transactions, activities, and interactions to its individual profile in order to flag “unusual” activities and behaviors that might indicate a problem or a monetization opportunity

But what do we mean by the word “profile,” and what elements might comprise a profile?

Defining and Building a Profile

A profile is a set of metrics, key performance indicators, scores, business rules, and analytic insights that together describe the tendencies, behaviors, and propensities of an individual entity (customer, device, partner, machine). The profile could include:

  • Key demographic data such as age, gender, education level, home location, marital status, income level, wealth level, make and model of car, age of car, age of children, gender of children, and other data. For a machine, it might include model type, physical location, manufacturer, manufacturer location, purchase date, last maintenance date, technician who performed the last maintenance, etc.
  • Key transactional metrics such as number of purchases, purchase amounts, returns, frequency of visits, recency of visits, payments, claims, calls, social posts, etc. For a machine, that might include miles and/or hours of usage, most recent usage time and date, type of usage, usage load, who operated the product, route of product usage (for something like a truck, car, airplane, or train)
  • Scores (combinations of multiple metrics) that measure customer satisfaction level, financial risk tolerance, retirement readiness, FICO, advocacy grade, likelihood to recommend (LTR), and similar measures. For a machine, that might include performance scores, reliability scores, availability scores, capacity utilization scores, and optimal performance ranges, among other things
  • Business rules inferred using association analysis; for example, if CUST_101 visits a certain Starbucks and a certain Walgreens, we can predict (at a 90% confidence level) that there is an 85% likelihood that this customer will visit a certain Chipotle within 60 minutes
  • Group or network relationships (number, strength, direction, sequencing, and clustering of relationships) that capture interests, passions, associations, and affiliations gained from using graph analysis
  • Coefficients that predict certain outcomes or responses based upon certain independent variables found through regression analysis; for example, a machine’s likelihood to break down given a number of interrelated variables such as usage loads since last maintenance, the technician who performed the maintenance, the machine manufacturer, temperatures, humidity, elevation, traffic, idle time, etc.
  • Behavioral groupings of like or similar machines or people based upon usage transactions (purchases, returns, payments, web clicks, call detail records, credit card payments, claims, etc.) using clustering, K-nearest neighbor (KNN), and segmentation analysis
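
To make the association-analysis bullet concrete, here is a minimal sketch of how the support and confidence of a rule like "Starbucks and Walgreens, then Chipotle" might be measured. The function name `rule_confidence` and the `sessions` data are hypothetical and invented purely for illustration; a real engagement would likely use a dedicated association-mining library over far more data.

```python
def rule_confidence(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    baskets: list of sets, each holding the places visited in one time window.
    """
    antecedent = set(antecedent)
    # Windows in which every antecedent place was visited
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        return 0.0, 0.0
    # Of those, the windows that also include the consequent
    with_both = [b for b in with_antecedent if consequent in b]
    support = len(with_both) / len(baskets)
    confidence = len(with_both) / len(with_antecedent)
    return support, confidence


# Hypothetical 60-minute visit windows for one customer (illustrative only)
sessions = [
    {"Starbucks", "Walgreens", "Chipotle"},
    {"Starbucks", "Walgreens", "Chipotle"},
    {"Starbucks", "Walgreens"},
    {"Starbucks", "Chipotle"},
    {"Walgreens"},
]

support, confidence = rule_confidence(
    sessions, {"Starbucks", "Walgreens"}, "Chipotle"
)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Confidence here is simply the share of antecedent windows that also contain the consequent; it is what the "85% likelihood" in the example would correspond to, not the statistical confidence level quoted alongside it.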

Example Customer Profile

A profile could be made up of hundreds, if not thousands, of different metrics and scores that, when used in combination against a specific business initiative like customer retention/up-sell/reference, predictive maintenance, supplier quality, or on-time shipments, can improve the predictive capabilities of the model.

Let’s review in the table below what a profile might look like for a particular customer. Note that I have grossly oversimplified the profile to facilitate the explanation and because I can’t process anything more complex myself. My data science team is probably rolling over laughing in their Python, R, Mahout and SAS toolsets as they read this.

| Profile Variable | Historical Score | Variance σ | 4-Week Score | Unusual Flag? |
|---|---|---|---|---|
| Demographics (Age, Gender, Income, Education) | | | | |
| Retirement Planning | 90 | 1.25 | 92 | |
| Retirement Readiness | 65 | 1.75 | 66 | |
| Disposable Income | 95 | 1.50 | 94 | |
| Insurance Risk | 45 | 1.10 | 45 | |
| Financial Risk Tolerance | 50 | 1.25 | 52 | |
| Pregnancy Likelihood | 0 | 1.00 | 0 | |
| Divorce Likelihood | 2 | 1.00 | 2 | |
| Health Score | 94 | 1.05 | 94 | |
| Exercise Frequency | 81 | 1.45 | 78 | |
| Preferences (based upon Purchases, Web Browsing, Search, Mobile Apps, GPS) | | | | |
| Starbucks Score | 95 | 1.25 | 92 | |
| Chipotle Score | 88 | 1.60 | 85 | |
| Air Travel Score | 82 | 1.90 | 80 | |
| United Airlines Score | 70 | 2.25 | 50 | X |
| SWA Airlines Score | 45 | 3.10 | 45 | |
| Virgin Airlines Score | 25 | 4.50 | 50 | X |
| Automobiles | 20 | 2.20 | 85 | XX |
| Rules: A\|B -> C (based upon Purchase transactions, GPS tracking, Mobile Apps) | | | | |
| Stanford Starbucks -> Stanford Shell Station | 55 | 2.50 | 50 | |
| Stanford Starbucks \| Oregon Ave Chipotle -> Middlefield Walgreens | 60 | 3.25 | 55 | |
| United ORD -> Chicago Uber + Schaumburg Renaissance | 45 | 1.55 | 55 | |
| EPA Starbucks -> EPA Sports Authority | 45 | 2.55 | 15 | XX |
| Relationships (Emails, Texts, Social Media, Phone Calls) | | | | |
| Carolyn Doe | 98 | 1.05 | 98 | |
| Amelia Doe | 98 | 1.01 | 99 | |
| Wei Lin | 55 | 2.25 | 99 | X |
| John Smith | 85 | 1.56 | 25 | XX |
| Associations (Social Media, Email, Web Browsing, Search) | | | | |
| Chicago Cubs | 85 | 1.75 | 75 | |
| Baltimore Orioles | 82 | 2.25 | 10 | XX |
| Golden State Warriors | 78 | 2.35 | 84 | |
| EMC | 86 | 1.45 | 88 | |
| Kool Big Data Startup | 35 | 3.75 | 80 | XX |
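
The post does not say how the “Unusual Flag?” column is derived, so the following is only a rough sketch of one way to compare a recent score against its historical value. It assumes a plain absolute-deviation rule with a hypothetical threshold of 20 points; the names `ProfileEntry` and `flag_unusual` are invented for illustration, and the sketch does not try to reproduce the X versus XX distinction or use the variance column.

```python
from dataclasses import dataclass


@dataclass
class ProfileEntry:
    name: str
    historical: float  # long-run score for this profile variable
    recent: float      # 4-week score


def flag_unusual(entries, threshold=20.0):
    """Return names whose recent score differs from history by >= threshold.

    The 20-point threshold is an assumption, not a rule stated in the post.
    """
    return [
        e.name for e in entries
        if abs(e.recent - e.historical) >= threshold
    ]


# A few rows taken from the example customer profile
entries = [
    ProfileEntry("Starbucks Score", 95, 92),
    ProfileEntry("United Airlines Score", 70, 50),
    ProfileEntry("Chicago Cubs", 85, 75),
    ProfileEntry("Baltimore Orioles", 82, 10),
]

flagged = flag_unusual(entries)
print(flagged)
```

A production version would more likely scale the deviation by each variable’s own variability, so that a naturally volatile score needs a larger swing before it is flagged.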

Some metrics and scores are more important than others, depending upon the business initiative being addressed. For a financial services firm focused on customer acquisition, certain data (disposable income, retirement readiness, life stage, age, education level, and number of family members) may be the most important predictive metrics. For customer retention, however, metrics such as advocacy, customer satisfaction, risk comfort score, social network associations, and select social media relationships may be the most important predictive metrics.
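
One simple way to picture this is a lookup from business initiative to the profile variables that feed its model. The sketch below is an assumption-laden illustration: the mapping `INITIATIVE_VARIABLES`, the helper `select_variables`, the variable names, and the advocacy and satisfaction values are all hypothetical, and in practice the data science team would determine variable importance empirically rather than hard-code it.

```python
# Hypothetical mapping from initiative to the variables assumed to matter most
INITIATIVE_VARIABLES = {
    "customer_acquisition": [
        "disposable_income", "retirement_readiness", "age", "education_level",
    ],
    "customer_retention": [
        "advocacy", "customer_satisfaction", "risk_comfort_score",
    ],
}


def select_variables(profile, initiative):
    """Return the subset of a profile relevant to one business initiative."""
    wanted = INITIATIVE_VARIABLES[initiative]
    # Skip variables that this particular profile does not carry
    return {name: profile[name] for name in wanted if name in profile}


# Illustrative profile; advocacy and customer_satisfaction values are made up
profile = {
    "disposable_income": 95,
    "retirement_readiness": 65,
    "advocacy": 40,
    "customer_satisfaction": 72,
}

acquisition_view = select_variables(profile, "customer_acquisition")
retention_view = select_variables(profile, "customer_retention")
print(acquisition_view)
print(retention_view)
```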

In my next blog, I’ll take a look at how to use these profiles in a customer retention example.

About Bill Schmarzo


Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business” and “Big Data MBA: Driving Business Strategies with Data Science”, is responsible for setting the strategy and defining the Big Data service offerings and capabilities for Dell EMC Services Big Data Practice. As the CTO for the Big Data Practice, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He’s written several white papers, is an avid blogger and is a frequent speaker on the use of Big Data and data science to power the organization’s key business initiatives. He is a University of San Francisco School of Management (SOM) Executive Fellow where he teaches the “Big Data MBA” course. Bill was ranked as #15 Big Data Influencer by Onalytica.

Bill has over three decades of experience in data warehousing, BI and analytics. Bill authored Dell EMC’s Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute’s faculty as the head of the analytic applications curriculum.

Previously, Bill was the vice president of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of “actionable insights” through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-defining analytic applications.

Bill holds a master’s degree in Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science and Business Administration from Coe College.
