Best Practices for Analytics Profiles
In our Big Data engagements, we talk about the importance of building detailed “profiles” of our most important entities, such as customers, products, devices, machines, employees, partners, stores, wind turbines, cars, ATMs, etc. As part of our data science process, we build a profile on each individual entity that:
1) Captures that entity’s tendencies, propensities, patterns, trends, behaviors, relationships, associations, affiliations (plus, in the case of humans, interests and passions)
2) Compares that entity’s current state and recent transactions, activities, and interactions to its individual profile in order to flag “unusual” activities and behaviors that might be indicative of a problem or monetization opportunity
But what do we mean by the word “profile,” and what elements might comprise a profile?
Defining and Building a Profile
A profile is a combination of metrics, key performance indicators, scores, business rules, and analytic insights that together describe the tendencies, behaviors, and propensities of an individual entity (customer, device, partner, machine). The profile could include:
- Key demographic data such as age, gender, education level, home location, marital status, income level, wealth level, make and model of car, age of car, age of children, gender of children, and other data. For a machine, it might include model type, physical location, manufacturer, manufacturer location, purchase date, last maintenance date, technician who performed the last maintenance, etc.
- Key transactional metrics such as number of purchases, purchase amounts, returns, frequency of visits, recency of visits, payments, claims, calls, social posts, etc. For a machine, that might include miles and/or hours of usage, most recent usage time and date, type of usage, usage load, who operated the product, route of product usage (for something like a truck, car, airplane, or train)
- Scores (combinations of multiple metrics) that measure customer satisfaction level, financial risk tolerance, retirement readiness, FICO, advocacy grade, likelihood to recommend (LTR), and other data. For a machine, that might include performance scores, reliability scores, availability scores, capacity utilization scores, and optimal performance ranges, among other things
- Business rules inferred using association analysis; for example, if CUST_101 visits a certain Starbucks and a certain Walgreens, we can predict (with 90% confidence level) that there is an 85% likelihood that this customer will visit a certain Chipotle within 60 minutes
- Group or network relationships (number, strength, direction, sequencing, and clustering of relationships) that capture interests, passions, associations, and affiliations gained from using graph analysis
- Coefficients that predict certain outcomes or responses based upon certain independent variables found through regression analysis; for example, a machine’s likelihood to break down given a number of interrelated variables such as usage loads since last maintenance, the technician who performed the maintenance, the machine manufacturer, temperatures, humidity, elevation, traffic, idle time, etc.
- Behavioral groupings of like or similar machines or people based upon usage transactions (purchases, returns, payments, web clicks, call detail records, credit card payments, claims, etc.) using clustering, K-nearest neighbor (KNN), and segmentation analysis
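To make the association-analysis bullet above more concrete, here is a minimal sketch of how the support and confidence of a rule such as "Starbucks and Walgreens -> Chipotle" could be computed. Everything in it is an assumption for illustration: the function name `rule_stats` is hypothetical, each "basket" is taken to be the set of places one customer visited within a single time window (for instance, 60 minutes), and the visit data is made-up toy data, not real customer history.

```python
from typing import FrozenSet, Iterable, List, Tuple


def rule_stats(
    baskets: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Tuple[float, float]:
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = share of all baskets containing antecedent AND consequent
    confidence = share of antecedent baskets that also contain the consequent
    """
    lhs = frozenset(antecedent)
    rhs = frozenset(consequent)
    with_lhs = [b for b in baskets if lhs <= b]      # baskets matching "A and B"
    with_both = [b for b in with_lhs if rhs <= b]    # ...that also contain "C"
    support = len(with_both) / len(baskets) if baskets else 0.0
    confidence = len(with_both) / len(with_lhs) if with_lhs else 0.0
    return support, confidence


# Toy data (assumption): one set per visit window for a single customer.
visits = [
    frozenset({"Starbucks", "Walgreens", "Chipotle"}),
    frozenset({"Starbucks", "Walgreens", "Chipotle"}),
    frozenset({"Starbucks", "Walgreens", "Chipotle"}),
    frozenset({"Starbucks", "Walgreens"}),
    frozenset({"Starbucks", "Shell"}),
]

support, confidence = rule_stats(visits, {"Starbucks", "Walgreens"}, {"Chipotle"})
print(f"support={support:.2f} confidence={confidence:.2f}")
```

In this toy data the rule holds in three of the four windows where the customer visited both Starbucks and Walgreens. A production approach would more likely use a frequent-itemset algorithm over many customers rather than counting one rule at a time.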
Example Customer Profile
A profile could be made up of hundreds, if not thousands, of different metrics and scores that, when used in combination against a specific business initiative (customer retention/up-sell/reference, predictive maintenance, supplier quality, or on-time shipments), can improve the predictive capabilities of the model.
Let’s review in the table below what a profile might look like for a particular customer. Note that I have grossly oversimplified the profile to facilitate the explanation and because I can’t process anything more complex myself. My data science team is probably rolling over laughing in their Python, R, Mahout and SAS toolsets as they read this.
| | Historical Score | Variance σ | 4-week | |
|---|---|---|---|---|
| Demographics (Age, Gender, Income, Education) | | | | |
| Financial Risk Tolerance | 50 | 1.25 | 52 | |
| Preferences (based upon Purchases, Web Browsing, Search, Mobile Apps, GPS) | | | | |
| Air Travel Score | 82 | 1.90 | 80 | |
| United Airlines Score | 70 | 2.25 | 50 | X |
| SWA Airlines Score | 45 | 3.10 | 45 | |
| Virgin Airlines Score | 25 | 4.50 | 50 | X |
| Rules: A\|B -> C (based upon Purchase transactions, GPS tracking, Mobile Apps) | | | | |
| Stanford Starbucks -> Stanford Shell Station | 55 | 2.50 | 50 | |
| Stanford Starbucks \| Oregon Ave Chipotle -> Middlefield Walgreens | 60 | 3.25 | 55 | |
| United ORD -> Chicago Uber + Schaumburg Renaissance | 45 | 1.55 | 55 | |
| EPA Starbucks -> EPA Sports Authority | 45 | 2.55 | 15 | XX |
| Relationships (Emails, Texts, Social Media, Phone Calls) | | | | |
| Associations (Social Media, Email, Web Browsing, Search) | | | | |
| Golden State Warriors | 78 | 2.35 | 84 | |
| Kool Big Data Startup | 35 | 3.75 | 80 | XX |
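The X and XX marks in the last column of the table appear to flag metrics whose 4-week value has moved well away from the historical score. The source does not say how the flags were derived, so the sketch below is only one possible reading: it flags on the absolute change, using hypothetical thresholds of 20 points (X) and 30 points (XX) that happen to reproduce the marks shown above. The `ProfileMetric` class and `flag` function are illustrative names, not part of any toolset mentioned in this post.

```python
from dataclasses import dataclass


@dataclass
class ProfileMetric:
    name: str
    historical: float  # "Historical Score" column
    sigma: float       # "Variance σ" column (carried along, not used by this rule)
    recent: float      # "4-week" column


def flag(metric: ProfileMetric, warn: float = 20.0, alert: float = 30.0) -> str:
    """Return "", "X", or "XX" based on the absolute change from the historical score.

    The thresholds are assumptions chosen to match the example table.
    """
    change = abs(metric.recent - metric.historical)
    if change >= alert:
        return "XX"
    if change >= warn:
        return "X"
    return ""


# A few rows copied from the example table.
rows = [
    ProfileMetric("Financial Risk Tolerance", 50, 1.25, 52),
    ProfileMetric("United Airlines Score", 70, 2.25, 50),
    ProfileMetric("Virgin Airlines Score", 25, 4.50, 50),
    ProfileMetric("EPA Starbucks -> EPA Sports Authority", 45, 2.55, 15),
    ProfileMetric("Kool Big Data Startup", 35, 3.75, 80),
]

flags = {m.name: flag(m) for m in rows}
for name, mark in flags.items():
    print(f"{name:40s} {mark}")
```

A rule that scales the change by each metric's own variability would be a natural alternative, since it lets a normally stable metric trigger a flag on a smaller move than a normally noisy one.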
Some metrics and scores are more important than others, depending upon the business initiative being addressed. For a financial services firm focused on customer acquisition, certain data (disposable income, retirement readiness, life stage, age, education level, and number of family members) may be the most important predictive metrics. For customer retention, however, metrics such as advocacy, customer satisfaction, risk comfort score, social network associations, and select social media relationships may be the most important predictive metrics.
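One simple way to picture this is a lookup from business initiative to the subset of profile metrics that matter for it. The sketch below is a hedged illustration only: the metric names are snake_case versions of the ones listed in the paragraph above, while the `INITIATIVE_METRICS` mapping, the `select_metrics` helper, and the sample profile values are all hypothetical.

```python
from typing import Dict, List

# Metrics named in the text, grouped by initiative (key names are assumptions).
INITIATIVE_METRICS: Dict[str, List[str]] = {
    "customer_acquisition": [
        "disposable_income",
        "retirement_readiness",
        "life_stage",
        "age",
        "education_level",
        "family_members",
    ],
    "customer_retention": [
        "advocacy",
        "customer_satisfaction",
        "risk_comfort_score",
        "social_network_associations",
        "social_media_relationships",
    ],
}


def select_metrics(profile: Dict[str, float], initiative: str) -> Dict[str, float]:
    """Return only the profile entries relevant to the given initiative."""
    wanted = INITIATIVE_METRICS[initiative]
    return {name: profile[name] for name in wanted if name in profile}


# Made-up profile values, purely for illustration.
profile = {
    "age": 42,
    "advocacy": 78,
    "customer_satisfaction": 81,
    "disposable_income": 55,
}

retention_view = select_metrics(profile, "customer_retention")
acquisition_view = select_metrics(profile, "customer_acquisition")
print(retention_view)
print(acquisition_view)
```

In practice the relevant subset would more likely be learned from the data for each initiative than hard-coded, but the lookup shows the idea that one profile can feed several different models.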
In my next blog, I’ll take a look at how to use these profiles in a customer retention example.