Last week, several colleagues from EMC and I attended the Strata Rx healthcare conference here in San Francisco, and I thought it was a huge success. I also suspect that for some attendees it was a very troubling conference. Why would I say that? Because the number of massive data sets available today, and the number coming online in very short order, is almost overwhelming. It’s one of the reasons why many of the entrepreneurs and venture capitalists who succeeded in the internet and financial services industries have now trained their sights on healthcare, a $1 trillion industry in the US alone, where severe processing, analytic, and business inefficiencies exist (see Figure 1).
Figure 1: US Healthcare Spend 1997-2017
Healthcare Data Challenges and Opportunities
There are many massive data sources, both structured and unstructured, that can yield new information and insights about patients, procedures, medical treatments, medicine testing, clinical studies, drug research, and the payer-provider relationship (see Figure 2).
Figure 2: Tsunami of Current and New Healthcare Data Sources
And new big data sources are on their way, such as:
- Genomics or gene sequencing, which can produce over 2.3B data points for each human genome. Not only has the price of DNA and genomics testing dropped to a level the average consumer can afford, but organizations like 23andMe.com are also working to “liberate” your DNA so that it is easier to share with other genomics organizations and to pool with the genomic data of others in order to identify causes of diabetes, cancer, blindness, and even baldness.
- Mobile apps and social media are creating vibrant communities around specific health issues. There is a great deal of innovation in mobile healthcare aimed at simplifying and encouraging the capture and sharing of personal health data, from established players such as WebMD and startups such as AchieveMint.com.
- “Intelligent” personal home health monitoring devices (blood glucose, blood pressure, medication monitoring, smart toilets) that will unleash a tsunami of data about your current health and flag patterns or trends that point to personal health concerns.
Healthcare Big Data Enabling Technologies
As I’ve discussed in the past, the advent of new, more detailed data sources typically requires new technologies designed to acquire, store, manipulate, and analyze them. Big data technologies, many of which have been proven in other industries like digital media (ad serving, real-time bidding, attribution analysis) and financial services (algorithmic trading, fraud detection), are now available to the healthcare industry to merge, integrate, synchronize, and tease out the insights buried in these new data sources.
Figure 3: Big Data Enabling Technologies
Big Data Healthcare Use Cases
A number of healthcare use cases, enabled by these new sources of data and innovative big data technologies, are starting to emerge. Here are a few examples:
Figure 4: Healthcare Big Data Use Cases
- Detecting fraud in real time by using Hadoop to store and match historical claims and payment data, and in-memory computing to analyze current transactions and flag or score potentially fraudulent activity.
- Reducing hospital readmissions by using MPP databases and data virtualization to access and integrate past admissions and outcomes with current patient data, and in-database analytics to create readmission scores at patient check-in that can suggest personalized hospitalization plans for at-risk patients.
- Improving patient care by using Hadoop and data virtualization to synchronize a patient’s full history of treatments, procedures, lifestyle changes, and therapy (and, in the near future, even DNA data), with advanced analytics to measure how much different medications, treatments, and lifestyle changes contribute to a patient’s health score.
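The fraud-detection use case splits the work into a batch view of historical claims and fast scoring of each new transaction. The following minimal Python sketch illustrates that split. It assumes a simple per-provider z-score on claim amounts as the scoring rule, and the `Claim` fields, function names, and 3.0 threshold are all hypothetical. A real system would use much richer features and would run the batch step on a platform like Hadoop and the scoring step in memory, rather than in plain Python dictionaries.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Claim:
    # Hypothetical minimal claim record; real claims carry many more fields.
    provider_id: str
    amount: float


def build_baselines(history):
    """Batch step over historical data: per-provider mean and population std dev of claim amounts."""
    amounts_by_provider = defaultdict(list)
    for claim in history:
        amounts_by_provider[claim.provider_id].append(claim.amount)
    return {
        pid: (mean(amounts), pstdev(amounts))
        for pid, amounts in amounts_by_provider.items()
    }


def score_claim(claim, baselines):
    """Real-time step: z-score of a new claim against its provider's historical baseline."""
    baseline = baselines.get(claim.provider_id)
    if baseline is None:
        return None  # no history for this provider, so there is nothing to score against
    mu, sigma = baseline
    if sigma == 0:
        # All past claims had the same amount, so any deviation is treated as extreme.
        return 0.0 if claim.amount == mu else math.copysign(math.inf, claim.amount - mu)
    return (claim.amount - mu) / sigma


def flag_claim(claim, baselines, threshold=3.0):
    """Flag unusually high claims, and claims from providers with no history, for review."""
    score = score_claim(claim, baselines)
    return score is None or score > threshold


# Usage: five past claims for provider "P1" give a mean of 100 and a std dev of about 7.07.
history = [Claim("P1", amt) for amt in (100.0, 110.0, 90.0, 105.0, 95.0)]
baselines = build_baselines(history)
print(flag_claim(Claim("P1", 104.0), baselines))  # False
print(flag_claim(Claim("P1", 400.0), baselines))  # True
```

Routing claims from unknown providers to review, instead of silently passing them, is one possible design choice. A team could just as easily score them against a peer-group baseline.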
“Data Is Good. More Data Is Better!”
Few industries have such a variety of data sources, many of them publicly available, that can provide unique, actionable insights into the quality and cost of healthcare. The potential is almost endless as healthcare organizations take the next step: pooling data across patients, treatments, procedures, studies, and more to preempt disease outbreaks or identify potential causes of life-threatening conditions like diabetes (see Figure 5).
Figure 5: Pooling Data to Yield New Healthcare Insights
To quote my colleague Hulya Emir-Farinas, a data scientist in our Greenplum division, “Data is good. More data is better.” More detailed and diverse data sets can yield new insights and perspectives on some of our healthcare problems, and enable new solutions that fulfill the goal of providing better patient care at lower cost. More data can help healthcare organizations uncover the root causes of healthcare problems so that actionable, cost-effective solutions and care can be directed at those problems.
By the way, if you’d like to see my Strata Rx presentation, you can check it out here.