How Can Graph Analytics Uncover Valuable Insights About Data?
Part I of our series on graph analytics introduced graph analytics and its sibling, graph databases. We talked about using graph analytics to understand and visualize relationships between people and devices that are part of the same network. From these relationships, we can apply analytics to uncover insights, including the “strength” and “direction” of those relationships. Part II of the series covers some specific use cases of how graph analytics are being used to uncover new insights about your customers, products, and operations.
Graph Analytics Use Cases
Graphs can be used to model many types of relations and processes in physical, biological, social, and information systems, including:
- In computer science, graphs are used to represent networks of communication, data organization, computational devices, the flow of computation, etc. For instance, the link structure of a website can be represented by a directed graph, in which the vertices represent web pages and directed edges represent links from one page to another.
- Graph-theoretic methods have proven useful in linguistics, since natural language often lends itself well to discrete structure. Traditionally, syntax and compositional semantics follow tree-based structures, whose expressive power lies in the principle of compositionality, modeled in a hierarchical graph.
- Graph theory is used in sociology as a way, for example, to measure actors’ prestige or to explore rumor spreading, notably through the use of social network analysis software.
- In the world of intelligence, numerous government agencies are interested in identifying threats through the detection of non-obvious patterns of relationships and group communications buried in social media, email, texting and call detail records.
- In life sciences, organizations can use graph analytics to research healthcare fraud on behalf of healthcare payers. Beyond fraud detection, other potential graph analytics use cases include healthcare treatment efficacy and outcome analysis, analyzing drugs and side effects, and the analysis of proteins and gene pathways.
- In the area of personalized healthcare, a startup called Lumiata wants to scale personalized medicine by leveraging machine learning and graph analytics to help doctors focus on more urgent care needs and empower nurses to carry more of the diagnostic chores.
- Graph analytics can be used to address relationship-based problems in manufacturing, energy, gas exploration, travel, biology, conservation, computer chip design, chemistry, physics, higher education research, government, security, defense and many other fields.
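To make the website example from the list above a bit more concrete, here is a minimal sketch in Python. The page names and the `links` dictionary are invented for illustration. The directed graph is stored as a mapping from each page (vertex) to the pages it links to (directed edges), and we count how many inbound links each page receives.

```python
from collections import Counter

# Hypothetical site: each key is a page (vertex), each value lists the
# pages it links to (directed edges). All names are made up.
links = {
    "home": ["products", "blog", "contact"],
    "products": ["home", "contact"],
    "blog": ["home", "products"],
    "contact": ["home"],
}

def inbound_link_counts(graph):
    """Count how many directed edges point at each page."""
    counts = Counter({page: 0 for page in graph})
    for source, targets in graph.items():
        for target in targets:
            counts[target] += 1
    return dict(counts)

in_degree = inbound_link_counts(links)
```

Because the edges are directed, "home links to blog" does not imply "blog links to home"; the two directions are stored and counted separately.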
Advantages Offered by Graph Analytics
A key advantage of graphs is the ease with which new sources of data and new relationships can be added. Graph databases using Resource Description Framework (RDF) to represent the graph can easily merge and unify diverse datasets without significant upfront investment in data modeling. Such an approach lies in stark contrast to traditional analytics, in which a great deal of time is spent organizing data, and the addition of new data sources requires time-consuming and error-prone effort by analysts.
The easy on-boarding of new data is particularly important when dealing with Big Data. Traditional analytics focus on finding answers to known questions. By contrast, many of the highest value applications, such as those identified above, are focused on discovery, where the questions to be answered are not known in advance. The ability to quickly and easily add new data sources or new relationships within the data when needed to support a new line of questioning is crucial for discovery, and graphs are uniquely well qualified to support these requirements.
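Here is a rough sketch of why merging is cheap when data is held as RDF-style triples. It is a simplification under stated assumptions: the identifiers and both datasets are invented, and plain Python tuples stand in for real RDF, which uses IRIs and needs extra care around blank nodes. The point is only that a graph of (subject, predicate, object) statements can absorb a new source without a schema redesign.

```python
# Two hypothetical datasets expressed as (subject, predicate, object) triples,
# the same shape RDF uses. Identifiers are invented for illustration.
crm_triples = {
    ("customer:42", "name", "Acme Corp"),
    ("customer:42", "locatedIn", "city:Oakland"),
}
support_triples = {
    ("ticket:7", "openedBy", "customer:42"),
    ("ticket:7", "concerns", "product:widget"),
}

# Because each graph is just a set of triples, merging is a set union.
merged = crm_triples | support_triples

def describe(graph, node):
    """Return every triple in which the node appears as subject or object."""
    return sorted(t for t in graph if node in (t[0], t[2]))

about_customer = describe(merged, "customer:42")
```

After the union, a question about `customer:42` picks up the support ticket as well as the CRM facts, even though neither dataset was modeled with the other in mind.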
Graph analytics also offer sophisticated capabilities for analyzing relationships, while traditional analytics focus on summarizing, aggregating and reporting on data. Use the right tool for the job. Some common graph analytic techniques include:
- Centrality analysis: To identify the most central entities in your network, a very useful capability for influencer marketing.
- Path analysis: To identify all the connections between a pair of entities, useful in understanding risks and exposure.
- Community detection: To identify clusters or communities, which is of great importance to understanding issues in sociology and biology.
- Sub-graph isomorphism: To search for a pattern of relationships, useful for validating hypotheses and searching for abnormal situations, such as hacker attacks.
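The four techniques above can be sketched on a toy network. This is an illustration only, with several assumptions: the people and edges are invented, degree centrality is just the simplest of many centrality measures, connected components are a very crude stand-in for real community detection, and the triangle search is one small special case of sub-graph pattern matching.

```python
from collections import deque

# Hypothetical undirected network; names are invented.
edges = [
    ("ann", "bob"), ("ann", "cara"), ("ann", "dev"),
    ("bob", "cara"), ("dev", "eli"), ("fay", "gus"),
]

def build_graph(edge_list):
    """Store the network as a mapping from each node to its neighbors."""
    graph = {}
    for a, b in edge_list:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def degree_centrality(graph):
    """Centrality: fraction of the other nodes each node touches directly."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

def all_simple_paths(graph, start, goal, path=None):
    """Path analysis: every route from start to goal that repeats no node."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for nxt in sorted(graph[start]):
        if nxt not in path:
            found.extend(all_simple_paths(graph, nxt, goal, path))
    return found

def connected_components(graph):
    """Crude community proxy: groups of nodes with no links between groups."""
    seen, groups = set(), []
    for node in sorted(graph):
        if node in seen:
            continue
        queue, group = deque([node]), set()
        while queue:
            cur = queue.popleft()
            if cur in group:
                continue
            group.add(cur)
            queue.extend(graph[cur] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

def find_triangles(graph):
    """Pattern search: every set of three mutually connected nodes."""
    return sorted(
        (a, b, c)
        for a in graph for b in graph[a] for c in graph[b]
        if a < b < c and c in graph[a]
    )

network = build_graph(edges)
centrality = degree_centrality(network)
most_central = max(centrality, key=centrality.get)
paths = all_simple_paths(network, "bob", "dev")
groups = connected_components(network)
triangles = find_triangles(network)
```

On real data you would normally reach for a graph library or graph database rather than hand-rolled functions; the sketch is only meant to show what kind of question each technique answers.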
Graph Analytics Complementary to Hadoop
Interestingly, Hadoop and graph analytics complement each other quite nicely. Hadoop is a scale-out solution, allowing independent items of work to be parceled out to the computers in a cluster. Graph analytics, on the other hand, excel at looking at the “big picture,” analyzing complex networks of relationships that cannot be partitioned.
For example, consider risk analysis within a financial solution. Many documents will need to be independently analyzed, and the relationships between organizations extracted. This is an ideal job for Hadoop since each document is independent of the others. On the other hand, the complex network of relationships between organizations forms a non-partitionable graph, which is best analyzed as a single entity, in memory.
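The division of labor in the risk analysis example can be sketched as two stages. Everything here is a hypothetical stand-in: the documents, the organization names, the `VERBS` list, and the toy string-matching extractor are invented, and an ordinary list comprehension plays the role that a Hadoop job would play across a cluster.

```python
from collections import defaultdict

# Hypothetical documents; in practice each could sit on a different machine.
documents = [
    "BankA lends to FirmB. FirmB supplies FirmC.",
    "FirmC owes BankA.",
    "FirmD supplies FirmB.",
]

# Hypothetical relationship verbs recognized by the toy extractor below.
VERBS = ("lends to", "supplies", "owes")

def extract_relationships(text):
    """Stage 1 (partitionable): needs only one document at a time."""
    found = []
    for sentence in text.split("."):
        for verb in VERBS:
            if f" {verb} " in sentence:
                left, right = sentence.split(f" {verb} ", 1)
                found.append((left.strip(), verb, right.strip()))
    return found

def build_graph(per_document_results):
    """Stage 2 (not partitionable): combine everything into one in-memory graph."""
    graph = defaultdict(set)
    for relationships in per_document_results:
        for source, _verb, target in relationships:
            graph[source].add(target)
            graph[target].add(source)
    return graph

def exposure(graph, start):
    """Every organization reachable from start, directly or indirectly."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node] - seen)
    return sorted(seen - {start})

# Stand-in for a parallel map: each call is independent of the others.
per_document = [extract_relationships(doc) for doc in documents]
graph = build_graph(per_document)
linked_to_firm_d = exposure(graph, "FirmD")
```

Notice that FirmD's indirect connection to BankA only appears once the relationships from separate documents are combined, which is why the second stage treats the graph as a single entity.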
Google Knowledge Graph: Graph Analytics To Deliver More Relevance
As if we needed more proof of the potential of graph analytics, Google is undertaking perhaps the most extensive use of graph analytics in support of its new Google Knowledge Graph project. The power of the Google Knowledge Graph project lies in leveraging the wealth of knowledge that Google has gathered from numerous customer searches to provide the most relevant search results to you. Searching for Jaguar? Does that mean Jaguar the animal, Jaguar the car, or the Northern Jaguar project? By leveraging graph analytics, Google is working to create a semantically relevant set of relationships tailored to YOU, the consumer, and to improve its ability to deliver the information that it thinks is most relevant for you (for me, that would probably be a Jaguar car as I drive a dusty, old 1999 Mercedes that is way overdue to be replaced).
The Google Knowledge Graph is a knowledge base that enhances Google’s search results with semantic search information gathered from a wide variety of sources. The Knowledge Graph enables you to search for things, people or places that Google knows about (landmarks, celebrities, cities, sports teams, buildings, geographical features, movies, celestial objects, works of art and more) and instantly get information that’s relevant to your query. This is a critical first step towards building the next generation of search, which taps into the collective intelligence of the web and understands the world a bit more like people do.
Google attempts to fully understand the context of my search request by leveraging as much data about me, my interests and relationships as possible. For example, the fact that I am in the Bay Area means that not only should Google serve results about the car but also that those results (e.g., Jaguar dealers) should be geographically relevant. Perhaps Google also understands my search request context better if it knows that my social graph contains a number of Jaguar owners/admirers, or that I am likely talking about the car because I have recently searched for a shop that can repair my aging Mercedes.
Graph Analytics and Metadata
It doesn’t take a large leap of faith to envision the potential of integrating graph analytics into the metadata discovery process. This topic, along with other topics associated with Data Governance 2.0, is going to be covered in more detail by my colleagues Rachel Haines and Scott Lee at the upcoming Strata conference in Santa Clara on Wednesday, February 12th at 1:30. The title of their presentation is “Evolving Data Governance for the Big Data Enterprise.”
Rachel and Scott will not only discuss how metadata discovery can benefit from graph analytics but also how graph analytics can be employed to understand a user’s data responsibilities, their role in the organization, and their historical patterns of data creation/use.
As an example, Dow Jones & Company’s news search engine leverages text analytics to uncover metadata that is used to fulfill customer search demands. However, Dow Jones & Company has discovered that text analytics alone cannot fully address the metadata discovery process.
“Text analytics only goes so far, however, as the language of business is complex and constantly evolving. When it comes to potentially ambiguous words and phrases, it parses the information using rules defined by a human editor. ‘We might need a rule to identify that a story is about Apple, for example, and not apples,’ explains Merkle.”
What if Dow Jones were to invest in a graph analytics capability to build out their semantic insights—something like what Google is doing with the Knowledge Graph product? It seems the integration of graph analytics with metadata management could take semantic understanding and metadata discovery to the next level.
Graph analytics provide another arrow in our quiver – another tool that we can use against these vast amounts of social media and sensor-based data to uncover new insights about the relationships between our customers, products, and operations. Graph analytics allow us to get new, more actionable, more relevant answers to many of our traditional questions (Who are our most important customers? What are our most important products?), as well as answer completely new questions (Who are our most influential customers? Where are our largest networking or operational security risks?). In the Big Data world, it seems new tools and new algorithms are being developed all the time to help business stakeholders optimize their key business processes and uncover new monetization opportunities.
Graph Theory from Wikipedia
How to Use Graph Databases to Analyze Relationships, Risks and Business Opportunities video
Extending and Augmenting Hadoop
For more information on RDF, check out: http://en.wikipedia.org/wiki/Resource_Description_Framework
Isomorphism: in math, a one-to-one correspondence between the elements of two sets such that the result of an operation on elements of one set corresponds to the result of the analogous operation on their images in the other set.