
Big Data and NoSQL: The Problem with Relational Databases

By April Reeve, September 7, 2012

The NoSQL movement, where “NoSQL” stands for “Not Only SQL,” is based on the concept that relational databases are not the right database solution for all problems. Relational databases are so ubiquitous in most organizations these days that many people may not even be aware that other types of databases exist, let alone when using another database might be preferable. Relational databases perform transactional updates very well, particularly the difficult job of keeping data consistent during an update. Production-strength relational databases can handle the complexity of two-phase commit, where one business transaction affects multiple databases and tables and all of the updates must be committed together or not at all.
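Two-phase commit is the protocol behind that all-or-nothing guarantee: a coordinator first asks every database to prepare its updates and vote, and only if all vote yes does it tell them to commit. The Python below is a minimal, single-process sketch that assumes no crashes or lost messages. The `Participant` class and `two_phase_commit` function are hypothetical names for illustration, not any database's API. Real implementations also record each decision in a durable log so the coordinator and participants can recover partway through the protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Participant:
    """Hypothetical stand-in for one database taking part in a distributed transaction."""
    name: str
    can_commit: bool = True  # simulated vote: False means this database cannot apply its updates
    committed: dict = field(default_factory=dict)  # visible, committed data
    staged: dict = field(default_factory=dict)     # prepared but not yet visible

    def prepare(self, updates: dict) -> bool:
        # Phase 1: stage the updates and vote yes (True) or no (False).
        if not self.can_commit:
            return False
        self.staged = dict(updates)
        return True

    def commit(self) -> None:
        # Phase 2, success path: make the staged updates visible.
        self.committed.update(self.staged)
        self.staged = {}

    def rollback(self) -> None:
        # Phase 2, failure path: discard whatever was staged.
        self.staged = {}


def two_phase_commit(work: List[Tuple[Participant, Dict]]) -> bool:
    """Coordinator: apply every participant's updates, or none of them."""
    # Phase 1: collect a vote from every participant.
    votes = [participant.prepare(updates) for participant, updates in work]
    if all(votes):
        # Phase 2: everyone voted yes, so everyone commits.
        for participant, _ in work:
            participant.commit()
        return True
    # Phase 2: at least one "no" vote, so everyone rolls back.
    for participant, _ in work:
        participant.rollback()
    return False


# Usage: a hypothetical transfer that touches two databases, one of which refuses.
accounts = Participant("accounts_db")
ledger = Participant("ledger_db", can_commit=False)
print(two_phase_commit([(accounts, {"acct_1": -100}), (ledger, {"entry_42": 100})]))  # False
print(accounts.committed)  # {}
```

Because the ledger database voted no, the accounts database discards its prepared update as well, so neither side ends up half-applied.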

However, relational databases apply much of the same overhead required for complex update operations to every activity, and that can handicap them for other functions. Several operations that are key to Big Data management are hard for relational databases to perform efficiently. Firstly, they don’t scale well to very large sizes. Grid solutions can help with this problem, but adding new clusters to the grid is not dynamic, and large data solutions become very expensive using relational databases. Secondly, they don’t handle unstructured data search very well (i.e., Google-style searching), nor do they handle data in unexpected formats well. Thirdly, but not lastly, some basic kinds of queries, such as finding the shortest path between two points, are difficult to implement using SQL and relational databases.
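To illustrate the shortest-path point: standard SQL (since SQL:1999) can express this with a recursive common table expression that repeatedly joins an edge table to itself, but the query is far less direct than a plain graph traversal. The Python sketch below uses breadth-first search over an adjacency list, which finds the fewest-hops path in an unweighted graph. The `shortest_path` function and the small "who knows whom" graph are invented for illustration.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Return the fewest-hops path from start to goal in an unweighted graph, or None."""
    parents = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Follow parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None  # goal is unreachable from start


# Hypothetical social graph: each person lists the people they know.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": ["frank"],
}
print(shortest_path(friends, "alice", "frank"))  # ['alice', 'bob', 'dave', 'frank']
```

Because breadth-first search visits nodes in order of distance from the start, the first time it reaches the goal it has found a shortest path.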

Social networking and Big Data organizations such as Facebook, Yahoo, Google, and Amazon were among the first to decide that relational databases were not good solutions for the volumes and types of data they were dealing with. That led to the development of the Hadoop Distributed File System (HDFS), the MapReduce programming model, and associated databases such as Cassandra and HBase. One of the key capabilities of a Hadoop-type environment is the ability to dynamically, or at least easily, expand the number of servers used for data storage. Storing large amounts of data in a relational database gets very expensive: cost grows geometrically with the amount of data to be stored, reaching a limit in the petabyte range. The cost of storing data in a Hadoop solution grows linearly with the volume of data, and there is no ultimate limit.
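MapReduce splits a job into a map step, run in parallel on each server's share of the data, and a reduce step that combines the intermediate results after they are grouped by key. The classic illustration is a word count. The sketch below runs all three stages (map, the framework's shuffle/group-by-key, and reduce) in a single Python process purely to show the data flow. It is not Hadoop's API, and the function names are hypothetical. On a real cluster, the framework spreads the map and reduce calls across machines, which is what allows capacity to grow by adding servers.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all the counts emitted for one word.
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # Each document stands in for the input split a single mapper would process.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


docs = ["big data needs big clusters", "Data grows"]
print(word_count(docs))  # {'big': 2, 'data': 2, 'needs': 1, 'clusters': 1, 'grows': 1}
```

Map calls are independent of one another, and so are reduce calls for different keys. That independence is what lets the framework run them on as many servers as are available.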

I was a working programmer before relational databases were in common use. Yes, we did have electricity back then. And the databases I used were of the type called “hierarchical.” In fact, they were generally more efficient than relational databases for high-volume individual transaction processing, although, like relational databases, they were not good for data that was structured inconsistently. But what we considered “high volume” then could be handled reasonably by my laptop now, and those databases couldn’t dynamically allocate unlimited additional space, either.

In my next blog post, I’ll describe some of the new classes of NoSQL databases and the problems they solve well.

April Reeve

About April Reeve


With 25 years of experience as an enterprise architect and program manager, April fully deserves her Twitter handle: @Datagrrl.

She knows data extremely well, having spent more than a decade in the financial services industry where she managed implementations of very large application systems.

April is a Data Management Specialist as part of EMC Global Services, with expertise in Data Governance, Master Data Management, Business Intelligence, Data Warehousing Conversion, Data Integration and Data Quality. All of these add up to one simple statement: April is very good at helping large companies organize their data and capture value from it. April works for EMC Consulting as a Business Consultant in the Enterprise Information Management practice.


