Ruma Sinha
8 min readJul 31, 2020

--

The evolution of RDBMS to NoSQL in the Big Data era

In the last few decades, RDBMS database was ubiquitous. For decades the relational databases had been the database management model when required as part of building an application. The RDBMS (Relational Database Management System) combines the relational data model and the ACID transaction model. SQL (Structured Query Language) dominated the database management system completely.

In the last decade or so, with the world wide web and the Big Data, the NoSQL database started evolving in the market and adopted by various organizations for their solution building.

The question that always comes to the mind is, what happens to the RDBMS? Will NoSQL be the one size fits all for the organization requirements? What are the motivations behind the NoSQL database system?

The objective of this article is to explore the various nuances of RDBMS and NoSQL database.

What is a database?

A database is a place where data collected from various applications are stored and organized, while a database management system (DBMS) is a software for conveniently managing this database.

When it comes to choosing a database, one of the biggest challenge is picking between SQL (relational) and NoSQL(non relational) database.

What is Relational Database management System?

Traditional relational databases are highly structured databases consisting of normalized data tables which are related to each other by primary/foreign keys and queried using SQL.

These databases mainly focus on:

· Schema: The tables need to be designed beforehand

· ACID: Atomic (A) A transaction consists of many operations. All operations in a transaction succeed or every operation is rolled back. Consistent : Data should be valid according to all defined rules. Isolation (I): Transaction do not affect each other. Durable (D): committed data would not be lost, even after power failure. These are the set of properties that ensure reliable database transactions.

· Normalization: Normalized table designs in order to avoid data duplicates

· Relations: The relationship between different entities are maintained via Primary key/Foreign key constraints

Structured Query Language (SQL) is the standard language for RDBMS. SQL database can handle lots of complex queries, database transactions and routine analysis of data. SQL databases better suited for transactional type applications for intensive use. Since they are more stable and promise atomicity as well as data integrity. Most suitable for a system where data is structured and data format is known example Transactional System, Auditing System etc. Adheres to normalization of tables to minimize data duplicates.

A well designed schema is very critical. Hence Entity Relationship diagram is an important part of the database design. ERD is the graphical representation of the data requirements for the database.

Relational databases are mostly vertically scaled. Need to add more memory, CPU or storage to the same server to process more data.

Let’s say we have three entities, Customer, Products and Orders. In the Entity Relation diagram, each entity will be represented as Table and relations between the entities will be performed by Primary Key and Foreign Key constraints.

Each record in the customer table represents a customer information. The products table represents a product information. Each record in the orders table will have the reference to the customer and the product. CustID is the primary key in the customer table and foreign key in the orders table. One Customer can have 1:1 or 1:n relation that is one particular customer can have one order or more than one order. ProductID is the primary key in the products table and the foreign key in the orders table.

The data can be queried from multiple tables by joining them on the primary and foreign keys. For example, we can join Customer and Products and Orders tables to get the Customer information and the products information and the Orders information.

Various RDBMS in the market MySQL, Oracle, PostgreSQL

Db-engines.com ranking in July 2020:

We can see that RDBMS databases still going strong. Also Oracle ranks first in all the four use cases for Oracle Database in Gartner’s critical capabilities for Operational Database Management Systems Report. Oracle also named a leader in 2019 Gartner Magic Quadrant for Operational Database Management Systems.

What is NoSQL database and why the need?

With the World Wide Web and the Big Data came vast form of structured, semi structured and unstructured data. High Volume, Variety and Velocity of data became the reality!

As the amount of data keeps growing at a high rate so does the difficulty to scale the database becomes the issue. We can either Scale up that is getting a bigger machine or we can scale out that is partitioning data across more machines. We can scale up to a certain limit as large machines involve high cost and also there is a limitation that is beyond a certain point increasing RAM/CPU size will not help in handling high data load.

NoSQL is a distributed and non relational database. Follows the BASE consistency model.

Suitable for the use cases where data is unstructured and the format is unknown example Social media, Real time streaming systems etc. When managing vast amount of unstructured data like text data, social media data, customer survey data we need the NoSQL database because this information is unorganized data in nature that is no defined schema hence cannot be stored in RDBMS. These databases remove the bottlenecks of strict schema design and relations for the massive information. Supports horizontal scaling resulting in increased performance.

· Key Value database like Redis, Amazon DynamoDB

Since the key name and value stored together, this makes the data retrieval process pretty quick.

· Column based example Cassandra, HBase

Column databases store data in tables like RDBMS. Each column treated separately and stored in separate block. High performance on aggregate queries such as count, sum, avg, max etc. Cassandra uses Keyspace which is like the database. Data stored in nodes. Data Center is the collection of nodes. A collection of many centers form the cluster. Does not support join operations or subqueries. The schema is actually predefined and static. Uses Cassandra Query Language(CQL)

Use Cases: IoT, Shopping Cart protection

· Document based example MongoDB, Couchbase

MongoDB is a document based and NoSQL database engine.

MongoDB Database is a set of collections and each Collection is a set of Documents.

Data is stored in the form of BSON — Binary Encoded JSON documents. The values may be documents, arrays, array of documents. Documents stored in a collection must have a unique _id field that acts as a primary key. Does not support complex queries.

MongoDB can have Normalized Data Model as well as Embedded Data Model.

Normalized Data Model:

We can have three documents for Customer, Product and Orders. Customer document id is stored as a field in the Orders document. Product document id is stored as a field in Orders document for reference purpose during information fetching.

Embedded Data Model:

Order document will have the Product document and Customer document in separate fields within the document itself in denormalized format.

· Graph based example Neo4j

Graph based databases for recommendation engines and social networks. Uses Cypher Query Language (CQL)

A graph is composed of two elements: a node and a relationship. Each node represents an entity and each relationship represents how two nodes are associated.

The connections (relationships) between different users(nodes):

The three nodes represent the three users of Instagram. The arrow shows which user follows which other user. User1 and User2 follows each other. User2 and User3 follows each other. User3 follows User1 but User1 does not follow User3.

Benefits of NoSQL:

NoSQL databases have a simple and flexible structure. They are schema-free.This means that NoSQL databases do not have a fixed table structure like the ones found in relational databases.

NoSQL databases can replicate the data on many machines. Data can be still accessible if any of the servers crashes. Hence NoSQL databases are highly available.

NoSQL follows the BASE principle for Transaction. Base ensures that NoSQL database ensure their reliability in spite of the loss of consistency. Base stands for Basically Available Soft state Eventually consistent.

Eventually consistent — System can become eventually consistent, information is updated wherever necessary.

There is a computer science theorem that quantifies the inevitable trade offs. Eric Brewer’s CAP theorem says that if you want consistency, availability and partition tolerance, you have to settle for two out of three. For a distributed system, partition tolerance means the system will continue to work unless there is a total network failure. A few nodes can fail and the system keeps going.

CAP theorem:

The BASE principle of NoSQL:

Basic Availability: The database appears to work most of the time

Soft state: Stores don’t have to be write consistent, nor do different replicas have to be mutually consistent all the time.

Eventual consistence: Stores exhibit consistency at some later point.

Rather than requiring consistency after every transaction, it is enough for the database to eventually be in a consistent state. It is OK to use stale date and it is OK to give approximate answers.

The use cases for NoSQL:

· Analytics

· Advertising personalization

· Profile management

· Mobile applications

· High Volume Transactions

· Fraud Detection

What is NewSQL?

A category of databases that manage structured and unstructured information combining structured schemas and queries with distributed structures.

NewSQL databases provide the scalable performance of NoSQL systems for OLTP workloads as well as the ACID guarantees of traditional relational database management system. NewSQL databases are geared towards solving common problems of SQL databases related to traditional online transaction processing.

Even though it is still in its nascent stage, NewSQL may fit to be an ideal database for Big Data and OLTP applications

Example VoltDB (RealTime Database), ClustrixDB (Streaming Database), CockroachDB, Google Cloud Spanner

Conclusion:

Every database system comes with its own strengths and weaknesses. There is no one size fits all solutions. NoSQL is not necessarily replacing SQL completely. Both are used depending on the business applications. The fundamental knowledge of the architecture of each database is very important to ensure that the database we chose can provide an efficient system for the application. Companies that are already structured and not experiencing massive growth can stick to traditional databases. Companies growing with rapid space and having varied data may utilize more non relational databases.

--

--