Understanding NoSQL Databases


Is a NoSQL database better than one that follows the relational model, or vice versa? The question has no general answer: it depends on what kind of data the database will store, organize, and manipulate. Once that is known, the choice becomes much clearer.

NoSQL, commonly read as “not only SQL,” has become increasingly popular in recent years. The age of big data is upon us, and many organizations need to capture massive amounts of unstructured data. Leveraging large amounts of data, boiling them down to understandable patterns, and using that knowledge to make sound business decisions is at the heart of analytics. Organizations large and small are realizing that they cannot neglect this area if they wish to compete. Working with these huge datasets can be a formidable challenge, however, and this is where NoSQL is strongest.

NoSQL was developed to cope with the rise in the volume of data and the number of users. In the past, a few thousand concurrent users seemed excessive. Today, an online application can attract hundreds of thousands or even millions of users in a very short time. From shopping preferences to minute details of manufacturing output, massive quantities of data are captured, stored, and analyzed. These workloads often exceed what a traditional relational database running on a single server can handle. NoSQL databases cope with this new “big data landscape” by providing horizontal scalability, high-performance processing, and the ability to handle unstructured data efficiently.

# Characteristics of NoSQL

The storage and retrieval methods of NoSQL databases differ significantly from those of traditional relational database management systems (RDBMSs). One of the biggest differences is that many NoSQL databases do not impose the rigid structure that an RDBMS does. This is key to handling unstructured data efficiently.

Scalability

In applications that deal with massive volumes of data, the need to scale a system quickly and elastically becomes imperative. NoSQL excels in scalability compared to its relational counterpart. In the past, when database load increased and the system needed to be expanded, the primary choice for an RDBMS was to scale up. This usually required upgrading to a bigger, more expensive server. Today, NoSQL allows you to scale out. This involves adding extra servers to a cluster, allowing it to take up the additional database load. Cheaper commodity servers can be used, the process is almost transparent, the deployment time is minimal, and if done right, there should be no application downtime.
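One common way to spread keys across a cluster so that servers can be added cheaply is consistent hashing. The sketch below is a minimal illustration under stated assumptions, not how any particular NoSQL product does it: the `HashRing` class, the node names, and the choice of 64 virtual nodes per server are all hypothetical.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a stable position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring: a key belongs to the next node clockwise."""

    def __init__(self, nodes=(), vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node_name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each server is placed on the ring many times to even out the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_right(self._ring, (ring_position(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


keys = [f"user:{n}" for n in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # scale out by one commodity server
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to the new node")
```

Only the keys that now belong to the new server change owner; everything else stays where it was, which is what keeps scaling out cheap compared with rehashing the whole dataset.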

Schemas

The rigid schema the relational model uses is defined by relationships and constraints between underlying tables that make up the database. These are a set of very strict rules that govern the many operations of the traditional RDBMS. A NoSQL database, however, operates on a “schema-less” model. It is not bound by the rigorous rules that the relational model enforces. This makes a NoSQL database much more flexible, allowing it to handle structured, semi-structured, and unstructured data with ease.
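To make the schema-less idea concrete, here is a minimal sketch that uses plain Python dictionaries as a stand-in for a document collection. The `insert` and `find` helpers and the product records are hypothetical and do not correspond to the API of any real database.

```python
# A toy in-memory "collection": documents are plain dicts keyed by an id.
collection = {}


def insert(doc_id, document):
    """Store a document as-is; no field names or types are enforced."""
    collection[doc_id] = document


# Three records with three different shapes, all accepted without a schema change.
insert("p1", {"name": "Laptop", "price": 999.0, "specs": {"ram_gb": 16}})
insert("p2", {"name": "T-shirt", "price": 15.0, "sizes": ["S", "M", "L"]})
insert("p3", {"name": "E-book", "download_url": "https://example.com/book"})


def find(predicate):
    """Return every document for which predicate(document) is true."""
    return [doc for doc in collection.values() if predicate(doc)]


# Queries must tolerate missing fields, because nothing guarantees they exist.
cheap = find(lambda doc: doc.get("price", float("inf")) < 100)
print([doc["name"] for doc in cheap])
```

The flexibility comes at a price that shows up in the query: the code has to decide what a missing `price` means, a decision a relational schema would have forced at design time.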

Cost

The hardware requirements of the average RDBMS can be costly. This type of database often runs on expensive proprietary servers, and the software licensing fees can be prohibitive for many organizations. Since NoSQL is designed to run on clusters of cheap commodity servers, its cost-effectiveness is very attractive. Another advantage is that quite a few flavors of NoSQL are open source, which helps to cut costs further. Lastly, many commercial RDBMS installations require trained database administrators, and these professionals do not come cheap. The simpler data model of a NoSQL database can reduce the need for administration.

Performance Features and Advantages

Aside from the speed and ease with which a NoSQL database can capture and process vast amounts of data, its operational environment has several other key advantages. NoSQL databases are implemented on a distributed architecture that is designed to avoid a single point of failure. These high-availability clusters are tuned so that if one node goes down, enough redundancy is built into the system for it to keep operating. The distributed architecture also makes it possible to implement features such as fault tolerance and disaster recovery.

The characteristics highlighted above are what make NoSQL a good choice for handling big data needs quickly and efficiently. These features were designed with the three Vs of big data in mind: volume, velocity, and variety. These three words neatly summarize the enormous amounts, extraordinary speeds, and diverse kinds of data that organizations deal with today. For this kind of workload, NoSQL databases often offer a cleaner solution than the relational model, which is why they are gaining in popularity.
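The redundancy described above can be sketched as a store that writes each key to more than one node and reads from whichever replica is still up. This is a simplified illustration: the `Node` and `ReplicatedStore` classes and the placement rule are assumptions made for the example, and real systems add quorums, repair, and failure detection on top.

```python
class Node:
    """One server in the cluster, with its own local storage."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class ReplicatedStore:
    """Writes every key to `replicas` nodes; reads from the first live copy."""

    def __init__(self, nodes, replicas=2):
        self.nodes = nodes
        self.replicas = replicas

    def owners(self, key):
        # Deterministic placement: pick a start node, then take its successors.
        start = sum(key.encode("utf-8")) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def put(self, key, value):
        written = 0
        for node in self.owners(key):
            if node.up:
                node.data[key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no replica available for write")

    def get(self, key):
        for node in self.owners(key):
            if node.up and key in node.data:
                return node.data[key]
        raise KeyError(key)


nodes = [Node("n1"), Node("n2"), Node("n3")]
store = ReplicatedStore(nodes, replicas=2)
store.put("order:42", {"total": 99})

primary = store.owners("order:42")[0]
primary.up = False                 # simulate losing one server
print(store.get("order:42"))       # still served by the second replica
```

With two replicas the store survives the loss of any single node; losing both owners of a key would still make that key unavailable, which is why the replica count is a tuning decision.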

# NoSQL Database Types

There are four general classifications of NoSQL database. Each has its advantages and disadvantages, and each must be weighed carefully against the needs of the organization and the technical requirements the database must meet.

  • Key-value store: One of the least complex types of NoSQL database, a key-value store pairs an indexed key with its corresponding value. The schema-less nature of this type is evident in the way data is stored: the value is stored as-is, regardless of its data type. Examples of key-value databases include Couchbase, Redis, and Riak.

  • Document store: Similar to the key-value store, the document store assigns a document to a unique key, which is then used to identify the document in database operations. The individual documents in this type of NoSQL database tend to be similar, yet can also differ substantially from one another, which is why they are described as semi-structured data. MongoDB and CouchDB are popular NoSQL databases of the document store type.

  • Column store: This type of database stores data by column rather than by row. Its main advantage is fast response for queries that touch only a few columns, since the values of a column are stored together. The architecture of column stores also facilitates scalability. Cassandra, Druid, and Apache HBase fall into this category of NoSQL database.

  • Graph store: This type is used when data is most naturally represented as a graph, with relationships connecting the different elements. This may sound very similar to a relational database, but there is one important difference. In an RDBMS, changing the relationships often requires complicated changes to the schema or the data. Graph databases do not have this difficulty, because relationships are stored as data in their own right. Popular graph stores include Neo4j, InfiniteGraph, and OrientDB.
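The data models in the list above can be contrasted with a few lines of plain Python. This is only a sketch of the shapes involved: the sample records and the `reachable` helper are made up for illustration, and no real database stores data this naively.

```python
from collections import deque

# Key-value: the store looks values up by key and does not interpret them.
kv = {"session:9f2c": b"\x01\x02raw-bytes", "cart:17": "sku-1,sku-7"}

# Row layout versus column layout of the same three records.
rows = [
    {"id": 1, "city": "Oslo", "sales": 120},
    {"id": 2, "city": "Lima", "sales": 80},
    {"id": 3, "city": "Oslo", "sales": 200},
]
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one column reads a single contiguous list.
total_sales = sum(columns["sales"])

# Graph: relationships are data, stored here as adjacency sets.
follows = {"ana": {"ben"}, "ben": {"cy"}, "cy": set()}


def reachable(graph, start):
    """Breadth-first search: everyone `start` can reach by following edges."""
    seen, queue = set(), deque([start])
    while queue:
        person = queue.popleft()
        for neighbour in graph.get(person, ()):
            if neighbour not in seen and neighbour != start:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


first = reachable(follows, "ana")

# Adding a new relationship is a data change, not a schema change.
follows["dee"] = set()
follows["cy"].add("dee")
second = reachable(follows, "ana")
print(total_sales, sorted(first), sorted(second))
```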

# NoSQL Limitations

For all their usefulness, NoSQL databases still have some deficiencies. The very fact that they lack a schema can cause problems with data redundancy and accuracy. But wait a minute. Didn’t we say that a schema-less database is just what is needed when dealing with unstructured data? Yes, but in other circumstances the absence of a schema can be detrimental. Since the database does not enforce strict rules on the collection, organization, and storage of data as an RDBMS does, it leaves the door open to a loss of data integrity. NoSQL vendors say that mechanisms have been put in place to account for this, but these solutions tend to be programmatic rather than built in. The modern RDBMS is a mature, time-tested system with decades of rigorous operational use behind it. NoSQL databases still have some catching up to do in this area.
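A “programmatic” integrity check of the kind mentioned above typically lives in application code. The following is a minimal sketch assuming a hypothetical user record with `email` and `age` fields; `REQUIRED_FIELDS`, `validate`, and `safe_insert` are illustrative names, not part of any vendor's API.

```python
# Rules the application chooses to enforce, since the database will not.
REQUIRED_FIELDS = {"email": str, "age": int}


def validate(document):
    """Return a list of problems; an empty list means the document is valid."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in document:
            errors.append(f"missing field: {field}")
        elif not isinstance(document[field], expected_type):
            errors.append(f"{field} should be {expected_type.__name__}")
    return errors


def safe_insert(collection, doc_id, document):
    """Insert only documents that pass validation."""
    errors = validate(document)
    if errors:
        raise ValueError("; ".join(errors))
    collection[doc_id] = document


users = {}
safe_insert(users, "u1", {"email": "ana@example.com", "age": 31})

print(validate({"email": "ben@example.com", "age": "42"}))
print(validate({"age": 27}))
```

The weakness is visible in the design: the rules hold only for writers that call `safe_insert`. Any code path that writes to the collection directly bypasses them, whereas a constraint declared in an RDBMS schema applies to every writer.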

# NoSQL vs. Relational

As we have seen, there are certain situations when a NoSQL database is indispensable. However, other data needs require the maturity of the RDBMS. This is not always a clear-cut situation, and many organizations find that they cannot do without having both models working side by side. Table A below highlights the differences between the two.

Table A: NoSQL vs. Relational

| Feature | NoSQL | Relational |
| --- | --- | --- |
| Scaling | Horizontal (scaling out). | Vertical (scaling up). |
| Schema | Dynamic (no schema). | Schema used. |
| Data Manipulation Language | Data is manipulated through various APIs. | SQL used as the standard language. |
| Data Integrity | Product specific. | Strong and consistent. |
| Operational Reliability | CAP (Consistency, Availability, Partition tolerance) | ACID (Atomicity, Consistency, Isolation, Durability) |
| Database Types | Four primary types: Key-value, Document, Column, and Graph. | Only one basic type. |
| Transaction Support | Variable support. | Transactions well supported. |
| Workload Diversity | Good workload diversity. Well suited for big data. | Lacking the ability to efficiently deal with unstructured data. |
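The relational column of Table A, with its schema-enforced integrity and ACID transactions, can be illustrated with Python's built-in `sqlite3` module. This is a small sketch: the `accounts` table, the balances, and the `transfer` helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # rule enforced by the database
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)         # within alice's balance
rejected = transfer(conn, "alice", "bob", 500)  # would make alice's balance negative
balances = dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
print(ok, rejected, balances)
```

The second transfer credits bob before the debit fails the `CHECK` constraint, yet the rollback undoes the credit as well, so no partial update is ever visible.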

# Conclusion

Predictions of the demise of the relational database are premature. NoSQL databases have made their mark on the database world, but it is becoming clear that most organizations need both models. For example, a columnar database is extremely fast when querying terabytes of data, yet a query can take a few seconds to return even against a small set of data. In that situation, the RDBMS wins on speed and efficiency. Organizations that use MySQL or PostgreSQL for their transactional e-commerce services may also need NoSQL when it comes to aggregating data for business intelligence analytics. The popularity of solutions and frameworks such as Amazon Redshift, Vertica, and Hadoop confirms this. Either way you look at it, hybrid datacenters featuring both NoSQL and the relational model will probably be around for some time to come.
