NoSQL is a relatively new class of databases that differ significantly from traditional relational databases (also known as RDBMS).
Before diving into the details of NoSQL databases, let us look at the shortcomings of an RDBMS and the benefits of using a NoSQL database.
Why do we need a NoSQL database?
1. Flexible Schema:
In the RDBMS world, the schema must be designed upfront. A schema definition includes tables, columns, column data types, column lengths and so on. This can be a hindrance if the format of the stored data changes frequently, for example because requirements keep changing to keep pace with the business, or because the data originates elsewhere and is outside our control. For instance, let’s say we are designing a system that captures people’s work experience. A vast variety of data could potentially be captured, and it may not be possible to define the schema upfront.
Another issue with a rigid schema is that we can end up paying for storage we don’t use. Let’s say there is an attribute that is populated only 10% of the time. In many RDBMS storage layouts (fixed-width columns, for example), the column still consumes space in every row whether or not it holds data. So we are wasting storage space and even paying for it!
NoSQL databases, on the other hand, support a flexible schema. (To draw a parallel with an RDBMS, imagine having to define only a table, with columns defined on the fly as the data comes in.) If an entity has 20 different attributes and each attribute is populated in only 10% of the records, storage is consumed only for those 10%. Thus we get flexibility as well as efficient use of storage.
|Name (John)||Age (30)|
|Name (Kelly)||Age (23)||Salary (60k)|
|Name (Howard)||Age (28)|
In the table shown above, only one record (the one with the name Kelly) has a salary attribute; the other records do not. The salary attribute occupies space only for that record.
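The idea can be sketched in a few lines of Python. This is only an illustration, not how any particular NoSQL database stores data internally: each record is modelled as a dictionary that holds just the attributes it actually has, and the lowercase field names are assumptions made for the example.

```python
# Hypothetical records mirroring the table above; each record stores only
# the attributes it actually has.
records = [
    {"name": "John", "age": 30},
    {"name": "Kelly", "age": 23, "salary": "60k"},
    {"name": "Howard", "age": 28},
]

# Count how many records carry each attribute.
attribute_counts = {}
for record in records:
    for attribute in record:
        attribute_counts[attribute] = attribute_counts.get(attribute, 0) + 1

# Records without a salary simply have no such key; .get() returns None for them.
salaries = {record["name"]: record.get("salary") for record in records}
```

Here `attribute_counts` shows that ‘salary’ appears in a single record, and nothing is stored for it in the other two.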
2. High Availability:
Traditional RDBMS generally focus on data integrity and reliability, and so may not be highly available. They are also not distributed by design. Although distributed RDBMS setups are possible, they are generally expensive and involve complicated setup and configuration. Most NoSQL databases are distributed by design and are comparatively inexpensive. Because the data is spread, and typically replicated, across several nodes, they can remain available even when an individual node fails.
With the advent of the cloud, high availability has become an essential characteristic of most applications. So it is no surprise that NoSQL databases are gaining popularity.
NoSQL database types:
It is important to have a good understanding of the types of NoSQL databases and the pros and cons of each, so that we are in a position to choose the right database for the right job.
The main types are:
- Document Based databases
- Key Value databases
- Columnar Based databases
- Graph Based databases
1. Document Based Databases:
These NoSQL databases store and retrieve data in the form of documents. A document could be an XML, JSON, YAML or BSON document. Each document holds field names and values encoded in a specific format. For example, an XML document has element names and values, while a JSON document has attribute names and values. The document is stored and retrieved as is. Each document’s data layout can also differ from the others, which provides flexibility.
For example, consider a person document that holds two ‘experience’ entries: the second can carry a ‘Current Position’ attribute that is not present in the first, and this provides flexibility. Some of these databases also support searching and indexing by individual fields.
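As a minimal sketch of such a document, the Python snippet below builds a person record with two ‘experience’ entries and round-trips it through JSON. All names and values (`Acme`, `Globex`, `current_position` and so on) are made up for illustration, and plain JSON serialization stands in for whatever storage format a real document database uses.

```python
import json

# Hypothetical person document; field names and values are illustrative only.
person = {
    "name": "Kelly",
    "experience": [
        {"company": "Acme", "title": "Developer", "years": 3},
        {"company": "Globex", "title": "Architect", "years": 2,
         "current_position": True},
    ],
}

# The document is serialized and stored as is, then parsed back on retrieval.
stored = json.dumps(person)
retrieved = json.loads(stored)

# Fields present in the second experience entry but not in the first.
first, second = retrieved["experience"]
extra_fields = sorted(set(second) - set(first))
```

The two entries sit side by side in the same document even though their layouts differ; `extra_fields` lists what the second entry has that the first does not.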
Examples: MongoDB, CouchDB, Elasticsearch
2. Key-Value Databases:
These NoSQL databases store data using a data structure like a ‘Dictionary’ or a ‘Hash’. A hash stores data as keys and values. A value can be a single value, a tuple (list of values) or even another hash. Each key must be unique within a given dictionary, and a hashing algorithm is applied to the key to locate its value.
|Car||Honda, Toyota, Mercedes|
|Bike||Harley Davidson, Kawasaki|
In the above example, the key ‘Car’ has values like ‘Honda, Toyota, Mercedes’.
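Python's built-in `dict` is itself a hash table, so it makes a convenient stand-in for illustrating the model. The sketch below is not tied to any specific key-value database; the ‘Truck’ and ‘Boat’ keys and the extra ‘Ducati’ value are hypothetical additions for the example.

```python
# A Python dict is a hash table: keys are unique and hashed to locate values.
store = {
    "Car": ["Honda", "Toyota", "Mercedes"],
    "Bike": ["Harley Davidson", "Kawasaki"],
}

# A value may itself be another hash (a nested dict); "Truck" is a hypothetical key.
store["Truck"] = {"brand": "Volvo", "axles": 3}

# Writing to an existing key replaces its value rather than adding a duplicate key.
store["Bike"] = ["Harley Davidson", "Kawasaki", "Ducati"]

cars = store["Car"]          # direct lookup by key
missing = store.get("Boat")  # absent key: None instead of an error
```

Lookups go straight from key to value without scanning the other entries, which is what makes this model fast for simple access by key.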
Examples: Redis, Riak
3. Columnar Databases:
In this type of NoSQL database, data is grouped by column rather than by row. Columns are grouped logically into column families, and the data is partitioned and stored by column. Column-based partitioning makes aggregate functions efficient. For instance, let us say we want to compute the average age of customers from a customer table that has a column called ‘age’. In a row-oriented RDBMS, the average function typically has to scan every row, reading whole rows even though only one column is needed. This amounts to a full table scan, which is a very costly operation. In a columnar database, on the other hand, all the values of a given column are stored close to each other, so column-specific aggregation is faster.
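The difference between the two layouts can be sketched with plain Python structures. This is a toy, in-memory illustration with a made-up customer table, not a model of how any particular database lays out data on disk.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "John", "age": 30},
    {"id": 2, "name": "Kelly", "age": 23},
    {"id": 3, "name": "Howard", "age": 28},
]

# Column-oriented layout: all values of one column sit next to each other.
columns = {
    "id": [1, 2, 3],
    "name": ["John", "Kelly", "Howard"],
    "age": [30, 23, 28],
}

# Row layout: every record is visited to pick out its age.
avg_from_rows = sum(row["age"] for row in rows) / len(rows)

# Column layout: only the contiguous "age" list is read.
ages = columns["age"]
avg_from_columns = sum(ages) / len(ages)
```

Both computations give the same average; the column layout simply never touches the ‘id’ or ‘name’ data to get there.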
4. Graph Databases:
These databases store data as the nodes and edges of a connected graph. Nodes represent data, and edges represent relationships between nodes. Graph databases are useful where the relationships between data points also need to be represented. A graph database provides queries that help traverse the graph in a simple and efficient manner.
For example, in a social media application, a graph database could be used to connect people in a network.
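A small sketch of the social network case, assuming an adjacency-list representation and a breadth-first traversal. The people, the friendships and the `degrees_of_separation` helper are all hypothetical; a real graph database would expose this kind of traversal through its own query language.

```python
from collections import deque

# Hypothetical social network: nodes are people, edges are friendships.
friends = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Alice", "Dave"],
    "Carol": ["Alice", "Dave"],
    "Dave": ["Bob", "Carol", "Erin"],
    "Erin": ["Dave"],
}

def degrees_of_separation(graph, start, target):
    """Breadth-first search: number of edges on a shortest path, or None."""
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        if person == target:
            return distance
        for neighbor in graph.get(person, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None  # target is not reachable from start
```

Calling `degrees_of_separation(friends, "Alice", "Erin")` walks outward from Alice one friendship at a time until it reaches Erin.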