Database Index
In this section, we will take a look at Database index, types of index and how it works.
An index is used to assist in faster data retrieval. An index is generally created after analyzing retrieval patterns. A primary key is by default an indexed column. We can also create custom indices based on one or more columns.
For e.g., in a shopping cart application, let us consider a table which stores all the items (ITEM Table). The most common search pattern could be 'search by model name and price'. We can create an index based on the two columns, 'model name' and 'price'.
Full Table scan:
Full table scan is an operation on a table, which inspects every record of a table to find the record with the matching criteria. Full table scan is an expensive operation and should be avoided. Any search operation on non-indexed columns would result in a full table scan.
How does an index help in faster retrieval of data?
An index is stored as a data structure internally, typically as a sorted Binary Tree. The data structure itself contains index column details and pointers to actual data. So if we are searching for an item with name 'TV' and price '30k', the indexed search traverses the B-Tree and locates the node and then retrieves the address of the actual data. Thus it can get to the data node faster (B- Tree traversal is faster when compared to a full table scan)
Types of indexes:
Following are a few types of indexes which are commonly used:
- Clustered Index
- Non Clustered Index
- Hash Index
- Bit Map Index
1. Clustered Index
In this type of index, the data itself is stored in a sorted order as specified by the index. Since the data itself will be rearranged, there could be only one clustered index on a table. Clustered index helps in reducing I/O and faster access to data in a sequential order. (In most relational databases, the primary key will be a Clustered Index). As shown in above diagram, the table data itself is stored in a sorted order as a Balanced Binary tree.
2. Non Clustered Index:
In this type of index, the data itself is not stored in any particular order, but the index data is stored separately. The index data contains pointers to the actual data. As shown in the above diagram, the actual data is stored separately and the index data is a stored as a sorted Balanced Binary tree with the leaf nodes having a pointer to the actual data.
3. Hash Index:
In this type of index, the hash of the indexed columns is stored in a hash table. This type of data structure doesn't support inequality searches. For e.g., search like 'price >30000' cannot be performed using this index.
4. Bit map Index:
This type of index is generally used on columns which have low cardinality. i.e. the column should have few unique values. For instance, data like 'Gender', flag values like 'True/False' etc. could be considered for this index. A bitmap index stores a data structure for each unique value and sets the bits to 0/1 for every record based on column values.
Limitations of Index:
Indexes have their own limitation. Having too many indexes on a table would mean additional storage requirements to store the index data (in case of non-clustered index), and also might affect performance of insert/update statements as the index trees (or the data itself in case of clustered index) need to be reorganized in sorted order. So indexes have to be created judiciously, after thorough analysis.
Reference Implementations:
Most of the relational databases like Oracle, MySQL, and PostgreSQL implement one or the other above mentioned indexes and some of them also offer other variants of index implementations.