Introduction:
When you read this article, its important for one to understand what an index is. You may ask why? Well because as I have been told by someone and rightly said so, if I may say, when you ask WHY, you solve 50% of your problem. So, lets jump right in.
What is an Index?
- Index is an SQL server object that you create on a table or view to improve query performance. It’s defined and managed within the database system. Technically, an index uses underlying data structures, such as B-trees (in the case of most SQL Server indexes), to organize and maintain pointers to the data, enabling efficient data access and retrieval.
Let’s understand this better by asking the questions why and how.
WHY should I use index?
- Improved Query Performance: The primary reason for using indexes is to accelerate query processing. Indexes can drastically reduce the amount of data the server needs to examine.
- Efficient Data Access: Indexes provide a quick way to access row data for SELECT statements. This is particularly beneficial for tables with many rows.
- Sorting and Grouping Speed: Indexes improve the speed of data retrieval operations by providing a sorted version of the data, which is faster to process for ORDER BY and GROUP BY operations.
- Unique Constraints: Indexes can be used to enforce uniqueness for columns to ensure that no two rows of a table have duplicate values in a particular column or a combination of columns.
- Optimized Join Operations: In databases with multiple tables, indexes improve the speed of join operations by quickly locating the joining rows in each table.
HOW does index help?
First let’s understand the different types of Indexes that are available:
- Clustered Index: A clustered index determines the physical order of data in a table. It sorts and stores the data rows of the table or view in order based on the clustered index key. This means the data is physically rearranged to match the order of the index. The clustered index supports fast retrieval of the rows, based on their clustered index key values because it knows the exact order and location of the rows. A table/view can have only 1 clustered index.
Example: Imagine trying to find a location in the local registry. Its easier as each location is logged in according to the pin code in an order numerical format. Hence it is easier to locate a place by just searching for the pin code directly.
- Non-Clustered Index: In a non-clustered index, the index contains a sorted list of key values from one or more columns, along with pointers to the exact location of the data in the table. This allows SQL Server to quickly look up the key value in the index and then go directly to the data’s location, speeding up search and retrieval operations. This index doesn’t change the order of the rows but gives you pointers to quickly locate specific information.
Example: Imagine you have a book, and the book’s index at the back list’s topics and the pages where they can be found.
- Unique Index: A unique index ensures that the index key contains no duplicate values and therefore every row in the table or view is in some way unique. Uniqueness can be a property of both clustered and non-clustered indexes.
Example: Imagine it being like no employee of a company can have the same employee id.
- Columnstore Index: A columnstore however, stores the data column by column. This means all the data for a single column is stored together, followed by all the data for the next column, and so on. Imagine you have a big spreadsheet with lots of columns and rows. Normally, the data is stored row by row, like reading one row at a time but here the entire column data is stored together. This column-by-column storage can make reading and compressing data much faster, especially when you need to query only a few columns from a large table.
Example: If one wants to calculate the average sales from a huge sales table, a columnstore index can quickly read just the sales column without having to read all the other columns, making the query much faster and more efficient.
- Index with included columns: An index with included columns in SQL Server is a type of index that not only organizes data based on key columns but also includes additional columns. When one creates an index with included columns, SQL Server creates a primary index on key columns for quick searching and also includes additional columns in the index. This allows SQL Server to retrieve all the necessary data directly from the index without having to look up the actual table, making queries faster and more efficient.
Example: Imagine you have a library index that lists books by their titles (the key column), but also includes the authors and publication years of those books (included columns). When you search for a book by title, you can also see the author and publication year right away without having to look up the details elsewhere.
And many more….
Now that we understand the different types of indexes and their important in various case scenarios, lets understand in depth, when to use which index.
- Clustered Index:
- Always create a clustered index on a column which is being used in the where condition, especially when dealing with large data sets. This prevents SQL server from scanning the entire table but directly fetching the filtered rows.
- Create a clustered index on the columns which is used to join two tables. When you create a clustered index, since your data is sorted, it helps SQL server to easily spot the match and perform a join, rather than scanning the entire table every time.
- If in an SELECT statement you often retrieve data in a specific order, eg: ordey by date, then it is recommended to create a clustered index on that column.
- It is recommended to have a clustered index on a table which has relatively high read workload then write workload.
- Non-Clustered Index:
- It is recommended to create a non-clustered index on columns that are frequently queried but aren’t part of the primary key, as it can speed up those searches.
- If you have queries that filter or sort by specific columns, creating a non-clustered index on those columns can improve performance.
- In case of multiple joining conditions, when tables are joined based on columns that are not the primary key, a non-clustered index on those join columns can enhance performance.
- Columnstore Index:
- Columnstore indexes are particularly useful in data warehousing scenarios for analytical purposes. They can significantly improve the performance of complex queries involving aggregations, joins, and large scans.
- Columnstore indexes are optimized for operations like COUNT, SUM, AVG, MIN, and MAX, which are common in reporting and analytical queries.
- During ETL processes, where large volumes of data are processed in batches, columnstore indexes can help improve performance by allowing efficient data loading and querying.
- If your database has high data compression needs, columnstore indexes can be a good solution as it use advanced compression techniques, which can reduce storage requirements and improve I/O performance.
- For scenarios where you need to analyse historical data that is not frequently updated
- Index with included columns:
- This is another type of non-clustered index. It is recommended to include columns when creating an index, if multiple columns are being queried, filtered or sorted in the query.
- One should be careful when including columns, since using a lot of columns in the include can sometimes lead to degradation in query performance more than improvement. Only those columns should be added which are being queried frequently.
A few examples to show how these indexes work.
EXAMPLE 1:
Create a table with 3 columns, and insert 10000000 rows into it. Then try to fetch a single column from the table using a where condition. Turn on the statistics time and execution plan. See how much time the query takes to retrieve the data.
Now create an index on the column used in the where condition and see the difference.

Figure 1: Table with 10000000 rows
STEP 1: SELECT * FROM Employees WHERE EmployeeID = 3907647; (No index on Employee id in this case)

Figure 2: Execution time and plan before index
From Figure 2, we can see that the query goes for a table scan to fetch just 1 row from the entire table and takes 1015ms to complete the process.
STEP 2: Create a clustered Index on EmployeeID column and run the same query.

Figure 3: Execution time and plan after index.
After creating it can be seen in figure 3, that now instead of scanning the entire table, the query just seeks the value, and the execution time is reduced to 0ms.
EXAMPLE 2:
Create a Sales table and insert 10 records into it. Run an aggregate query and check the difference between the time and execution plan before and after creating a columnstore index.

Figure 4: Sales table
STEP1: Run the below query without creating a columnstore index on the table:
SELECT ProductCategory, SUM(SalesAmount) AS TotalSales
FROM Sales
GROUP BY ProductCategory;


Figure 5: Execution time and plan before index
STEP 2: Create a columnstore index.
CREATE COLUMNSTORE INDEX IX_Sales_Columnstore
ON Sales (SalesDate, SalesAmount, ProductCategory);

Figure 6: Execution time and plan after index
From figure 5 and 6, the difference in the query time and execution plan can be observed.
The query takes less time after the index has been created. This case scenario is only for 10 rows, imagine the impact on large data set on which analytical queries need to be performed.
Conclusion:
In summary, effective indexing strategies are essential for achieving optimal database performance, ensuring quick access to data, and maintaining system scalability as the volume of data grows. Properly applied indexes can transform data handling capabilities and significantly enhance the responsiveness and reliability of database-driven applications.
I hope that this information has been both helpful and easy to understand, until next time!!
Yatika Sheth
Yatika Sheth