Effortless Partition Management : Automate your SQL server partitions

Blogs

Temporary Tables vs. CTEs – The Ultimate Showdown!
July 26, 2024
Network Security In Microsoft Fabric
July 27, 2024

Effortless Partition Management : Automate your SQL server partitions

Have you ever wondered what SQL partitioning is and why it’s such a buzzword in database management? How does it work, and what makes it superior to other maintenance tasks in SQL? In this blog, we’ll explore all these questions and uncover the benefits of SQL partitioning and how to automate this. Let’s dive in and discover how partitioning can transform your database management!

Lets start with, what is partitioning…?

Partitioning in SQL Server is a powerful technique that helps manage and optimize large databases. Think of it like organizing a messy room into different sections—each section holds a specific type of item, making it easier to find what you need. In a database, partitioning breaks down large tables into smaller, more manageable pieces based on specific criteria, such as date ranges. This makes your queries run faster and simplifies maintenance tasks.

Why partitioning and how is it better than other maintenance task…?

Partitioning offers several advantages over other maintenance tasks in SQL Server, making it a powerful tool for managing large datasets efficiently. Here’s how partitioning stands out

Improved Query Performance

Efficient Data Access: By splitting a large table into smaller, more manageable pieces (partitions), queries that access only a subset of the data can run faster. The database engine can quickly locate and retrieve data from the relevant partitions without scanning the entire table.

Parallel Processing: Partitioning allows for parallel processing of queries. Different partitions can be processed simultaneously by multiple processors, further speeding up query execution.

Enhanced Manageability

Easier Maintenance: Tasks like index rebuilds, backups, and restores can be performed on individual partitions instead of the entire table. This makes maintenance operations faster and less resource-intensive.

Data Archiving: Partitioning makes it easier to archive old data. You can move entire partitions (representing older data) to cheaper storage or even drop them if they are no longer needed, without affecting the rest of the data.

Scalability

Handling Large Data Volumes: Partitioning enables databases to handle larger data volumes more efficiently. By distributing data across partitions, the system can manage and query large datasets without degrading performance.

Partition Switching: SQL Server supports partition switching, which allows you to quickly move data in and out of partitions. This is useful for data loading and unloading operations.

Better Data Management

Partition Elimination: When a query specifies a condition on the partitioning column, SQL Server can eliminate partitions that do not contain relevant data, reducing the amount of data scanned and improving performance.

Simplified Data Purging: Dropping an entire partition is much faster than deleting rows from a large table, making data purging operations more efficient.

Improved Resource Utilization

Efficient Use of Resources: Partitioning helps in better utilization of system resources by spreading the load across multiple disks and processors, preventing any single resource from becoming a bottleneck.

Reduced Lock Contention: Operations on different partitions can reduce lock contention, as they can be managed independently, leading to better concurrency and overall system performance.

Before diving into more detail, it’s important to understand the two main types of partitioning in SQL Server: vertical and horizontal. Vertical partitioning splits a table into smaller tables with fewer columns, while horizontal partitioning divides a table into smaller pieces based on rows.

Vertical Partitioning

Vertical partitioning is mostly used in the cases where the table have a column with more wide text or BLOB columns, in this case to reduce the access time to these blob columns the table can be split, also to avoid sensitive columns such as password, phone numbers, Vertical partitioning splits a table into two with a common column between the two split tables.

Now that we have a basic understanding of vertical partitioning and since this blog focus exclusively on horizontal partitioning, let’s dive deep into horizontal partitioning and explore how it can optimize your database performance and maintenance.

Horizontal Partitioning

Horizontal partitioning helps In dividing a big table into multiple smaller chunks by creating logical partitioning, suppose we have a big table with data of 1 year where each month 50Cr records will be inserted, whenever we fire a select query for data of certain month all the records in the table would be scanned and filtered for a specific set of records to avoid this we can create partition for every month where records of that particular month will be placed in that respective partition so whenever data of particular month is needed only that partition will be internally scanned and the required result set will be the output.

Partitioning column is usually a datetime column but all data types that are valid for use as index columns can be used as a partitioning column, except a timestamp column. The ntext, text, image, xml, varchar(max), nvarchar(max), or varbinary(max).

There are two different approaches we could use for table partitioning. The first is to create a new partitioned table and then simply copy the data from your existing table into the new table and do a table rename. The second approach is to partition an existing table by rebuilding or creating a clustered index on the table.

Example of horizontal partitioning by rebuilding or creating index on the table

In the below example we are creating partitioning on a table with a boundary range of 1 month meaning the partition will be created on monthly basis, having said that partitioning can be created with different strategies such as monthly, quarterly, half yearly or yearly based on the data size and the business requirements.

Steps used in Horizontal Partitioning:

  • Create Filegroups
  • Create Datafiles
  • Create Partition Function
  • Create Partition Scheme
  • Create Index on the Key column

Every database will have a primary filegroup where primary data file will be stored. An additional secondary file groups can be created to store secondary data files, we will create 12 filegroups for 12 different months. The reason for having different file groups is separation of hot, cold data which is part of archival, different file groups can be backed up and restored separately.

ALTER DATABASE uds_test

ADD FILEGROUP [fg_before_2024];

 

ALTER DATABASE uds_test

ADD FILEGROUP [fg_2024_jan];

 

ALTER DATABASE uds_test

ADD FILEGROUP [fg_2024_feb];

 

ALTER DATABASE uds_test

ADD FILEGROUP [fg_2024_mar];

 

ALTER DATABASE uds_test

ADD FILEGROUP [fg_2024_apr];

 

ALTER DATABASE uds_test

ADD FILEGROUP [fg_2024_may];

 

ALTER DATABASE uds_test

ADD FILEGROUP [fg_2024_jun];

 

ALTER DATABASE uds_test

ADD FILEGROUP [fg_2024_jul];

 

ALTER DATABASE uds_test

ADD FILEGROUP [fg_2024_aug];

 

ALTER DATABASE uds_test

ADD FILEGROUP [fg_2024_sep];

 

ALTER DATABASE uds_test

ADD FILEGROUP [fg_2024_oct];

 

ALTER DATABASE uds_test

ADD FILEGROUP [fg_2024_nov];

 

ALTER DATABASE uds_test

ADD FILEGROUP [fg_2024_dec];

 

ALTER DATABASE uds_test

ADD FILEGROUP [fg_after_2024];

We are creating two additional partition one for data before the specified range and one for data that will be outside the specified range for example if the partition is for all the months of 2024, once the data of 2025 is inserted into the table it will be sitting in the additional partition which we are creating.

ALTER DATABASE [uds_test]

ADD FILE

(

NAME = N’SalesBefore2024′,

FILENAME = N’C:SQLDataSalesBefore2024.ndf’,

SIZE = 5MB, MAXSIZE = UNLIMITED, FILEGROWTH = 5MB

) TO FILEGROUP [fg_before_2024];

We add secondary data files to the filegroups where data can be placed, the same way  we need to add data files to all the  filegroups.

After creating additional filegroups for storing data, we create a partition function, a partition function that maps the row of a partitioned table into partition based on the partition column, In this example we will create a partitioning function that partitions a table into 12 partitions, one for each month of a year’s worth of values in a datetime column:

CREATE PARTITION FUNCTION pf_SalesDate (DATETIME)

AS RANGE RIGHT FOR VALUES

(

‘2024-01-01’, ‘2024-02-01’, ‘2024-03-01’, ‘2024-04-01’, ‘2024-05-01’,

‘2024-06-01’, ‘2024-07-01’, ‘2024-08-01’, ‘2024-09-01’, ‘2024-10-01’,

‘2024-11-01’, ‘2024-12-01’, ‘2025-01-01’

);

There are two ways using which partition functions can be created:

  • Range Right: This means that the boundary value itself is included in the partition to the right of the boundary, in the above partition function script the boundary values are set accordingly, here the partition 1 will contain rows of data less than 2024-01-01, Partition 2 will contain rows with dates from ‘2024-01-01’ to ‘2024-01-31’ (inclusive) and so on. Range right is the best used option because the data for the specified date range will be available in that particular partition.

 

  • Range left: In this method, each boundary value is included in the partition to the left of the boundary, if we consider the same example as above then the data distribution will be like, Partition 1 Contains rows with dates less than or equal to ‘2024-01-01’, Partition 2 Contains rows with dates from ‘2024-01-02’ to ‘2024-02-01’ (inclusive) and so on.

To map the partitions of a partitioned table to filegroups and determine the number and domain of the partitions of a partitioned table we will create a partition scheme.

CREATE PARTITION SCHEME ps_SalesDate

AS PARTITION pf_SalesDate TO

(

[fg_before_2024], [fg_2024_jan], [fg_2024_feb], [fg_2024_mar],

[fg_2024_apr], [fg_2024_may], [fg_2024_jun], [fg_2024_jul],

[fg_2024_aug], [fg_2024_sep], [fg_2024_oct], [fg_2024_nov],

[fg_2024_dec], [fg_after_2024]

);

Will create index on the key column based on which partitioning is performed and the index will be mapped to the partition schema that was created earlier.

create index ix_dob on p(dob) on ps_SalesDate(DOB)

Below is the DMV used to verify the rows in different partitions

SELECT DISTINCT o.name as table_name, rv.value as partition_range, fg.name as file_groupName, p.partition_number, p.rows as number_of_rows

FROM sys.partitions p

INNER JOIN sys.indexes i ON p.object_id = i.object_id AND p.index_id = i.index_id

INNER JOIN sys.objects o ON p.object_id = o.object_id

INNER JOIN sys.system_internals_allocation_units au ON p.partition_id = au.container_id INNER JOIN sys.partition_schemes ps ON ps.data_space_id = i.data_space_id  INNER JOIN sys.partition_functions f ON f.function_id = ps.function_id INNER JOIN sys.destination_data_spaces dds ON dds.partition_scheme_id = ps.data_space_id AND dds.destination_id = p.partition_number INNER JOIN sys.filegroups fg ON dds.data_space_id = fg.data_space_id  LEFT OUTER JOIN sys.partition_range_values rv ON f.function_id = rv.function_id AND p.partition_number = rv.boundary_id

WHERE o.object_id = OBJECT_ID(‘<Tablename>’);

Below is the details of the partitions along with the rows per partition, it also has range, table name and filegroup details of all the partitions, the partition with range specified as null is the partition at the last that can accommodate all the data from 2025.

Now that we have successfully implemented partitioning on our table, you must be getting a big question in your mind what if the data out of all these boundary ranges start coming in, imagine having to manually adjust partitions every time new data comes in or the database structure changes. Not only does this require significant effort, but it also increases the risk of errors and inconsistencies.

Fortunately we have come up with a solution to this overhead that is automation. By automating the partitioning process, you can effortlessly manage your data partitions, ensuring optimal performance and maintenance with minimal manual intervention. Automation not only simplifies the task but also makes your database more efficient and reliable.

SQL Server Agent jobs, or third-party scheduling tools can be used by database administrators to set up routines to periodically check for new data ranges and create corresponding partitions.

Following is the stored procedure used to create partitions on monthly basis, this stored procedure will check for the maximum boundary range specified and will create a new partition of the next range, for example if the boundary range of previous partition was May this script on execution will create a new partition for June month, this will reduce the manual intervention of creating partition on monthly basis as the new data comes in, this partitions are created by creating a new filegroup for that particular partition.

IF EXISTS (SELECT Name FROM sys.procedures WHERE name = ‘CreateNextPartition’)

DROP PROCEDURE CreateNextPartition;

GO

 

CREATE PROCEDURE dbo.CreateNextPartition

AS

BEGIN

SET NOCOUNT ON;

SET XACT_ABORT ON;

 

DECLARE @NextBoundary datetime;

DECLARE @DtPreviousBoundary datetime;

DECLARE @strFileGroupToBeUsed VARCHAR(100);

DECLARE @PartitionNumber int;

DECLARE @SQL NVARCHAR(MAX);

DECLARE @FileGroupName NVARCHAR(MAX);

DECLARE @FileName NVARCHAR(MAX);

 

— Find the previous boundary and corresponding filegroup

SELECT TOP 1

@strFileGroupToBeUsed = fg.name,

@PartitionNumber = p.partition_number,

@DtPreviousBoundary = CAST(prv.value AS datetime)

FROM sys.partitions p

INNER JOIN sys.objects tab ON tab.object_id = p.object_id

INNER JOIN sys.allocation_units au ON au.container_id = p.hobt_id

INNER JOIN sys.filegroups fg ON fg.data_space_id = au.data_space_id

INNER JOIN sys.partition_range_values prv ON prv.boundary_id = p.partition_number

INNER JOIN sys.partition_functions PF ON pf.function_id = prv.function_id

WHERE pf.name = ‘pf_daterange’  — Replace with your actual partition function name

AND tab.name = ‘autom’      — Replace with your actual table name

ORDER BY prv.value DESC; — Get the most recent boundary

 

— Calculate the next boundary based on previous boundary

SELECT @NextBoundary = DATEADD(MONTH, 1, @DtPreviousBoundary);

 

— Generate filegroup and file names based on the partition boundary

SET @FileGroupName = ‘FileGroup_’ + FORMAT(@NextBoundary, ‘yyyyMM’);

SET @FileName = ‘File_’ + FORMAT(@NextBoundary, ‘yyyyMM’);

 

— Print for verification

PRINT ‘Next Boundary: ‘ + CONVERT(nvarchar(30), @NextBoundary, 120);

PRINT ‘Previous Boundary: ‘ + CONVERT(nvarchar(30), @DtPreviousBoundary, 120);

PRINT ‘FileGroup Name: ‘ + @FileGroupName;

PRINT ‘File Name: ‘ + @FileName;

— Check if the filegroup exists, if not, create it

IF NOT EXISTS (SELECT 1 FROM sys.filegroups WHERE name = @FileGroupName)

BEGIN

SET @SQL = ‘ALTER DATABASE ad2019 ADD FILEGROUP ‘ + QUOTENAME(@FileGroupName);

EXEC sp_executesql @SQL;

END

 

— Check if the file exists, if not, create it

IF NOT EXISTS (SELECT 1 FROM sys.database_files WHERE name = @FileName)

BEGIN

SET @SQL = ‘ALTER DATABASE ad2019 ADD FILE

(NAME = ‘ + QUOTENAME(@FileName) + ‘,

FILENAME = ”C:SQLData’ + @FileName + ‘.ndf”,

SIZE = 5MB, MAXSIZE = UNLIMITED, FILEGROWTH = 10MB)

TO FILEGROUP ‘ + QUOTENAME(@FileGroupName);

EXEC sp_executesql @SQL;

 

END

 

— Add the filegroup as next used in the partition scheme

SET @SQL = ‘ALTER PARTITION SCHEME ps_daterange NEXT USED ‘ + QUOTENAME(@FileGroupName);

EXEC sp_executesql @SQL;

 

— Ensure the new boundary does not already exist in the partition function

IF NOT EXISTS (

SELECT 1 FROM sys.partition_range_values prv

INNER JOIN sys.partition_functions PF ON pf.function_id = prv.function_id

WHERE pf.name = ‘pf_daterange’

AND CAST(prv.value AS datetime) = @NextBoundary

)

BEGIN

— Split partition function for next month

SET @SQL = ‘ALTER PARTITION FUNCTION pf_daterange() SPLIT RANGE (”’ + CONVERT(nvarchar(30), @NextBoundary, 120) + ”’)’;

EXEC sp_executesql @SQL;

END

END

GO

Upon executing the above stored procedure using the following command a new partition along with the new filegroup will be created.

Exec CreatePartition– execution command.

Below is the result of the stored procedure with all the details

Conclusion

Partitioning is a powerful tool for managing large datasets and improving query performance. By automating partition creation and maintenance, you streamline processes, minimize errors, and ensure your database remains efficient as it grows. Embracing automation not only simplifies your workload but also enhances the overall performance and reliability of your SQL Server environment.

Thank you for reading!

Thank you for diving into the world of SQL Server partitioning with me. Until next time, may your queries be swift and your partitions perfectly aligned!

 


Lochan R

Leave a Reply

Your email address will not be published. Required fields are marked *