Automating Restoration of Transaction Log Backups in SQL Server Using Python

Blogs

Unlocking MAXDOP of SQL Server: Tuning Query Performance for Complex Workloads
December 17, 2023
A Step-by-Step Guide to Update Patch on SQL Server Always on Availability Group Cluster
December 27, 2023

Automating Restoration of Transaction Log Backups in SQL Server Using Python

Automating Restoration of Transaction Log Backups in SQL Server Using Python

Introduction:

In the world of database management, ensuring the integrity and consistency of data is crucial. One key aspect of this is regularly backing up transaction logs, especially in scenarios where data is frequently updated. In this technical blog, we’ll explore how to automate the process of applying transaction log backups to a SQL Server using Python. The provided Python script utilizes the pyodbc library to connect to SQL Server instances, pandas for data manipulation, and subprocess for executing SQL commands.

Scenario:

The client have a 4 node Always on Availability group cluster, out of this 4 nodes 1 node is in synchronous commit mode and other 2 nodes are in asynchronous commit mode. Copy only backup is generated using secondary node, there is one full backup every 24 hours and transactional log backup every 15 seconds.

Challenge:

To add any new node using backup the LSN number of primary should match with LSN number of new node, for that the backups should be restored in same manner in which it was generated. The main challenge here is to restore all the transactional log backup in a automated way.

Approach:

  • First take a full backup from primary node and start restoring in standalone node.
  • The time in which the primary backup got restored, the automated backup job would generate n numbers of transactional log backup files.
  • Once we start restoring n numbers of transactional log backup files, it will generate m numbers of new transactional log files.
  • So, to deal with this cycle of new transactional log files, will use a script which will detect all new files and start restoring it to new node and the script will be running in while loop asking for users input every time before proceeding to restoring step.
  • The full backup restoration is done manually and then on top of it will execute the below script.

Prerequisites:

  • Python installed on your machine.
  • Required Python libraries: pyodbc, pandas, numpy, pathlib,subprocess,shutil.
  • SQL Server instances for source (secondary server) and destination (new node).
  • Connection to both source and destination servers.

Python Script Overview:

The provided Python script is divided into several sections:

1. Connecting to Source and Destination Servers:

The script establishes connections to both the source and destination SQL Server instances using pyodbc. Ensure that the connection details (server name, database name, username, and password) are correctly configured for your environment.

2. Functions for Getting Last LSN and File List:

Two functions, get_lsn and get_file_list, are defined to retrieve the last Log Sequence Number (LSN) applied to the destination server and to get a list of transaction log backup files, respectively. The LSN is crucial for filtering and applying the correct transaction log backups.

3. Function for Restoring Log Backups:

The restore_backup function iterates through the transaction log backup files, executing SQL Server commands to restore each file to the destination database.

4. Execution Loop:

The script includes a loop that continually checks for new transaction log backups, restores them, and prompts the user whether to continue running the script.

Note: Before implementing the script, ensure you have the necessary permissions and understand the implications of automating transaction log backups.

Python code:

##----------------------------------------------------------------------------------------------
import pyodbc
import pandas as pd
import numpy as np
from pathlib import Path
import os
import subprocess
import shutil
##----------------------------------------------------------------------------------------------
## There will be two servers first from where we will be referring the file and second server where we will be restoring the files.
## Connection string to the source server
server1 = 'servername'                   # Replace with 127.0.0.1,21443 for production and start tunneling for the same 
database1 = 'master'                    # For restoration master database will work
username1 = 'test'                      # Specify the user which has appropriate access
password1 = 'password1###'          

# Establishing a connection with database
cnxn1 = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server1+';DATABASE='+database1+';UID='+username1+';PWD='+ password1)
cursor1 = cnxn1.cursor()

## Checking connection to source
query='select @@servername'
df=pd.read_sql_query(query,cnxn1)
print(df)

##----------------------------------------------------------------------------------------------
## Connection string to the destination server
server2 = 'servername'                   # For destination server, we can give server name no need to do tunneling as it is local server.
database2 = 'master' 
username2 = 'sa' 
password2 = 'password2####'

# Establishing a connection with database
cnxn2 = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server2+';DATABASE='+database2+';UID='+username2+';PWD='+ password2)
cursor2 = cnxn2.cursor()

## Checking connection to source
query='select @@servername'
df=pd.read_sql_query(query,cnxn2)
print(df)

##----------------------------------------------------------------------------------------------
## Function to get last LSN which has been applied to destination server
def get_lsn(db_name):
    get_lsn=('''
    with cte as 
    (SELECT
    s.server_name,d.name AS database_name,m.physical_device_name,CASE m.device_type WHEN 2 THEN 'Disk'
    WHEN 102 THEN 'Backup Device (Disk)' WHEN 5 THEN 'Tape' WHEN 105 THEN 'Backup Device (Tape)'
    WHEN 7 THEN 'Virtual Device' END AS device_type
    ,CAST (s.backup_size / 1048576.0 AS FLOAT) AS backup_size_mb
    ,CAST (s.compressed_backup_size / 1048576.0 AS FLOAT) AS compressed_backup_size_mb
    ,s.backup_start_date,s.first_lsn,s.backup_finish_date,s.database_backup_lsn
    ,s.last_lsn,s.checkpoint_lsn
    ,CASE s.[type] WHEN 'D' THEN 'Database (Full)' WHEN 'I' THEN 'Database (Differential)'
    WHEN 'L' THEN 'Transaction Log' WHEN 'F' THEN 'File or Filegroup (Full)'
    WHEN 'G' THEN 'File or Filegroup (DIfferential)'
    WHEN 'P' THEN 'Partial (Full)' WHEN 'Q' THEN 'Partial (Differential)' END AS backup_type
    ,s.recovery_model,ROW_NUMBER () OVER (PARTITION BY s.database_name, s.database_backup_lsn 
    ORDER BY s.backup_start_date) AS Row
    FROM msdb.dbo.backupset s 
    INNER JOIN 
    (SELECT database_name, MAX(backup_set_id) AS max_backup_set_id FROM msdb.dbo.backupset WHERE [type] = 'D'
    GROUP BY database_name
    ) t ON s.database_name = t.database_name and s.backup_set_id >= t.max_backup_set_id
    INNER JOIN msdb.dbo.backupmediafamily m ON s.media_set_id = m.media_set_id
    RIGHT OUTER JOIN sys.databases d ON s.database_name = d.name AND s.recovery_model = d.recovery_model_desc COLLATE SQL_Latin1_General_CP1_CI_AS
    )
    select max(first_lsn) from cte where database_name='{}'
    '''.format(db_name))
    #print(subquery)
    lsn=cursor2.execute(get_lsn)
    val=(lsn.fetchone())[0]
    return(val)

##----------------------------------------------------------------------------------------------
## Function to get list of files in dataframe
def get_file_list(lsn):
    query='''
    with cte as
    (SELECT bs.database_name, bs.first_lsn, bs.last_lsn ,
    database_backup_lsn, -- For tlog and differential backups, this is the checkpoint_lsn of the FULL backup it is based on.
    checkpoint_lsn,
    backuptype = CASE 
    WHEN bs.type = 'D' AND bs.is_copy_only = 0 THEN 'Full Database'
    WHEN bs.type = 'D' AND bs.is_copy_only = 1 THEN 'Full Copy-Only Database'
    WHEN bs.type = 'I' THEN 'Differential database backup'
    WHEN bs.type = 'L' THEN 'Transaction Log'
    WHEN bs.type = 'F' THEN 'File or filegroup'
    WHEN bs.type = 'G' THEN 'Differential file'
    WHEN bs.type = 'P' THEN 'Partial'
    WHEN bs.type = 'Q' THEN 'Differential partial'
    END + ' Backup',
    CASE bf.device_type
    WHEN 2 THEN 'Disk'
    WHEN 5 THEN 'Tape'
    WHEN 7 THEN 'Virtual device'
    WHEN 9 THEN 'Azure Storage'
    WHEN 105 THEN 'A permanent backup device'
    ELSE 'Other Device'
    END AS DeviceType,
    bms.software_name AS backup_software,
    bs.recovery_model,
    bs.compatibility_level,
    BackupStartDate = bs.Backup_Start_Date,
    BackupFinishDate = bs.Backup_Finish_Date,
    LatestBackupLocation = bf.physical_device_name,
    backup_size_mb = CONVERT(DECIMAL(10, 2), bs.backup_size / 1024. / 1024.),
    compressed_backup_size_mb = CONVERT(DECIMAL(10, 2), bs.compressed_backup_size / 1024. / 1024.),
    begins_log_chain,
    bms.is_password_protected
    FROM msdb.dbo.backupset bs
    LEFT JOIN msdb.dbo.backupmediafamily bf
    ON bs.[media_set_id] = bf.[media_set_id]
    INNER JOIN msdb.dbo.backupmediaset bms
    ON bs.[media_set_id] = bms.[media_set_id]
    )
    select database_name,first_lsn,last_lsn,database_backup_lsn,backuptype,BackupStartDate,BackupFinishDate,LatestBackupLocation 
    from cte where 
    first_lsn>{} and 
    backuptype='Transaction Log Backup'
    and database_name='sample_bkp'
    '''.format(lsn)
    df1=pd.read_sql_query(query,cnxn1)
    return(df1)

##----------------------------------------------------------------------------------------------
def restore_backup(server2,username2,password2,database2):
    b=[]
    for i in df.LatestBackupLocation[:5]:
        bak='''RESTORE LOG sample_bkp FROM DISK = '{}' WITH NORECOVERY'''.format(i)
        print(bak)
        aa=subprocess.check_output('''sqlcmd -S {} -U {} -P {} -d {} -Q "{}"'''.format(server2,username2,password2,database2,bak))
        b.append(aa)
    return(b)

##----------------------------------------------------------------------------------------------
## Execution
iteration=0
a=1
while a==1:
    lsn=get_lsn('sample_bkp')              #get_lsn function will fetch lsn number which will be used as filter for next step
    print(lsn)

    df=get_file_list(lsn)                  #get_file_list function will get list of all directory list in dataframe
    #df.head()
    print(df.shape)

    result=restore_backup(server2,username2,password2,database2)     #restore_backup function restores log backup to destination database
    print(result)

    lsn1=get_lsn('sample_bkp')              #get_lsn function will fetch lsn number which will be used as filter for next step
    print(lsn1)
    df1=get_file_list(lsn1)                  #get_file_list function will get list of all directory list in dataframe
    #df.head()
    print(df1.shape)
    new_rows=df1.shape[0]
    print("We got {} new transaction log backup files".format(new_rows))
    a=int(input("Will you continue running the script, if yes enter 1 else press any other key: "))
    iteration+=1                                         # Default is 0 and with every iteration it will

    if a==1:
        print('''
        
        This will be the {} iteration
        
        '''.format(iteration))
        continue
    else:
        print('Exiting the script....................')
        break    

##----------------------------------------------------------------------------------------------
## END
##----------------------------------------------------------------------------------------------


Naveen Kumar Singh

Leave a Reply

Your email address will not be published. Required fields are marked *