Python script to check null values for column in file and validate in database

Blogs

Fetch order details using api with python script and generate id
August 15, 2024
Python Data Validation across Multiple Databases
August 15, 2024

Python script to check null values for column in file and validate in database

This Python script performs data validation and database checks.
It primarily focuses on handling null values in a specific column (col1) of a CSV file and
then cross-checks those values against a database to generate reports.

Detailed Breakdown:

Imports and Initial Setup:

Imports: The script imports the pandas library for data manipulation and pymysql for interacting with a MySQL database.
Directory Path: A directory path (dir) is set where files will be read from and saved to.
Reading the CSV File:

Data Loading: The script reads a CSV file located at /location/filename.csv into a Pandas DataFrame (df).
Null Value Count: It calculates the number of null values in the col1 column using isnull().sum() and prints the result.
Extracting IDs with Null Values:

Filtering: The script filters out the rows where col1 is null and extracts the corresponding col2 values.
Conversion to List: These col2 values are then converted to strings and stored in a list (id). This list will be used for further database queries.
Database Check (db1_check function):

Database Connection: The function connects to a MySQL database named db1 using the provided credentials.
Query 1: The script executes a query to select col1 values from tab1 where col2 matches any value in the id list,
col4 is “credit_note”, and col1 is not null. This query is intended to find non-null col1 values.

Result Handling for Query 1:
The results of this query are converted into a DataFrame (df1).
If any rows are found (i.e., df1 is not empty), the script saves these results to a CSV file named doc_ref.csv.
Query 2: The script then executes a similar query to select col1 values from atb1 where col2 matches any value in
the id list, col4 is “credit_note”, but where col1 is null.

Result Handling for Query 2:
The results of this query are stored in another DataFrame (df2).
If any rows are found (i.e., df2 is not empty), the script saves these results to a CSV file named doc_ref_null.csv.
Main Execution:

The script’s main execution block calls the db1_check function, which triggers the database checks and file exports described above.

Below is the python code to perform the above requirements

import pandas as pd
import pymysql as p

dir=’/location/’

df=pd.read_csv(dir+’filename.csv’)

c=df[‘col1’].isnull().sum()

print(‘Null values in col1 column :’,c)

# c=c.astype(int)

# print(type(c))

d=df[df[‘col1’].isnull()][‘col2′]

id=d.astype(str).to_list()

def db1_check():

print

print(“Checking in db1 for col1….”)

print()

w_con=p.connect(user = “abcd”, passwd = “abcd”, host = “127.0.0.1”, port = 3306, database = “db1″)

cur=w_con.cursor()

q=f’select col1 from tab1 where col2 in {tuple(id)}
and col4=”credit_note” and col1 is not null’

cur.execute(q)

df1=pd.DataFrame(cur.fetchall())

print(“count of ids not having null doc_ref:”,len(df1))

if len(df1) > 0:

print(“The ids saved in file for non null ids doc_ref.csv file…”)
df1.to_csv(dir+’doc_ref.csv’,index=False,header=False)

q1=f’select col1 from atb1 where col2 in {tuple(id)}
and col4=”credit_note” and col2 is null’

cur.execute(q1)

df2=pd.DataFrame(cur.fetchall())

print()

print(“count of ids having null doc_ref:”,len(df2))

if len(df2) > 0:

print(“The ids saved in file for null ids doc_ref_null.csv file…”)
df2.to_csv(dir+’doc_ref_null.csv’,index=False,header=False)

if __name__ == ‘__main__’:

db1_check()


Pooja K

Leave a Reply

Your email address will not be published. Required fields are marked *