This Python script performs data validation and database checks.
It primarily focuses on finding null values in a specific column (col1) of a CSV file and
then cross-checks the corresponding IDs against a database to generate report files.
Detailed Breakdown:
Imports and Initial Setup:
Imports: The script imports the pandas library for data manipulation and pymysql for interacting with a MySQL database.
Directory Path: A directory path (dir) is set where files will be read from and saved to.
Reading the CSV File:
Data Loading: The script reads a CSV file located at /location/filename.csv into a Pandas DataFrame (df).
Null Value Count: It calculates the number of null values in the col1 column using isnull().sum() and prints the result.
Extracting IDs with Null Values:
Filtering: The script filters the DataFrame to the rows where col1 is null and extracts the corresponding col2 values.
Conversion to List: These col2 values are then converted to strings and stored in a list (id). This list will be used for further database queries.
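For illustration, this step could look like the sketch below; the empty-list guard is an assumption added here (it is not in the original script), since the database query that follows would be malformed if no null rows were found.

# Sketch of the extraction step; the empty-list check is an addition, not part of the original
null_rows = df[df['col1'].isnull()]
id = null_rows['col2'].astype(str).to_list()
if not id:
    print('No null values in col1; nothing to check against the database.')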
Database Check (db1_check function):
Database Connection: The function connects to a MySQL database named db1 using the provided credentials.
Query 1: The script executes a query to select col1 values from tab1 where col2 matches any value in the id list,
col4 is “credit_note”, and col1 is not null. This query is intended to find non-null col1 values.
Result Handling for Query 1:
The results of this query are converted into a DataFrame (df1).
If any rows are found (i.e., df1 is not empty), the script saves these results to a CSV file named doc_ref.csv.
Query 2: The script then executes a similar query to select col1 values from tab1 where col2 matches any value in
the id list, col4 is “credit_note”, but where col1 is null.
Result Handling for Query 2:
The results of this query are stored in another DataFrame (df2).
If any rows are found (i.e., df2 is not empty), the script saves these results to a CSV file named doc_ref_null.csv.
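As a side note, each result set could also be loaded with a named column by letting pandas run the query; a minimal sketch, assuming the connection (w_con) and query string (q1) defined in the code further below (pandas warns that a raw pymysql connection is not a SQLAlchemy connectable, but the call works):

# Sketch only: load query 2 directly into a labelled DataFrame (not part of the original script)
df2 = pd.read_sql(q1, w_con)                       # column comes back named col1
df2.to_csv(dir + 'doc_ref_null.csv', index=False)  # this variant keeps the header row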
Main Execution:
The script’s main execution block calls the db1_check function, which triggers the database checks and file exports described above.
Below is the Python code that implements the above steps:
import pandas as pd
import pymysql as p

# Directory where the input CSV is read from and the report files are written
dir = '/location/'

# Load the CSV and count the null values in col1
df = pd.read_csv(dir + 'filename.csv')
c = df['col1'].isnull().sum()
print('Null values in col1 column :', c)

# Collect the col2 values (as strings) of the rows where col1 is null
d = df[df['col1'].isnull()]['col2']
id = d.astype(str).to_list()

def db1_check():
    print("Checking in db1 for col1....")
    print()
    # Connect to the db1 MySQL database
    w_con = p.connect(user="abcd", passwd="abcd", host="127.0.0.1", port=3306, database="db1")
    cur = w_con.cursor()

    # Query 1: ids from the list whose col1 is NOT null in tab1
    q = f'select col1 from tab1 where col2 in {tuple(id)} and col4="credit_note" and col1 is not null'
    cur.execute(q)
    df1 = pd.DataFrame(cur.fetchall())
    print("count of ids not having null doc_ref:", len(df1))
    if len(df1) > 0:
        print("The ids saved in file for non null ids doc_ref.csv file...")
        df1.to_csv(dir + 'doc_ref.csv', index=False, header=False)

    # Query 2: ids from the list whose col1 is null in tab1
    q1 = f'select col1 from tab1 where col2 in {tuple(id)} and col4="credit_note" and col1 is null'
    cur.execute(q1)
    df2 = pd.DataFrame(cur.fetchall())
    print()
    print("count of ids having null doc_ref:", len(df2))
    if len(df2) > 0:
        print("The ids saved in file for null ids doc_ref_null.csv file...")
        df2.to_csv(dir + 'doc_ref_null.csv', index=False, header=False)

    # Close the cursor and connection once both checks are done
    cur.close()
    w_con.close()

if __name__ == '__main__':
    db1_check()
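One caveat: building the IN clause with f'{tuple(id)}' interpolates values directly into the SQL and produces invalid syntax when the list has zero or one entries (a one-element tuple leaves a trailing comma). A safer variant is to let pymysql bind the values; the snippet below is only a sketch of that approach under the same table and column names, not part of the original script.

# Sketch: parameterized IN clause (assumes the id list is non-empty)
placeholders = ', '.join(['%s'] * len(id))
q = (f'select col1 from tab1 where col2 in ({placeholders}) '
     'and col4 = %s and col1 is not null')
cur.execute(q, id + ['credit_note'])
df1 = pd.DataFrame(cur.fetchall())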
Pooja K