Database Interactions with AI-Powered Speech Recognition


In today’s fast-paced world, efficiency is key. Imagine interacting with a database using nothing but your voice: no more typing lengthy queries or navigating complex interfaces. Advances in artificial intelligence have made this possible. In this blog post, we’ll explore how we used AI technologies to streamline database interactions.

Speech recognition and natural language processing, provided by Azure Cognitive Services and OpenAI, let users ask a question aloud, have it translated into a SQL query, and hear the result read back.

Setting Up Configuration Details for Speech and AI Resources

Both resources need a subscription key, endpoint, region, and API version. The steps are the same for each:

  1. Create Azure Resources: Log in to the Azure Portal and navigate to the Azure services for Speech and AI. Create new resources if they haven’t been created yet.
  2. API Key: Once the resources are created, navigate to the ‘Keys and Endpoint’ section within each resource. Here, you’ll find the API key(s) associated with your services. Copy the key that corresponds to your subscription.
  3. Endpoint: In the same ‘Keys and Endpoint’ section, you’ll also find the endpoint URL for each service. These endpoints are crucial for connecting to the services from your application. Copy the endpoint URLs.
  4. Region: The region in which your resources are hosted is also displayed in the Azure Portal. Note it down: the Speech SDK uses the region together with the key to locate your resource, and it determines where the service is hosted.
  5. API Version: Additionally, ensure that you specify the correct API version for your services. This ensures compatibility and access to the latest features.
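
One way to keep these four values out of the source code is to read them from environment variables. The sketch below is only an illustration: the ServiceSettings class, the load_settings() helper, and the variable names (for example SPEECH_KEY or SPEECH_REGION) are assumptions made for this example, not names required by Azure.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceSettings:
    """Connection details copied from the 'Keys and Endpoint' page."""
    key: str
    endpoint: str
    region: str
    api_version: str


def load_settings(prefix, env=None):
    """Read <PREFIX>_KEY, _ENDPOINT, _REGION and _API_VERSION.

    The variable names are an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    names = ("KEY", "ENDPOINT", "REGION", "API_VERSION")
    # Collect every missing variable so the error names all of them at once
    missing = [f"{prefix}_{n}" for n in names if not env.get(f"{prefix}_{n}")]
    if missing:
        raise KeyError(f"Missing settings: {', '.join(missing)}")
    return ServiceSettings(*(env[f"{prefix}_{n}"] for n in names))
```

Calling load_settings("SPEECH") would then return the values for the speech resource, and a second call with a different prefix could do the same for the AI resource.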

Technical Specifications:

Python script for Speech-to-Text Conversion:

import azure.cognitiveservices.speech as speechsdk

# Set up Azure Cognitive Services Speech SDK
subscription_key = "YOUR_SUBSCRIPTION_KEY"
region = "YOUR_REGION"
config = speechsdk.SpeechConfig(subscription=subscription_key, region=region)

def speech_to_text():
    # Initialize the speech recognizer (uses the default microphone)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=config)
    # Listen for a single utterance from the microphone
    print("Speak into your microphone.")
    result = speech_recognizer.recognize_once_async().get()
    # Return None if nothing was recognized
    if result.reason != speechsdk.ResultReason.RecognizedSpeech:
        print("Speech could not be recognized:", result.reason)
        return None
    # Print and return transcribed text
    transcribed_text = result.text
    print("Transcribed Text:", transcribed_text)
    return transcribed_text

Explanation:

In the script, we configure the Azure Cognitive Services Speech SDK for speech recognition. The speech_to_text() function initializes a speech recognizer, captures a single utterance from the microphone, and converts it into text. The script needs the subscription key and region, both found in the Azure Portal under the speech resource.

Python script for SQL Query Generation:

def generate_sql_query(user_input):
    # db_name and api_parameter() are defined elsewhere in the application
    prompt = f"""Task: Generate only a SQL Server query based on the user input:
            '{user_input}', and replace the values assigned to different forms of database names
            (e.g., 'your_database_name', your_database_name, [databasename]) with the database name '{db_name}'.
            Please ensure the generated query is accurate, uses valid SQL syntax, and is deterministic."""
    messages = [
        {"role": "system", "content": prompt},
        # One-shot example showing the expected question/answer format
        {"role": "user", "content": "which is the largest database in server"},
        {"role": "assistant", "content": "SELECT TOP 1 name AS DatabaseName, (size*8)/1024 AS SizeMB FROM sys.master_files GROUP BY name, size ORDER BY SizeMB DESC"},
        {"role": "user", "content": user_input}
    ]
    response = api_parameter(messages)
    if response:
        generated_query = response['choices'][0]['message']['content']
        # Flatten the reply to a single line
        generated_query = generated_query.replace('\n', ' ')
        # Drop any leading label such as "SQL Query:" (find() returns -1 when there is no colon)
        generated_query = generated_query[generated_query.find(":") + 1:]
        print('generated query', generated_query)
        return generated_query
    else:
        print("No response generated")
        # None lets the caller's "if not sql_query" check catch the failure
        return None

Explanation:

In this part of the project, the generate_sql_query(user_input) function turns natural language into a SQL query. The message list sets up a short simulated conversation: a system prompt that states the task, followed by a single example question and answer (one-shot learning) that shows the model the expected output. The user’s actual request comes last. The API returns a reply, which the function flattens to a single line and strips of any leading label before returning the query for execution.
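
Because the generated query is executed directly against the database, it may be worth cleaning and checking it first. The sketch below is not part of the original project: clean_model_reply() and is_read_only() are hypothetical helpers, and the keyword list is illustrative rather than exhaustive. The check is deliberately conservative, so a harmless query that contains a word such as 'delete' inside a string literal would also be rejected.

```python
import re

# Statements that change data or schema; illustrative, not exhaustive.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|exec|execute|merge|grant|into)\b",
    re.IGNORECASE,
)


def clean_model_reply(reply):
    """Strip Markdown code fences and collapse whitespace in a model reply."""
    # "`{3}" matches a run of three backticks, optionally followed by "sql"
    text = re.sub("`{3}(?:sql)?", " ", reply, flags=re.IGNORECASE)
    return " ".join(text.split()).rstrip(";").strip()


def is_read_only(query):
    """Return True for a single SELECT (or WITH ... SELECT) statement."""
    if ";" in query:  # reject stacked statements
        return False
    starts_ok = re.match(r"\s*(select|with)\b", query, re.IGNORECASE) is not None
    return starts_ok and FORBIDDEN.search(query) is None
```

A caller could run clean_model_reply() on the model output and refuse to execute anything for which is_read_only() returns False. A database login with read-only permissions remains the more robust safeguard.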

Python script for Database Interaction:

import pyodbc
from flask import render_template  # assumes the app is built with Flask

def connection_to_db(user_input, sql_query):
    # driver, server_name, db_name and trusted_connection are module-level settings
    try:
        connection_string = f"DRIVER={driver};SERVER={server_name};DATABASE={db_name};TRUSTED_CONNECTION={trusted_connection}"
        connection = pyodbc.connect(connection_string)
    except pyodbc.Error:
        print(f"Oops! {db_name} Database not available in the Server and unable to connect...")
        return render_template('index.html', error=f"Oops! {db_name} Database not available in the Server and unable to connect...")
    if not sql_query:
        print("Failed to generate sql query")
        return render_template('index.html', error="Failed to generate sql query")
    print('connecting to database')
    columns, result, status = execute_query(user_input, sql_query, connection)
    if columns and result is not None and status == 'SUCCESS':
        # One dictionary per row, keyed by column name, with a 1-based ID
        formatted_result = [{'ID': idx, **{columns[i]: str(value) for i, value in enumerate(row)}} for idx, row in enumerate(result, start=1)]
        print(user_input)
        print(sql_query)
        print(formatted_result)
        result = text_to_speech(formatted_result)
    else:
        # status carries the reason the query produced no usable result
        print(user_input)
        print(sql_query)
        print(status)

Explanation:

The connection_to_db(user_input, sql_query) function connects to the database and runs the generated SQL query. It builds a connection string from the driver, server, and database details, and reports an error if the connection fails or if no query was generated. It then calls execute_query() and, on success, converts each row into a dictionary keyed by column name, with a running ID. The resulting list of dictionaries is passed to text_to_speech() to be read aloud.
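
The row-formatting step can be tried on its own, without a database. The sketch below pulls it into a hypothetical helper, format_rows(), and runs it on made-up sample data; it uses zip() instead of index lookups but is intended to produce the same structure.

```python
def format_rows(columns, rows):
    """Pair each row's values with the column names and add a 1-based ID."""
    return [
        {"ID": idx, **{col: str(value) for col, value in zip(columns, row)}}
        for idx, row in enumerate(rows, start=1)
    ]


columns = ["DatabaseName", "SizeMB"]
rows = [("sales", 512), ("hr", 64)]  # made-up sample data
print(format_rows(columns, rows))
# [{'ID': 1, 'DatabaseName': 'sales', 'SizeMB': '512'}, {'ID': 2, 'DatabaseName': 'hr', 'SizeMB': '64'}]
```

Note that if the query itself returns a column named ID, that value replaces the running index, because later keys win when a dictionary is built this way.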

Python script for Executing the Generated Query:

def execute_query(user_input, sql_query, connection):
    try:
        cursor = connection.cursor()
        cursor.execute(sql_query)
        # cursor.description is None when the statement returns no result set
        if cursor.description is None:
            return None, None, "Executed, but no records to display"
        columns = [column[0] for column in cursor.description]
        result = cursor.fetchall()
        if not result:
            print("Executed, but no records to display")
            return None, None, "Executed, but no records to display"
        if len(result) < 15:
            rows = [list(row) for row in result]
            print(columns, rows)
            return columns, rows, 'SUCCESS'
        # Too many rows to read aloud; ask the user to narrow the request
        print(len(result))
        return None, None, f"There are {len(result)} rows, which is too much data. Please be more specific about the data you need."
    except pyodbc.Error as e:
        # Covers syntax errors as well as other database errors
        print("Sorry, the query failed:", str(e))
        return None, None, f"Sorry, the query failed: {e}"

Explanation:

The execute_query(user_input, sql_query, connection) function runs the SQL query on the open connection and processes the results. It reads the column names, fetches the rows, and returns them with a SUCCESS status when there are fewer than 15. If the query returns no rows, or too many to read aloud, it returns a message instead, asking the user to be more specific in the latter case. Database errors such as invalid syntax are caught and returned as an error message.
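
Fetching every row only to discard an oversized result can be wasteful on large tables. A possible alternative, sketched below, is to fetch one row more than the limit and stop there. The run_limited() helper and the MAX_ROWS constant are hypothetical, and an in-memory SQLite database stands in for SQL Server so the example runs anywhere. The cursor methods used here (execute, description, fetchmany) are also available on pyodbc cursors.

```python
import sqlite3

MAX_ROWS = 14  # the original function accepts fewer than 15 rows


def run_limited(connection, sql_query, max_rows=MAX_ROWS):
    """Run a query and return (columns, rows, status) without loading every row."""
    cursor = connection.cursor()
    cursor.execute(sql_query)
    if cursor.description is None:  # statement produced no result set
        return None, None, "Executed, but no records to display"
    columns = [column[0] for column in cursor.description]
    # One extra row is enough to reveal an oversized result
    rows = cursor.fetchmany(max_rows + 1)
    if not rows:
        return None, None, "Executed, but no records to display"
    if len(rows) > max_rows:
        return None, None, f"More than {max_rows} rows; please narrow the request"
    return columns, [list(row) for row in rows], "SUCCESS"


# In-memory SQLite stands in for SQL Server in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(20)])
print(run_limited(conn, "SELECT n FROM t WHERE n < 3 ORDER BY n"))
print(run_limited(conn, "SELECT n FROM t")[2])
```

The trade-off is that the exact row count is no longer known, so the message can only say that the limit was exceeded.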

Python script for Text-to-Speech Conversion:

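def text_to_speech(result):
    # Play the audio on the default speaker
    audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
    # Select the neural voice used for synthesis
    config.speech_synthesis_voice_name = 'en-US-JennyNeural'
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=config, audio_config=audio_config)
    text = str(result)
    print("Text to speech")
    # Start synthesis and block until it finishes
    speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()
    if speech_synthesis_result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Speech synthesis failed:", speech_synthesis_result.reason)
    return speech_synthesis_result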

Explanation:

The text_to_speech(result) function converts text into audio using a speech synthesizer. It sets up the audio output configuration and specifies the voice for synthesis, then creates the synthesizer with those settings. When invoked, it converts the result to a string, starts speaking it asynchronously, and waits for the synthesis result. This turns the query output into spoken audio, which makes the results accessible without reading the screen.
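
Since the function converts whatever it receives to a string, a list of row dictionaries is spoken in its raw Python form, brackets and quotes included. It may sound more natural to turn the rows into plain sentences first. The rows_to_speech_text() helper below is a hypothetical sketch of that idea, not part of the original project.

```python
def rows_to_speech_text(formatted_result):
    """Turn a list of row dictionaries into plain sentences for a synthesizer."""
    sentences = []
    for row in formatted_result:
        # Skip the running ID inside the sentence; it is used as the row label
        fields = ", ".join(
            f"{key} is {value}" for key, value in row.items() if key != "ID"
        )
        sentences.append(f"Row {row['ID']}: {fields}.")
    return " ".join(sentences)


sample = [{"ID": 1, "DatabaseName": "sales", "SizeMB": "512"}]  # made-up data
print(rows_to_speech_text(sample))
# Row 1: DatabaseName is sales, SizeMB is 512.
```

The returned string could then be passed to text_to_speech() in place of the raw list.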

Result:

Output 1

Output 2

Conclusion:

In summary, AI-powered speech recognition lets users query a database in natural language. By integrating Azure Cognitive Services and OpenAI, users can interact with the database by voice, without writing queries by hand. The technical steps above cover speech-to-text conversion, SQL query generation, query execution, and text-to-speech conversion, which together provide a user-friendly experience.


Gajalakshmi N
