ControlUi Documentation

ControlUi is a powerful, cross-platform open-source application that integrates various AI models, speech recognition, and web scraping capabilities. It provides a seamless interface for voice and text interactions, file attachments, property lookups, and web searches.

Preliminary Information

Before diving into the setup and usage of the AI Assistant, it's important to understand its core components and dependencies.

Modes of Operation

The AI Assistant operates in two primary modes:

  • Voice Mode: Allows users to interact with the AI using voice commands and receive spoken responses.
  • Chat Mode: Provides a text-based interface for typing queries and receiving written responses.

Key Dependencies

The AI Assistant relies on two critical dependencies that must be installed and loaded into the global scope before the program can run:

  • Whisper: An automatic speech recognition (ASR) system used for transcribing voice input.
  • Kokoro: A text-to-speech engine used for generating spoken responses in Voice Mode.

Warning:

The Whisper and Kokoro models are loaded into the global scope. They must be installed and properly configured before running the AI Assistant. Failure to do so will result in runtime errors.
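
For reference, here is a minimal sketch of how both dependencies might be loaded into the global scope before anything else runs. The model size and the Kokoro pipeline arguments are assumptions based on the public whisper and kokoro packages; the project may configure them differently:

# Hedged sketch: load Whisper and Kokoro once, at module import time.
import whisper as openai_whisper
from kokoro import KPipeline

# Without an NVIDIA GPU, use device="cpu" instead.
whisper_model = openai_whisper.load_model("base", device="cuda")

# lang_code="a" selects American English in the kokoro package.
kokoro_pipeline = KPipeline(lang_code="a")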

Setup

Installation

  1. Install Anaconda/Conda

    https://www.anaconda.com/download/success

    Anaconda makes environment management and Python setup easier. Install it system-wide, add it to your PATH, and accept the recommended options.

  2. Create new Conda environment

    Run conda -h in your terminal to confirm that conda is installed properly.

    Open a Command Prompt.

    Create a new Conda environment called cuda with Python 3.11:

    conda create -n cuda python=3.11

    This creates a new conda environment called cuda; Python and any libraries this program needs will live inside it.

    To activate the environment, open a command prompt and type:

    conda activate cuda

    You'll now see the environment name at the left of your terminal prompt. Now we can install the required libraries.

  3. Install CUDA toolkit (for Kokoro & Whisper)

    CUDA is not required for chat-based functionality, only for Voice Mode. Without an NVIDIA GPU, voice transcription and voice generation will be slower: with a 2080 Ti you get essentially real-time Whisper (base) transcription and 10-20x real-time Kokoro voice generation (very fast), while CPU-only means noticeable delays when transcribing and slight delays when generating, but it will still work. (A GPU sanity-check snippet is shown after this list.)

    Install cudatoolkit v11.8.0 - https://anaconda.org/conda-forge/cudatoolkit

    conda install -c conda-forge cudatoolkit=11.8.0
  4. Install cuDNN

    Not required for chat-based functionality

    Install cudnn v8.9.7 - https://anaconda.org/conda-forge/cudnn

    conda install -c conda-forge cudnn=8.9.7
  5. Install Pytorch

    Not required for chat-based functionality

    Install Pytorch - https://pytorch.org/

    conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
  6. Install Tensorflow

    Not required for chat-based functionality

    Install Tensorflow 2.14.0, as this is the last Tensorflow version compatible with CUDA 11.8. Reference: https://www.tensorflow.org/install/source#gpu

    conda install -c conda-forge tensorflow=2.14.0=cuda118py311heb1bdc4_0
  7. Other libraries

    Try running python controlui.py and check whether you get any import errors.

    If you have any missing libraries, install them with pip:

    pip install kokoro, pip install pyperclip, pip install keyboard, and so on.

    With all the imported libraries installed, you should be able to run the program.

  8. Start the Program

    With your command prompt in the conda environment that has everything installed, navigate to the directory containing the .py file. Enter dir (Windows) or ls to list the contents of your current working directory and confirm the .py file is present.

    Run python controlui.py

    Once you see Ready!... you can press Ctrl+k to bring up the ControlUi 😎
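
If you installed the GPU stack in steps 3-6, this quick sanity check (a minimal sketch; run it as a small script or in a Python shell inside the cuda environment) confirms that PyTorch and TensorFlow can actually see your GPU:

# GPU sanity check for the Voice Mode dependencies.
import torch
import tensorflow as tf

print("PyTorch CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("PyTorch device:", torch.cuda.get_device_name(0))
print("TensorFlow GPUs:", tf.config.list_physical_devices("GPU"))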

Configuration

Configure ControlUi by editing the .voiceconfig file in the root directory. Here are the key settings:

{
  "use_sonos": false,
  "use_conversation_history": true,
  "BROWSER_TYPE": "chrome",
  "CHROME_USER_DATA": "C:\Users\PC\AppData\Local\Google\Chrome\User Data",
  "CHROME_DRIVER_PATH": "C:\Users\PC\Downloads\chromedriver.exe",
  "CHROME_PROFILE": "Profile 10",
  "ENGINE": "OpenAI",
  "MODEL_ENGINE": "gpt-4o",
  "OPENAI_API_KEY": "your-api-key-here",
  "GOOGLE_API_KEY": "your-google-api-key-here",
  "days_back_to_load": 15,
  "HOTKEY_LAUNCH": "ctrl+k"
}

Adjust these settings according to your preferences and API keys.
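
The loading code lives inside controlui.py, but a minimal sketch of how a plain-JSON .voiceconfig like the one above can be read looks like this (the key names match the sample configuration; the defaults are illustrative):

# Hedged sketch: load .voiceconfig and expose its keys as module-level settings.
import json

with open(".voiceconfig", "r", encoding="utf-8") as f:
    config = json.load(f)

ENGINE = config.get("ENGINE", "OpenAI")
MODEL_ENGINE = config.get("MODEL_ENGINE", "gpt-4o")
OPENAI_API_KEY = config.get("OPENAI_API_KEY", "")
HOTKEY_LAUNCH = config.get("HOTKEY_LAUNCH", "ctrl+k")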

Core Components

Speech Recognition

The AI Assistant uses the Whisper model for speech recognition. Here's how it's implemented:

import tempfile

import soundfile as sf
import whisper as openai_whisper

# Loaded once into the global scope (see Key Dependencies above).
whisper_model = openai_whisper.load_model("base", device='cuda')

def record_and_transcribe_once() -> str:
    # ... recording logic ...
    
    def transcribe_audio(audio_data, samplerate):
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
            temp_wav_name = tmp.name
        sf.write(temp_wav_name, audio_data, samplerate)
        result = whisper_model.transcribe(temp_wav_name, fp16=False)
        return result["text"]
    
    # ... more recording and transcription logic ...
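
For reference, the same global model object can transcribe any audio file directly; the file name below is just a placeholder, and fp16=False avoids the FP16 warning when inference falls back to CPU:

# One-off transcription of an existing WAV file (illustrative usage).
result = whisper_model.transcribe("sample_question.wav", fp16=False)
print(result["text"])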

AI Models Integration

The application supports multiple AI models, including OpenAI, Google, Ollama, Claude, Groq, and OpenRouter. Here's an example of how the OpenAI model is integrated:

def call_openai(prompt: str, model_name: str, reasoning_effort: str) -> str:
    import openai
    import json
    global conversation_messages, OPENAI_API_KEY
    ensure_system_prompt()
    conversation_messages.append({"role": "user", "content": prompt})

    openai.api_key = OPENAI_API_KEY 

    if not openai.api_key:
        stop_spinner()
        print(f"{RED}No OpenAI API key found.{RESET}")
        return ""

    # ... API call logic ...

    try:
        response = openai.chat.completions.create(**api_params)
    except Exception as e:
        print(f"{RED}Error connecting to OpenAI: {e}{RESET}")
        return ""
    
    # ... response handling ...
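
The API-call and response-handling parts are elided above. A minimal, hedged sketch of what they amount to is shown below; the real function also handles reasoning_effort, spinner control, and error reporting:

# Minimal sketch of the elided request/response flow (not the exact code).
import openai

conversation_messages = []  # shared history, as in the excerpt above

def call_openai_minimal(prompt: str, model_name: str) -> str:
    conversation_messages.append({"role": "user", "content": prompt})
    api_params = {"model": model_name, "messages": conversation_messages}
    response = openai.chat.completions.create(**api_params)
    reply = response.choices[0].message.content
    conversation_messages.append({"role": "assistant", "content": reply})
    return reply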

Web Scraping and External Tools

The AI Assistant includes web scraping capabilities for Google searches and property lookups. Here's an example of the Google search function:

from urllib.parse import quote_plus

from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

def google_search(query: str) -> str:
    global BROWSER_TYPE
    stop_spinner()
    print(f"{MAGENTA}Google search is: {query}{RESET}")
    encoded_query = quote_plus(query)
    url = f"https://www.google.com/search?q={encoded_query}"
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, args=["--disable-blink-features=AutomationControlled"])
        if BROWSER_TYPE == 'chrome':
            context = browser.new_context(
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ..."
            )
        # ... more browser setup ...
        page = context.new_page()
        page.goto(url)
        page.wait_for_load_state("networkidle")
        html = page.content()
        browser.close()
    soup = BeautifulSoup(html, 'html.parser')
    text = soup.get_text()
    cleaned_text = ' '.join(text.split())[0:5000]
    print(cleaned_text)
    return cleaned_text

GUI Implementation

The graphical user interface is implemented using PySide6 (Qt for Python). Here's an example of the main window class:

class BottomBubbleWindow(QWidget):
    global last_chat_geometry
    response_ready = Signal(str, object, object)

    def __init__(self):
        global last_main_geometry, last_chat_geometry        
        super().__init__()
        self.setWindowFlags(Qt.FramelessWindowHint)
        self.setAttribute(Qt.WA_TranslucentBackground, True)
        self.setAttribute(Qt.WA_DeleteOnClose)
        self.response_ready.connect(self.update_ai_reply)

        # Initialize chat dialog with empty content
        self.chat_dialog = ChatDialog(host_window=self)
        if last_chat_geometry:
            self.chat_dialog.setGeometry(last_chat_geometry)
        self.chat_dialog.hide()

        # ... more initialization ...

    def on_message_sent(self, text):
        # ... message handling logic ...

    def process_ai_reply(self, text, container, lb, fresh):
        try:
            ai_reply = call_current_engine(text, fresh=fresh)
        except Exception as e:
            print(f"Error in AI thread: {e}")
            ai_reply = f"[Error: {e}]"
        self.response_ready.emit(ai_reply, container, lb)

    # ... more methods ...
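
The update_ai_reply slot connected to response_ready above is not shown. One plausible implementation (a hedged sketch meant as a method of BottomBubbleWindow, not the project's exact code) removes the loading bubble and appends the reply; because the signal is emitted from a worker thread, Qt delivers it on the GUI thread, so touching widgets here is safe:

    def update_ai_reply(self, ai_reply, container, lb):
        # Runs on the GUI thread via the queued signal connection.
        lb.deleteLater()  # drop the "loading" placeholder bubble
        self.chat_dialog.add_message(ai_reply, role="assistant")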

Features

Voice Interaction

The AI Assistant supports voice interactions using the Whisper model for speech recognition and a text-to-speech engine for responses. Here's how voice recording is implemented:

def record_and_transcribe_once() -> str:
    global recording_flag, stop_chat_loop, whisper_model
    model = whisper_model
    if recording_flag:
        return ""
    recording_flag = True
    audio_q.queue.clear()
    samplerate = 24000
    blocksize = 1024
    silence_threshold = 70
    max_silence_seconds = 0.9
    MIN_RECORD_DURATION = 1.0
    recorded_frames = []
    speaking_detected = False
    silence_start_time = None

    with sd.InputStream(channels=1, samplerate=samplerate, blocksize=blocksize, callback=audio_callback):
        print(f"{YELLOW}Recording started. Waiting for speech...{RESET}")
        play_wav_file_blocking("recording_started.wav")
        while True:
            if stop_chat_loop:
                break
            # ... recording logic ...

    if stop_chat_loop:
        recording_flag = False
        return ""
    print(f"{GREEN}Recording ended. Transcribing...{RESET}")
    # ... transcription logic ...
    return text_result
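
The audio_q queue and audio_callback referenced above are defined elsewhere in the program. A minimal sketch of how they might look (an assumption based on the sd.InputStream usage above):

import queue

audio_q = queue.Queue()

def audio_callback(indata, frames, time_info, status):
    # sounddevice calls this for every captured block; push a copy onto the
    # queue so the recording loop can measure volume and accumulate frames.
    if status:
        print(status)
    audio_q.put(indata.copy())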

Text Chat

Users can interact with the AI Assistant through text input. The chat interface is implemented in the GUI:

class ChatDialog(QWidget):
    global conversation_messages
    def __init__(self, host_window):
        global conversation_messages
        super().__init__()
        self.host_window = host_window
        self.setWindowFlags(Qt.FramelessWindowHint)
        self.setAttribute(Qt.WA_TranslucentBackground, True)
        self.setAttribute(Qt.WA_DeleteOnClose)

        # ... UI setup ...

        self.reply_line = QLineEdit()
        self.reply_line.setPlaceholderText("Type your reply...")
        reply_layout.addWidget(self.reply_line, stretch=1)
        self.reply_send_button = QToolButton()
        self.reply_send_button.setText("↑")
        self.reply_send_button.setToolTip("Send Reply")
        reply_layout.addWidget(self.reply_send_button)
        self.reply_send_button.clicked.connect(self.handle_reply_send)
        self.reply_line.returnPressed.connect(self.handle_reply_send)

    def handle_reply_send(self):
        text = self.reply_line.text().strip()
        if text:
            self.add_message(text, role="user")
            self.reply_line.clear()
            container, lb = self.add_loading_bubble()
            def do_ai_work():
                try:
                    ai_reply = call_current_engine(text, fresh=False)
                except Exception as e:
                    print("Error in AI thread:", e)
                    ai_reply = f"[Error: {e}]"
                self.host_window.response_ready.emit(ai_reply, container, lb)
            th = threading.Thread(target=do_ai_work, daemon=True)
            th.start()

    # ... more methods ...

File Attachments

The AI Assistant supports file attachments for text-based files. Here's how file handling is implemented:

class FileDropLineEdit(QLineEdit):
    file_attached = Signal(list)  # New signal to notify when a file is attached

    def __init__(self, parent=None):
        super().__init__(parent)
        self.setAcceptDrops(True)
        self.attachments = []  # Will hold dictionaries: {'filename': ..., 'content': ...}

    def dragEnterEvent(self, event):
        if event.mimeData().hasUrls():
            for url in event.mimeData().urls():
                file_path = url.toLocalFile()
                if os.path.splitext(file_path)[1].lower() in ['.txt', '.csv', '.xlsx', '.xls']:
                    event.acceptProposedAction()
                    return
            event.ignore()
        else:
            super().dragEnterEvent(event)

    def dropEvent(self, event):
        if event.mimeData().hasUrls():
            attachments = []
            for url in event.mimeData().urls():
                file_path = url.toLocalFile()
                ext = os.path.splitext(file_path)[1].lower()
                if ext in ['.txt', '.csv', '.xlsx', '.xls']:
                    file_name = os.path.basename(file_path)
                    try:
                        content = read_file_content(file_path)
                        attachments.append({'filename': file_name, 'content': content})
                    except Exception as e:
                        attachments.append({'filename': file_name, 'content': f"Error reading file: {str(e)}"})
            if attachments:
                self.attachments = attachments
                self.file_attached.emit(attachments)
            event.acceptProposedAction()
        else:
            super().dropEvent(event)
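
The read_file_content helper used in dropEvent is not shown above. A minimal sketch (an assumption: plain-text files are read directly and spreadsheets are flattened to CSV text with pandas) could look like this:

import os
import pandas as pd

def read_file_content(file_path: str) -> str:
    # Spreadsheets are converted to CSV text; everything else is read as text.
    ext = os.path.splitext(file_path)[1].lower()
    if ext in ('.xlsx', '.xls'):
        return pd.read_excel(file_path).to_csv(index=False)
    with open(file_path, 'r', encoding='utf-8', errors='replace') as f:
        return f.read()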

Property Lookup

The AI Assistant can fetch property value estimates from Zillow and Redfin. Here's how it's implemented:

def fetch_property_value(address: str) -> str:
    global driver
    # Kill any lingering Chromium instances before starting a new search.
    kill_chromium_instances()
    try:
        driver
    except NameError:
        # ... driver setup ...

    stop_spinner()
    print(f"{MAGENTA}Address for search: {address}{RESET}")
    stop_spinner()

    search_url = "https://www.google.com/search?q=" + address.replace(' ', '+')
    try:
        driver.get(search_url)
        time.sleep(3.5)
    except Exception as e:
        stop_spinner()
        print(f"{RED}[DEBUG] Exception during driver.get: {e}{RESET}")
        stop_spinner()
        return "Error performing Google search."

    # ... search for Zillow and Redfin links ...

    def open_in_new_tab(url):
        # ... open URL in new tab and return page HTML ...

    def parse_redfin_value(source):
        # ... parse Redfin value from HTML ...

    def parse_zillow_value(source):
        # ... parse Zillow value from HTML ...

    property_values = []
    for domain, link in links_found.items():
        if not link:
            continue
        page_html = open_in_new_tab(link)
        extracted_value = None
        if domain == 'Redfin':
            extracted_value = parse_redfin_value(page_html)
        elif domain == 'Zillow':
            extracted_value = parse_zillow_value(page_html)
        if extracted_value:
            property_values.append((domain, extracted_value))

    if not property_values:
        return "Could not retrieve property values."

    result_phrases = []
    for domain, value in property_values:
        result_phrases.append(f"{domain} estimates the home is worth {value}")
    return ", and ".join(result_phrases)

Google Search Integration

The AI Assistant can perform Google searches to provide up-to-date information. Here's how it's implemented:

def google_search(query: str) -> str:
    global BROWSER_TYPE
    stop_spinner()
    print(f"{MAGENTA}Google search is: {query}{RESET}")
    encoded_query = quote_plus(query)
    url = f"https://www.google.com/search?q={encoded_query}"
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, args=["--disable-blink-features=AutomationControlled"])
        if BROWSER_TYPE == 'chrome':
            context = browser.new_context(
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ..."
            )
        if BROWSER_TYPE == 'chromium':
            context = browser.new_context(
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ..."
            )
        page = context.new_page()
        page.goto(url)
        page.wait_for_load_state("networkidle")
        html = page.content()
        browser.close()
    soup = BeautifulSoup(html, 'html.parser')
    text = soup.get_text()
    cleaned_text = ' '.join(text.split())[0:5000]
    print(cleaned_text)
    return cleaned_text
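
A typical usage pattern (a hedged sketch, not the project's exact wiring) is to run the search and hand the cleaned page text back to the active model as context for the user's question:

# Illustrative only: combine live search results with the user's question.
query = "current mortgage rates"
search_context = google_search(query)
prompt = f"Using these search results:\n{search_context}\n\nAnswer the question: {query}"
print(call_current_engine(prompt, fresh=False))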

Advanced Usage

Custom AI Model Integration

To integrate a custom AI model, you need to add a new function to handle API calls and update the ENGINE_MODELS dictionary. Here's an example:

import requests

def call_custom_model(prompt: str, model_name: str) -> str:
    # Implement your custom model API call here
    # Example:
    response = requests.post(
        "https://api.custom-model.com/generate",
        json={"prompt": prompt, "model": model_name}
    )
    return response.json()["generated_text"]

# Add to ENGINE_MODELS
ENGINE_MODELS["CustomAI"] = ["custom-model-1", "custom-model-2"]

# Update call_current_engine
def call_current_engine(prompt: str, fresh: bool = False) -> str:
    global ENGINE, MODEL_ENGINE
    if ENGINE == "CustomAI":
        return call_custom_model(prompt, MODEL_ENGINE)
    elif ENGINE == "Ollama":
        return call_ollama(prompt, MODEL_ENGINE)
    # ... existing code for other engines ...

Extending Functionality

To add new features or tools to the AI Assistant, you can create new functions and integrate them into the existing workflow. Here's an example of how you might add a weather lookup feature:

import requests

def weather_lookup(city: str) -> str:
    api_key = "your_weather_api_key"
    url = f"https://api.openweathermap.org/data/2.5/weather?q={city}&appid={api_key}&units=metric"
    response = requests.get(url)
    data = response.json()
    if response.status_code == 200:
        temp = data['main']['temp']
        description = data['weather'][0]['description']
        return f"The weather in {city} is {description} with a temperature of {temp}°C."
    else:
        return f"Unable to fetch weather data for {city}."

# Integrate into call_current_engine
def call_current_engine(prompt: str, fresh: bool = False) -> str:
    global ENGINE, MODEL_ENGINE, conversation_messages
    
    # Check if the prompt is asking for weather
    if "weather in" in prompt.lower():
        city = prompt.lower().split("weather in")[-1].strip()
        weather_info = weather_lookup(city)
        conversation_messages.append({"role": "assistant", "content": weather_info})
        return weather_info
    
    # Existing engine calls...

API Reference

Key Functions and Classes

Here are some of the key functions and classes in the AI Assistant:

  • record_and_transcribe_once(): Records user speech and transcribes it using the Whisper model.
  • call_current_engine(prompt: str, fresh: bool) -> str: Calls the selected AI model with the given prompt.
  • google_search(query: str) -> str: Performs a Google search for the given query.
  • fetch_property_value(address: str) -> str: Fetches property value estimates from Zillow and Redfin.
  • class BottomBubbleWindow(QWidget): Main window class for the GUI.
  • class ChatDialog(QWidget): Chat dialog window for displaying conversations.
  • class FileDropLineEdit(QLineEdit): Custom QLineEdit that supports file drag and drop.

Troubleshooting

Here are some common issues and their solutions:

  • API Key Issues: Ensure that you have set the correct API keys in the .voiceconfig file for the AI models you're using.
  • Speech Recognition Problems: Make sure your microphone is properly connected and selected as the default input device in your system settings.
  • GUI Not Responding: If the GUI becomes unresponsive, try restarting the application. If the issue persists, check the console for any error messages.
  • Web Scraping Errors: Ensure that you have the correct ChromeDriver version installed and that the path is correctly set in the configuration file.
  • File Attachment Issues: Verify that the file you're trying to attach is in a supported format (.txt, .csv, .xlsx, .xls) and is not corrupted.
  • Whisper or Kokoro Not Found: If you encounter errors related to Whisper or Kokoro not being found, ensure that these dependencies are properly installed and their paths are correctly set in your system's environment variables.

If you encounter any other issues, please check the console output for error messages and refer to the project's issue tracker or documentation for further assistance.