ClickUi Documentation

ClickUi is a powerful, cross-platform open-source application that integrates various AI models, speech recognition, and web scraping capabilities. It provides a seamless interface for voice and text interactions, file attachments, property lookups, and web searches.

Preliminary Information

Before diving into the setup and usage of the AI Assistant, it's important to understand its core components and dependencies.

Modes of Operation

The AI Assistant operates in two primary modes:

  • Voice Mode: Allows users to interact with the AI using voice commands and receive spoken responses.
  • Chat Mode: Provides a text-based interface for typing queries and receiving written responses.

Key Dependencies

The AI Assistant relies on two critical dependencies that must be installed and loaded into the global scope before the program can run:

  • Whisper: An automatic speech recognition (ASR) system used for transcribing voice input.
  • Kokoro: A text-to-speech engine used for generating spoken responses in Voice Mode.

Warning:

The Whisper and Kokoro models are loaded into the global scope. They must be installed and properly configured before running the AI Assistant. Failure to do so will result in runtime errors.
You can run without the voice dependencies by commenting out the Whisper and Kokoro loading code, but Voice Mode will then be unavailable (see the sketch below for one way to guard those loads).
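
A minimal sketch of what guarding the global model loads might look like, assuming a hypothetical VOICE_ENABLED flag (the project's actual code loads both models unconditionally):

# Sketch only: gate the global model loads so ClickUi can start without voice support.
VOICE_ENABLED = True  # hypothetical flag; set to False to skip the voice dependencies

whisper_model = None

if VOICE_ENABLED:
    import whisper as openai_whisper
    whisper_model = openai_whisper.load_model("base")  # loaded into the global scope
    # Kokoro would be initialized here in the same way; its exact setup call is
    # omitted because it depends on how the TTS engine is installed.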

Setup / Run

Once installed, simply navigate to the folder containing clickui.py in your command prompt or terminal and run:

python clickui.py

After you see Ready!... in your console, press Ctrl + k to bring up the ClickUi interface.

Easy Installation (Windows Only)

  1. Install Anaconda/Conda
    Download and install Anaconda/Conda from https://www.anaconda.com/download/success. This allows for easier environment management and Python setup. Install system-wide (for all users) and add to the PATH.
  2. Run Install.bat
    Available from ClickUi.app or in the GitHub repo. This file will:
    • Download this Git repository
    • Run the installation commands for you
    • Start the program automatically

    You can use the Install.bat file to launch the program, or simply run python clickui.py in your command prompt once everything is installed.

Manual Installation

  1. Install Anaconda/Conda
    Download and install Anaconda/Conda from https://www.anaconda.com/download/success. You'll need this to install the dependencies in conda_packages.txt.
  2. Keep the files together in one folder
    Ensure the Python files, images, and other assets from this repo (like sonos.py, .svg icons, etc.) remain together in a single folder on your machine.
  3. Create a new Conda environment & install packages
    • Download/clone the repository, then open a command prompt/terminal and navigate (cd) to the directory containing this code. Your prompt might look like:
      C:\Users\PC\Downloads\ClickUi>
    • Run conda -h to ensure conda is installed properly.
    • Create the conda environment and install required libraries:
      conda create --name click_ui --file conda_packages.txt

      This creates a new Conda environment named click_ui and installs the packages listed in conda_packages.txt.

    • Activate the environment:
      conda activate click_ui
    • Now install the rest of the pip modules:
      pip install -r requirements.txt
  4. Start the Program
    With your command prompt in the folder containing clickui.py:
    python clickui.py

    Once you see Ready!..., press Ctrl + k to bring up the ClickUi interface.



Configuration

Configure ClickUi by editing the .voiceconfig file in the root directory. You can also edit these settings through the Settings menu in the GUI and then click the "Save Config" button.

Here are the key settings:

{
  "use_sonos": false,
  "use_conversation_history": true,
  "BROWSER_TYPE": "chrome",
  "CHROME_USER_DATA": "C:\Users\PC\AppData\Local\Google\Chrome\User Data",
  "CHROME_DRIVER_PATH": "C:\Users\PC\Downloads\chromedriver.exe",
  "CHROME_PROFILE": "Profile 10",
  "ENGINE": "OpenAI",
  "MODEL_ENGINE": "gpt-4o",
  "OPENAI_API_KEY": "your-api-key-here",
  "GOOGLE_API_KEY": "your-google-api-key-here",
  "days_back_to_load": 15,
  "HOTKEY_LAUNCH": "ctrl+k"
}

Adjust these settings according to your preferences and API keys.
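
For reference, here is a minimal sketch of how a .voiceconfig like the one above could be read at startup; the actual loader in clickui.py may differ:

import json
import os

CONFIG_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), ".voiceconfig")

def load_config(path: str = CONFIG_PATH) -> dict:
    # Return the parsed settings, or an empty dict if the file does not exist yet.
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

config = load_config()
ENGINE = config.get("ENGINE", "OpenAI")
MODEL_ENGINE = config.get("MODEL_ENGINE", "gpt-4o")
HOTKEY_LAUNCH = config.get("HOTKEY_LAUNCH", "ctrl+k")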

Conversation History

When enabled ("use_conversation_history": true), the app loads your previous chats from the last days_back_to_load days from your local history folder. It injects them into the current chat session so you can pick up where you left off.

Note: Conversation history can quickly consume a lot of tokens! The CLI will print the total tokens used for your loaded conversation when enabled.
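
A rough sketch of how chats from the last days_back_to_load days could be gathered and injected; the history folder name and per-file format shown here are assumptions, not the project's exact layout:

import json
import os
import time

HISTORY_DIR = "history"  # hypothetical folder name
days_back_to_load = 15

def load_recent_history(days: int) -> list:
    # Collect message lists from history files modified within the last `days` days.
    cutoff = time.time() - days * 24 * 60 * 60
    messages = []
    if not os.path.isdir(HISTORY_DIR):
        return messages
    for name in sorted(os.listdir(HISTORY_DIR)):
        path = os.path.join(HISTORY_DIR, name)
        if os.path.getmtime(path) >= cutoff:
            with open(path, "r", encoding="utf-8") as f:
                messages.extend(json.load(f))  # assumes each file holds a JSON list of messages
    return messages

conversation_messages = load_recent_history(days_back_to_load)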

Core Components

Speech Recognition

The AI Assistant uses the Whisper model for speech recognition. Here's how it's implemented:

import tempfile

import soundfile as sf
import whisper as openai_whisper

# Loading on the GPU requires CUDA; pass device='cpu' if no compatible GPU is available.
whisper_model = openai_whisper.load_model("base", device='cuda')

def record_and_transcribe_once() -> str:
    # ... recording logic ...
    
    def transcribe_audio(audio_data, samplerate):
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
            temp_wav_name = tmp.name
        sf.write(temp_wav_name, audio_data, samplerate)
        result = whisper_model.transcribe(temp_wav_name, fp16=False)
        return result["text"]
    
    # ... more recording and transcription logic ...

AI Models Integration

The application supports multiple AI models, including OpenAI, Google, Ollama, Claude, Groq, and OpenRouter. Here's an example of how the OpenAI model is integrated:

def call_openai(prompt: str, model_name: str, reasoning_effort: str) -> str:
    import openai
    import json
    global conversation_messages, OPENAI_API_KEY
    ensure_system_prompt()
    conversation_messages.append({"role": "user", "content": prompt})

    openai.api_key = OPENAI_API_KEY 

    if not openai.api_key:
        stop_spinner()
        print(f"{RED}No OpenAI API key found.{RESET}")
        return ""

    # ... API call logic ...

    try:
        response = openai.chat.completions.create(**api_params)
    except Exception as e:
        print(f"{RED}Error connecting to OpenAI: {e}{RESET}")
        return ""
    
    # ... response handling ...
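
The elided sections assemble the request parameters and store the assistant's reply. A hedged sketch of what they could look like (the exact parameters ClickUi sends, such as the reasoning effort passed for reasoning models, may differ):

# Sketch only: build the request from the running conversation...
api_params = {"model": model_name, "messages": conversation_messages}

# ... the openai.chat.completions.create(**api_params) call shown above runs here ...

# ...then record and return the assistant's reply.
reply = response.choices[0].message.content
conversation_messages.append({"role": "assistant", "content": reply})
return reply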

Web Scraping and External Tools

The AI Assistant includes web scraping capabilities for Google searches and property lookups. Here's an example of the Google search function:

def google_search(query: str) -> str:
    global BROWSER_TYPE
    stop_spinner()
    print(f"{MAGENTA}Google search is: {query}{RESET}")
    encoded_query = quote_plus(query)
    url = f"https://www.google.com/search?q={encoded_query}"
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, args=["--disable-blink-features=AutomationControlled"])
        if BROWSER_TYPE == 'chrome':
            context = browser.new_context(
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ..."
            )
        # ... more browser setup ...
        page = context.new_page()
        page.goto(url)
        page.wait_for_load_state("networkidle")
        html = page.content()
        browser.close()
    soup = BeautifulSoup(html, 'html.parser')
    text = soup.get_text()
    cleaned_text = ' '.join(text.split())[0:5000]
    print(cleaned_text)
    return cleaned_text

GUI Implementation

The graphical user interface is implemented using PySide6 (Qt for Python). Here's an example of the main window class:

class BottomBubbleWindow(QWidget):
    global last_chat_geometry
    response_ready = Signal(str, object, object)

    def __init__(self):
        global last_main_geometry, last_chat_geometry        
        super().__init__()
        self.setWindowFlags(Qt.FramelessWindowHint)
        self.setAttribute(Qt.WA_TranslucentBackground, True)
        self.setAttribute(Qt.WA_DeleteOnClose)
        self.response_ready.connect(self.update_ai_reply)

        # Initialize chat dialog with empty content
        self.chat_dialog = ChatDialog(host_window=self)
        if last_chat_geometry:
            self.chat_dialog.setGeometry(last_chat_geometry)
        self.chat_dialog.hide()

        # ... more initialization ...

    def on_message_sent(self, text):
        # ... message handling logic ...

    def process_ai_reply(self, text, container, lb, fresh):
        try:
            ai_reply = call_current_engine(text, fresh=fresh)
        except Exception as e:
            print(f"Error in AI thread: {e}")
            ai_reply = f"[Error: {e}]"
        self.response_ready.emit(ai_reply, container, lb)

    # ... more methods ...
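
clickui.py wires this window into a QApplication and shows it when the launch hotkey fires. As a minimal, hedged sketch of the Qt side only (model loading and global hotkey registration are omitted):

import sys
from PySide6.QtWidgets import QApplication

# Sketch: create the Qt application, instantiate the main window, and start the event loop.
app = QApplication(sys.argv)
window = BottomBubbleWindow()
window.show()
sys.exit(app.exec())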

Features

Voice Interaction

The AI Assistant supports voice interactions using the Whisper model for speech recognition and a text-to-speech engine for responses. Here's how voice recording is implemented:

def record_and_transcribe_once() -> str:
    global recording_flag, stop_chat_loop, whisper_model
    model = whisper_model
    if recording_flag:
        return ""
    recording_flag = True
    audio_q.queue.clear()
    samplerate = 24000
    blocksize = 1024
    silence_threshold = 70
    max_silence_seconds = 0.9
    MIN_RECORD_DURATION = 1.0
    recorded_frames = []
    speaking_detected = False
    silence_start_time = None

    with sd.InputStream(channels=1, samplerate=samplerate, blocksize=blocksize, callback=audio_callback):
        print(f"{YELLOW}Recording started. Waiting for speech...{RESET}")
        play_wav_file_blocking("recording_started.wav")
        while True:
            if stop_chat_loop:
                break
            # ... recording logic ...

    if stop_chat_loop:
        recording_flag = False
        return ""
    print(f"{GREEN}Recording ended. Transcribing...{RESET}")
    # ... transcription logic ...
    return text_result
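
The loop above reads from an audio_q that is filled by an audio_callback, both elided here. Assuming a standard sounddevice input callback feeding a queue.Queue, a minimal version could look like this:

import queue

audio_q = queue.Queue()

def audio_callback(indata, frames, time_info, status):
    # Called by sounddevice for each captured block; push a copy onto the queue
    # so the recording loop can measure loudness and accumulate frames.
    if status:
        print(status)
    audio_q.put(indata.copy())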

Text Chat

Users can interact with the AI Assistant through text input. The chat interface is implemented in the GUI:

class ChatDialog(QWidget):
    global conversation_messages
    def __init__(self, host_window):
        global conversation_messages
        super().__init__()
        self.host_window = host_window
        self.setWindowFlags(Qt.FramelessWindowHint)
        self.setAttribute(Qt.WA_TranslucentBackground, True)
        self.setAttribute(Qt.WA_DeleteOnClose)

        # ... UI setup ...

        self.reply_line = QLineEdit()
        self.reply_line.setPlaceholderText("Type your reply...")
        reply_layout.addWidget(self.reply_line, stretch=1)
        self.reply_send_button = QToolButton()
        self.reply_send_button.setText("↑")
        self.reply_send_button.setToolTip("Send Reply")
        reply_layout.addWidget(self.reply_send_button)
        self.reply_send_button.clicked.connect(self.handle_reply_send)
        self.reply_line.returnPressed.connect(self.handle_reply_send)

    def handle_reply_send(self):
        text = self.reply_line.text().strip()
        if text:
            self.add_message(text, role="user")
            self.reply_line.clear()
            container, lb = self.add_loading_bubble()
            def do_ai_work():
                try:
                    ai_reply = call_current_engine(text, fresh=False)
                except Exception as e:
                    print("Error in AI thread:", e)
                    ai_reply = f"[Error: {e}]"
                self.host_window.response_ready.emit(ai_reply, container, lb)
            th = threading.Thread(target=do_ai_work, daemon=True)
            th.start()

    # ... more methods ...

File Attachments

The AI Assistant supports file attachments for text-based files. Here's how file handling is implemented:

class FileDropLineEdit(QLineEdit):
    file_attached = Signal(list)  # New signal to notify when a file is attached

    def __init__(self, parent=None):
        super().__init__(parent)
        self.setAcceptDrops(True)
        self.attachments = []  # Will hold dictionaries: {'filename': ..., 'content': ...}

    def dragEnterEvent(self, event):
        if event.mimeData().hasUrls():
            for url in event.mimeData().urls():
                file_path = url.toLocalFile()
                if os.path.splitext(file_path)[1].lower() in ['.txt', '.csv', '.xlsx', '.xls']:
                    event.acceptProposedAction()
                    return
            event.ignore()
        else:
            super().dragEnterEvent(event)

    def dropEvent(self, event):
        if event.mimeData().hasUrls():
            attachments = []
            for url in event.mimeData().urls():
                file_path = url.toLocalFile()
                ext = os.path.splitext(file_path)[1].lower()
                if ext in ['.txt', '.csv', '.xlsx', '.xls']:
                    file_name = os.path.basename(file_path)
                    try:
                        content = read_file_content(file_path)
                        attachments.append({'filename': file_name, 'content': content})
                    except Exception as e:
                        attachments.append({'filename': file_name, 'content': f"Error reading file: {str(e)}"})
            if attachments:
                self.attachments = attachments
                self.file_attached.emit(attachments)
            event.acceptProposedAction()
        else:
            super().dropEvent(event)
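
The dropEvent above delegates to read_file_content, which is not shown. A simple version covering the supported extensions might look like this; using pandas for the spreadsheet formats is an assumption, not necessarily the project's approach:

import os

import pandas as pd

def read_file_content(file_path: str) -> str:
    # Return the file's contents as text so it can be attached to the prompt.
    ext = os.path.splitext(file_path)[1].lower()
    if ext == '.txt':
        with open(file_path, 'r', encoding='utf-8', errors='replace') as f:
            return f.read()
    if ext == '.csv':
        return pd.read_csv(file_path).to_string(index=False)
    if ext in ('.xlsx', '.xls'):
        return pd.read_excel(file_path).to_string(index=False)
    raise ValueError(f"Unsupported file type: {ext}")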

Property Lookup

The AI Assistant can fetch property value estimates from Zillow and Redfin. Here's how it's implemented:

def fetch_property_value(address: str) -> str:
    global driver
    # Kill any lingering Chromium instances before starting a new search.
    kill_chromium_instances()
    try:
        driver
    except NameError:
        # ... driver setup ...

    stop_spinner()
    print(f"{MAGENTA}Address for search: {address}{RESET}")
    stop_spinner()

    search_url = "https://www.google.com/search?q=" + address.replace(' ', '+')
    try:
        driver.get(search_url)
        time.sleep(3.5)
    except Exception as e:
        stop_spinner()
        print(f"{RED}[DEBUG] Exception during driver.get: {e}{RESET}")
        stop_spinner()
        return "Error performing Google search."

    # ... search for Zillow and Redfin links ...

    def open_in_new_tab(url):
        # ... open URL in new tab and return page HTML ...

    def parse_redfin_value(source):
        # ... parse Redfin value from HTML ...

    def parse_zillow_value(source):
        # ... parse Zillow value from HTML ...

    property_values = []
    for domain, link in links_found.items():
        if not link:
            continue
        page_html = open_in_new_tab(link)
        extracted_value = None
        if domain == 'Redfin':
            extracted_value = parse_redfin_value(page_html)
        elif domain == 'Zillow':
            extracted_value = parse_zillow_value(page_html)
        if extracted_value:
            property_values.append((domain, extracted_value))

    if not property_values:
        return "Could not retrieve property values."

    result_phrases = []
    for domain, value in property_values:
        result_phrases.append(f"{domain} estimates the home is worth {value}")
    return ", and ".join(result_phrases)

Google Search Integration

The AI Assistant can perform Google searches to provide up-to-date information. Here's how it's implemented:

def google_search(query: str) -> str:
    global BROWSER_TYPE
    stop_spinner()
    print(f"{MAGENTA}Google search is: {query}{RESET}")
    encoded_query = quote_plus(query)
    url = f"https://www.google.com/search?q={encoded_query}"
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, args=["--disable-blink-features=AutomationControlled"])
        if BROWSER_TYPE == 'chrome':
            context = browser.new_context(
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ..."
            )
        if BROWSER_TYPE == 'chromium':
            context = browser.new_context(
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ..."
            )
        page = context.new_page()
        page.goto(url)
        page.wait_for_load_state("networkidle")
        html = page.content()
        browser.close()
    soup = BeautifulSoup(html, 'html.parser')
    text = soup.get_text()
    cleaned_text = ' '.join(text.split())[0:5000]
    print(cleaned_text)
    return cleaned_text
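
In practice the cleaned search text is handed back to the selected model as extra context. A hedged usage sketch:

# Sketch: run a search and let the current engine answer from the scraped text.
query = "current Python stable release"
results = google_search(query)
answer = call_current_engine(
    f"Using these Google results, answer the question '{query}':\n{results}"
)
print(answer)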

Advanced Usage

Custom AI Model Integration

To integrate a custom AI model, you need to add a new function to handle API calls and update the ENGINE_MODELS dictionary. Here's an example:

import requests

def call_custom_model(prompt: str, model_name: str) -> str:
    # Implement your custom model API call here
    # Example:
    response = requests.post(
        "https://api.custom-model.com/generate",
        json={"prompt": prompt, "model": model_name}
    )
    response.raise_for_status()
    return response.json()["generated_text"]

# Add to ENGINE_MODELS
ENGINE_MODELS["CustomAI"] = ["custom-model-1", "custom-model-2"]

# Update call_current_engine
def call_current_engine(prompt: str, fresh: bool = False) -> str:
    global ENGINE, MODEL_ENGINE
    if ENGINE == "CustomAI":
        return call_custom_model(prompt, MODEL_ENGINE)
    elif ENGINE == "Ollama":
        return call_ollama(prompt, MODEL_ENGINE)
    # ... existing code for other engines ...

Extending Functionality

To add new features or tools to the AI Assistant, you can create new functions and integrate them into the existing workflow. Here's an example of how you might add a weather lookup feature:

import requests

def weather_lookup(city: str) -> str:
    api_key = "your_weather_api_key"
    url = f"https://api.openweathermap.org/data/2.5/weather?q={city}&appid={api_key}&units=metric"
    response = requests.get(url)
    data = response.json()
    if response.status_code == 200:
        temp = data['main']['temp']
        description = data['weather'][0]['description']
        return f"The weather in {city} is {description} with a temperature of {temp}°C."
    else:
        return f"Unable to fetch weather data for {city}."

# Integrate into call_current_engine
def call_current_engine(prompt: str, fresh: bool = False) -> str:
    global ENGINE, MODEL_ENGINE, conversation_messages
    
    # Check if the prompt is asking for weather
    if "weather in" in prompt.lower():
        city = prompt.lower().split("weather in")[-1].strip()
        weather_info = weather_lookup(city)
        conversation_messages.append({"role": "assistant", "content": weather_info})
        return weather_info
    
    # Existing engine calls...

API Reference

Key Functions and Classes

Here are some of the key functions and classes in the AI Assistant:

  • record_and_transcribe_once(): Records user speech and transcribes it using the Whisper model.
  • call_current_engine(prompt: str, fresh: bool) -> str: Calls the selected AI model with the given prompt.
  • google_search(query: str) -> str: Performs a Google search for the given query.
  • fetch_property_value(address: str) -> str: Fetches property value estimates from Zillow and Redfin.
  • class BottomBubbleWindow(QWidget): Main window class for the GUI.
  • class ChatDialog(QWidget): Chat dialog window for displaying conversations.
  • class FileDropLineEdit(QLineEdit): Custom QLineEdit that supports file drag and drop.

Troubleshooting

Here are some common issues and their solutions:

  • API Key Issues: Ensure that you have set the correct API keys in the .voiceconfig file for the AI models you're using.
  • Speech Recognition Problems: Make sure your microphone is properly connected and selected as the default input device in your system settings.
  • GUI Not Responding: If the GUI becomes unresponsive, try restarting the application. If the issue persists, check the console for any error messages.
  • Web Scraping Errors: Ensure that you have the correct ChromeDriver version installed and that the path is correctly set in the configuration file.
  • File Attachment Issues: Verify that the file you're trying to attach is in a supported format (.txt, .csv, .xlsx, .xls) and is not corrupted.
  • Whisper or Kokoro Not Found: If you encounter errors related to Whisper or Kokoro not being found, ensure that these dependencies are properly installed and their paths are correctly set in your system's environment variables.
  • Browser/Web-Search Related Issues: Double-check that your ChromeDriver version matches your local Chrome or Chromium version. Make sure the user data path and profile name in .voiceconfig are correct, and that no Chrome/Chromium instance is running in your Task Manager before starting a fresh test.

If you encounter any other issues, please check the console output for error messages and refer to the project's issue tracker or documentation for further assistance.