ClickUi Documentation
ClickUi is a powerful, cross-platform open-source application that integrates various AI models, speech recognition, and web scraping capabilities. It provides a seamless interface for voice and text interactions, file attachments, property lookups, and web searches.
Preliminary Information
Before diving into the setup and usage of the AI Assistant, it's important to understand its core components and dependencies.
Modes of Operation
The AI Assistant operates in two primary modes:
- Voice Mode: Allows users to interact with the AI using voice commands and receive spoken responses.
- Chat Mode: Provides a text-based interface for typing queries and receiving written responses.
Key Dependencies
The AI Assistant relies on two critical dependencies that must be installed and loaded into the global scope before the program can run:
- Whisper: An automatic speech recognition (ASR) system used for transcribing voice input.
- Kokoro: A text-to-speech engine used for generating spoken responses in Voice Mode.
Warning:
The Whisper and Kokoro models are loaded into the global scope. They must be installed and properly configured before running the AI Assistant. Failure to do so will result in runtime errors.
You can skip the voice dependencies by commenting out the Whisper and Kokoro loading code; the app will still run, but Voice Mode will be unavailable.
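For reference, a minimal sketch of what the global-scope loading can look like, assuming the openai-whisper and kokoro pip packages (the exact loading code lives in clickui.py and may differ):

import whisper as openai_whisper
from kokoro import KPipeline  # hypothetical import path; check the kokoro package docs

# Both models live in the global scope and are loaded once, at import time.
whisper_model = openai_whisper.load_model("base")  # speech-to-text
tts_pipeline = KPipeline(lang_code="a")            # text-to-speech (lang code is illustrative)

# Commenting out the two load lines above lets the app start without the
# voice dependencies; Chat Mode still works, Voice Mode does not.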
Setup / Run
Once installed, navigate to the folder containing clickui.py in your command prompt or terminal and run:
python clickui.py
After you see Ready!... in your console, press Ctrl + k to bring up the ClickUi interface.
Easy Installation (Windows Only)
- Install Anaconda/Conda
  Download and install Anaconda/Conda from https://www.anaconda.com/download/success. This allows for easier environment management and Python setup. Install it system-wide (for all users) and add it to the PATH.
- Run Install.bat
  Available from ClickUi.app or in the GitHub repo. This file will:
  - Download this Git repository
  - Run the installation commands for you
  - Start the program automatically

You can use the Install.bat file to launch the program, or simply run python clickui.py in your command prompt once everything is installed.
Manual Installation
- Install Anaconda/Conda
  Download and install Anaconda/Conda from https://www.anaconda.com/download/success. You'll need this to install the dependencies in conda_packages.txt.
- Keep the files together in one folder
  Ensure the Python files, images, and other assets from this repo (like sonos.py, .svg icons, etc.) remain together in a single folder on your machine.
- Create a new Conda environment & install packages
  - Download/clone the repository, then open a command prompt/terminal and navigate (cd) to the directory containing this code. Your prompt might look like: C:\Users\PC\Downloads\ClickUi>
  - Run conda -h to ensure conda is installed properly.
  - Create the conda environment and install the required libraries:
    conda create --name click_ui --file conda_packages.txt
    This creates a new Conda environment named click_ui and installs the packages listed in conda_packages.txt.
  - Activate the environment:
    conda activate click_ui
  - Now install the rest of the pip modules:
    pip install -r requirements.txt
- Start the Program
  With your command prompt in the folder containing clickui.py:
  python clickui.py
  Once you see Ready!..., press Ctrl + k to bring up the ClickUi interface.
Configuration
Configure ClickUi by editing the .voiceconfig file in the root directory. You can also edit these settings through the Settings menu in the GUI and then click the "Save Config" button.
Here are the key settings:
{
"use_sonos": false,
"use_conversation_history": true,
"BROWSER_TYPE": "chrome",
"CHROME_USER_DATA": "C:\Users\PC\AppData\Local\Google\Chrome\User Data",
"CHROME_DRIVER_PATH": "C:\Users\PC\Downloads\chromedriver.exe",
"CHROME_PROFILE": "Profile 10",
"ENGINE": "OpenAI",
"MODEL_ENGINE": "gpt-4o",
"OPENAI_API_KEY": "your-api-key-here",
"GOOGLE_API_KEY": "your-google-api-key-here",
"days_back_to_load": 15,
"HOTKEY_LAUNCH": "ctrl+k"
}
Adjust these settings according to your preferences and API keys.
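A minimal sketch of how these values can be read at startup, assuming .voiceconfig is plain JSON in the working directory (the actual loader in clickui.py may differ). Note that backslashes in Windows paths must be doubled, as in the example above, for the file to parse as JSON:

import json

with open(".voiceconfig", "r", encoding="utf-8") as f:
    config = json.load(f)

ENGINE = config.get("ENGINE", "OpenAI")
MODEL_ENGINE = config.get("MODEL_ENGINE", "gpt-4o")
OPENAI_API_KEY = config.get("OPENAI_API_KEY", "")
HOTKEY_LAUNCH = config.get("HOTKEY_LAUNCH", "ctrl+k")
days_back_to_load = config.get("days_back_to_load", 15)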
Conversation History
When enabled ("use_conversation_history": true), the app loads your previous chats from the last days_back_to_load days, located in your local history folder, and injects them into the current chat session so you can pick up where you left off.
Note: Conversation history can quickly consume a lot of tokens! The CLI will print the total tokens used for your loaded conversation when enabled.
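A hypothetical sketch of what loading recent history can look like, assuming one JSON file of messages per conversation in a local "history" folder (the filenames, format, and token math are illustrative, not the actual clickui.py layout):

import json, os, time

def load_recent_history(days_back_to_load: int, history_dir: str = "history") -> list:
    cutoff = time.time() - days_back_to_load * 86400
    messages = []
    for name in sorted(os.listdir(history_dir)):
        path = os.path.join(history_dir, name)
        if os.path.getmtime(path) < cutoff:
            continue  # skip conversations older than the window
        with open(path, "r", encoding="utf-8") as f:
            messages.extend(json.load(f))
    # Rough token estimate (~4 characters per token) to illustrate the cost
    # that the CLI reports when history is enabled.
    approx_tokens = sum(len(m.get("content", "")) for m in messages) // 4
    print(f"Loaded {len(messages)} messages (~{approx_tokens} tokens)")
    return messages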
Core Components
Speech Recognition
The AI Assistant uses the Whisper model for speech recognition. Here's how it's implemented:
import tempfile

import soundfile as sf
import whisper as openai_whisper

# Loaded into the global scope at startup
whisper_model = openai_whisper.load_model("base", device='cuda')

def record_and_transcribe_once() -> str:
    # ... recording logic ...

def transcribe_audio(audio_data, samplerate):
    # Write the captured audio to a temporary WAV file and transcribe it with Whisper
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        temp_wav_name = tmp.name
    sf.write(temp_wav_name, audio_data, samplerate)
    result = whisper_model.transcribe(temp_wav_name, fp16=False)
    return result["text"]

# ... more recording and transcription logic ...
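The load above pins Whisper to a CUDA GPU. If you are running on a machine without one, a common pattern (not part of clickui.py, shown here as an assumption) is to fall back to the CPU:

import torch
import whisper as openai_whisper

# Pick the device at runtime instead of hard-coding 'cuda'.
device = "cuda" if torch.cuda.is_available() else "cpu"
whisper_model = openai_whisper.load_model("base", device=device)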
AI Models Integration
The application supports multiple AI models, including OpenAI, Google, Ollama, Claude, Groq, and OpenRouter. Here's an example of how the OpenAI model is integrated:
def call_openai(prompt: str, model_name: str, reasoning_effort: str) -> str:
    import openai
    import json
    global conversation_messages, OPENAI_API_KEY
    ensure_system_prompt()
    conversation_messages.append({"role": "user", "content": prompt})
    openai.api_key = OPENAI_API_KEY
    if not openai.api_key:
        stop_spinner()
        print(f"{RED}No OpenAI API key found.{RESET}")
        return ""
    # ... API call logic ...
    try:
        response = openai.chat.completions.create(**api_params)
    except Exception as e:
        print(f"{RED}Error connecting to OpenAI: {e}{RESET}")
        return ""
    # ... response handling ...
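The elided API-call and response-handling logic assembles a standard Chat Completions request. A self-contained sketch of that call using the OpenAI client interface (the parameter set and messages are illustrative, not the exact api_params clickui.py builds):

from openai import OpenAI

client = OpenAI(api_key="your-api-key-here")
conversation_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize today's top tech news."},
]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=conversation_messages,
)
print(response.choices[0].message.content)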
Web Scraping and External Tools
The AI Assistant includes web scraping capabilities for Google searches and property lookups. Here's an example of the Google search function:
def google_search(query: str) -> str:
    global BROWSER_TYPE
    stop_spinner()
    print(f"{MAGENTA}Google search is: {query}{RESET}")
    encoded_query = quote_plus(query)
    url = f"https://www.google.com/search?q={encoded_query}"
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, args=["--disable-blink-features=AutomationControlled"])
        if BROWSER_TYPE == 'chrome':
            context = browser.new_context(
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ..."
            )
        # ... more browser setup ...
        page = context.new_page()
        page.goto(url)
        page.wait_for_load_state("networkidle")
        html = page.content()
        browser.close()
    soup = BeautifulSoup(html, 'html.parser')
    text = soup.get_text()
    cleaned_text = ' '.join(text.split())[0:5000]
    print(cleaned_text)
    return cleaned_text
GUI Implementation
The graphical user interface is implemented using PySide6 (Qt for Python). Here's an example of the main window class:
class BottomBubbleWindow(QWidget):
    global last_chat_geometry
    response_ready = Signal(str, object, object)

    def __init__(self):
        global last_main_geometry, last_chat_geometry
        super().__init__()
        self.setWindowFlags(Qt.FramelessWindowHint)
        self.setAttribute(Qt.WA_TranslucentBackground, True)
        self.setAttribute(Qt.WA_DeleteOnClose)
        self.response_ready.connect(self.update_ai_reply)
        # Initialize chat dialog with empty content
        self.chat_dialog = ChatDialog(host_window=self)
        if last_chat_geometry:
            self.chat_dialog.setGeometry(last_chat_geometry)
        self.chat_dialog.hide()
        # ... more initialization ...

    def on_message_sent(self, text):
        # ... message handling logic ...

    def process_ai_reply(self, text, container, lb, fresh):
        try:
            ai_reply = call_current_engine(text, fresh=fresh)
        except Exception as e:
            print(f"Error in AI thread: {e}")
            ai_reply = f"[Error: {e}]"
        self.response_ready.emit(ai_reply, container, lb)

    # ... more methods ...
Features
Voice Interaction
The AI Assistant supports voice interactions using the Whisper model for speech recognition and a text-to-speech engine for responses. Here's how voice recording is implemented:
def record_and_transcribe_once() -> str:
    global recording_flag, stop_chat_loop, whisper_model
    model = whisper_model
    if recording_flag:
        return ""
    recording_flag = True
    audio_q.queue.clear()
    samplerate = 24000
    blocksize = 1024
    silence_threshold = 70
    max_silence_seconds = 0.9
    MIN_RECORD_DURATION = 1.0
    recorded_frames = []
    speaking_detected = False
    silence_start_time = None
    with sd.InputStream(channels=1, samplerate=samplerate, blocksize=blocksize, callback=audio_callback):
        print(f"{YELLOW}Recording started. Waiting for speech...{RESET}")
        play_wav_file_blocking("recording_started.wav")
        while True:
            if stop_chat_loop:
                break
            # ... recording logic ...
    if stop_chat_loop:
        recording_flag = False
        return ""
    print(f"{GREEN}Recording ended. Transcribing...{RESET}")
    # ... transcription logic ...
    return text_result
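The excerpt relies on a module-level audio_q queue and an audio_callback that feeds it; neither is shown above. A hypothetical version of that plumbing (the names mirror the excerpt, the implementation is an assumption):

import queue
import sounddevice as sd

audio_q = queue.Queue()

def audio_callback(indata, frames, time_info, status):
    # Push each captured audio block onto the queue; the recording loop
    # drains it, measures volume against silence_threshold, and decides
    # when the speaker has finished.
    if status:
        print(status)
    audio_q.put(indata.copy())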
Text Chat
Users can interact with the AI Assistant through text input. The chat interface is implemented in the GUI:
class ChatDialog(QWidget):
    global conversation_messages

    def __init__(self, host_window):
        global conversation_messages
        super().__init__()
        self.host_window = host_window
        self.setWindowFlags(Qt.FramelessWindowHint)
        self.setAttribute(Qt.WA_TranslucentBackground, True)
        self.setAttribute(Qt.WA_DeleteOnClose)
        # ... UI setup ...
        self.reply_line = QLineEdit()
        self.reply_line.setPlaceholderText("Type your reply...")
        reply_layout.addWidget(self.reply_line, stretch=1)
        self.reply_send_button = QToolButton()
        self.reply_send_button.setText("↑")
        self.reply_send_button.setToolTip("Send Reply")
        reply_layout.addWidget(self.reply_send_button)
        self.reply_send_button.clicked.connect(self.handle_reply_send)
        self.reply_line.returnPressed.connect(self.handle_reply_send)

    def handle_reply_send(self):
        text = self.reply_line.text().strip()
        if text:
            self.add_message(text, role="user")
            self.reply_line.clear()
            container, lb = self.add_loading_bubble()

            def do_ai_work():
                try:
                    ai_reply = call_current_engine(text, fresh=False)
                except Exception as e:
                    print("Error in AI thread:", e)
                    ai_reply = f"[Error: {e}]"
                self.host_window.response_ready.emit(ai_reply, container, lb)

            th = threading.Thread(target=do_ai_work, daemon=True)
            th.start()

    # ... more methods ...
File Attachments
The AI Assistant supports file attachments for text-based files. Here's how file handling is implemented:
class FileDropLineEdit(QLineEdit):
    file_attached = Signal(list)  # New signal to notify when a file is attached

    def __init__(self, parent=None):
        super().__init__(parent)
        self.setAcceptDrops(True)
        self.attachments = []  # Will hold dictionaries: {'filename': ..., 'content': ...}

    def dragEnterEvent(self, event):
        if event.mimeData().hasUrls():
            for url in event.mimeData().urls():
                file_path = url.toLocalFile()
                if os.path.splitext(file_path)[1].lower() in ['.txt', '.csv', '.xlsx', '.xls']:
                    event.acceptProposedAction()
                    return
            event.ignore()
        else:
            super().dragEnterEvent(event)

    def dropEvent(self, event):
        if event.mimeData().hasUrls():
            attachments = []
            for url in event.mimeData().urls():
                file_path = url.toLocalFile()
                ext = os.path.splitext(file_path)[1].lower()
                if ext in ['.txt', '.csv', '.xlsx', '.xls']:
                    file_name = os.path.basename(file_path)
                    try:
                        content = read_file_content(file_path)
                        attachments.append({'filename': file_name, 'content': content})
                    except Exception as e:
                        attachments.append({'filename': file_name, 'content': f"Error reading file: {str(e)}"})
            if attachments:
                self.attachments = attachments
                self.file_attached.emit(attachments)
            event.acceptProposedAction()
        else:
            super().dropEvent(event)
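read_file_content is called in dropEvent but not shown above. A plausible sketch of it (the pandas-based spreadsheet handling is an assumption, not the actual clickui.py implementation):

import os
import pandas as pd

def read_file_content(file_path: str) -> str:
    ext = os.path.splitext(file_path)[1].lower()
    if ext in ('.xlsx', '.xls'):
        # Flatten spreadsheets into CSV text so they can be injected into the prompt.
        return pd.read_excel(file_path).to_csv(index=False)
    # .txt and .csv files are read as plain text.
    with open(file_path, 'r', encoding='utf-8', errors='replace') as f:
        return f.read()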
Property Lookup
The AI Assistant can fetch property value estimates from Zillow and Redfin. Here's how it's implemented:
def fetch_property_value(address: str) -> str:
    global driver
    # Kill any lingering Chromium instances before starting a new search.
    kill_chromium_instances()
    try:
        driver
    except NameError:
        # ... driver setup ...
    stop_spinner()
    print(f"{MAGENTA}Address for search: {address}{RESET}")
    stop_spinner()
    search_url = "https://www.google.com/search?q=" + address.replace(' ', '+')
    try:
        driver.get(search_url)
        time.sleep(3.5)
    except Exception as e:
        stop_spinner()
        print(f"{RED}[DEBUG] Exception during driver.get: {e}{RESET}")
        stop_spinner()
        return "Error performing Google search."
    # ... search for Zillow and Redfin links ...

    def open_in_new_tab(url):
        # ... open URL in new tab and return page HTML ...

    def parse_redfin_value(source):
        # ... parse Redfin value from HTML ...

    def parse_zillow_value(source):
        # ... parse Zillow value from HTML ...

    property_values = []
    for domain, link in links_found.items():
        if not link:
            continue
        page_html = open_in_new_tab(link)
        extracted_value = None
        if domain == 'Redfin':
            extracted_value = parse_redfin_value(page_html)
        elif domain == 'Zillow':
            extracted_value = parse_zillow_value(page_html)
        if extracted_value:
            property_values.append((domain, extracted_value))
    if not property_values:
        return "Could not retrieve property values."
    result_phrases = []
    for domain, value in property_values:
        result_phrases.append(f"{domain} estimates the home is worth {value}")
    return ", and ".join(result_phrases)
Google Search Integration
The AI Assistant can perform Google searches to provide up-to-date information. Here's how it's implemented:
def google_search(query: str) -> str:
    global BROWSER_TYPE
    stop_spinner()
    print(f"{MAGENTA}Google search is: {query}{RESET}")
    encoded_query = quote_plus(query)
    url = f"https://www.google.com/search?q={encoded_query}"
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, args=["--disable-blink-features=AutomationControlled"])
        if BROWSER_TYPE == 'chrome':
            context = browser.new_context(
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ..."
            )
        if BROWSER_TYPE == 'chromium':
            context = browser.new_context(
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ..."
            )
        page = context.new_page()
        page.goto(url)
        page.wait_for_load_state("networkidle")
        html = page.content()
        browser.close()
    soup = BeautifulSoup(html, 'html.parser')
    text = soup.get_text()
    cleaned_text = ' '.join(text.split())[0:5000]
    print(cleaned_text)
    return cleaned_text
Advanced Usage
Custom AI Model Integration
To integrate a custom AI model, you need to add a new function to handle API calls and update the ENGINE_MODELS dictionary. Here's an example:
import requests

def call_custom_model(prompt: str, model_name: str) -> str:
    # Implement your custom model API call here
    # Example:
    response = requests.post(
        "https://api.custom-model.com/generate",
        json={"prompt": prompt, "model": model_name}
    )
    return response.json()["generated_text"]

# Add to ENGINE_MODELS
ENGINE_MODELS["CustomAI"] = ["custom-model-1", "custom-model-2"]

# Update call_current_engine
def call_current_engine(prompt: str, fresh: bool = False) -> str:
    global ENGINE, MODEL_ENGINE
    if ENGINE == "CustomAI":
        return call_custom_model(prompt, MODEL_ENGINE)
    elif ENGINE == "Ollama":
        return call_ollama(prompt, MODEL_ENGINE)
    # ... existing code for other engines ...
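To route requests through the new engine at runtime, point the .voiceconfig values at it (the engine and model names below are the illustrative ones from the snippet above):

{
  "ENGINE": "CustomAI",
  "MODEL_ENGINE": "custom-model-1"
}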
Extending Functionality
To add new features or tools to the AI Assistant, you can create new functions and integrate them into the existing workflow. Here's an example of how you might add a weather lookup feature:
import requests

def weather_lookup(city: str) -> str:
    api_key = "your_weather_api_key"
    url = f"https://api.openweathermap.org/data/2.5/weather?q={city}&appid={api_key}&units=metric"
    response = requests.get(url)
    data = response.json()
    if response.status_code == 200:
        temp = data['main']['temp']
        description = data['weather'][0]['description']
        return f"The weather in {city} is {description} with a temperature of {temp}°C."
    else:
        return f"Unable to fetch weather data for {city}."

# Integrate into call_current_engine
def call_current_engine(prompt: str, fresh: bool = False) -> str:
    global ENGINE, MODEL_ENGINE, conversation_messages
    # Check if the prompt is asking for weather
    if "weather in" in prompt.lower():
        city = prompt.lower().split("weather in")[-1].strip()
        weather_info = weather_lookup(city)
        conversation_messages.append({"role": "assistant", "content": weather_info})
        return weather_info
    # Existing engine calls...
API Reference
Key Functions and Classes
Here are some of the key functions and classes in the AI Assistant:
- record_and_transcribe_once(): Records user speech and transcribes it using the Whisper model.
- call_current_engine(prompt: str, fresh: bool) -> str: Calls the selected AI model with the given prompt.
- google_search(query: str) -> str: Performs a Google search for the given query.
- fetch_property_value(address: str) -> str: Fetches property value estimates from Zillow and Redfin.
- class BottomBubbleWindow(QWidget): Main window class for the GUI.
- class ChatDialog(QWidget): Chat dialog window for displaying conversations.
- class FileDropLineEdit(QLineEdit): Custom QLineEdit that supports file drag and drop.
Troubleshooting
Here are some common issues and their solutions:
- API Key Issues: Ensure that you have set the correct API keys in the .voiceconfig file for the AI models you're using.
- Speech Recognition Problems: Make sure your microphone is properly connected and selected as the default input device in your system settings.
- GUI Not Responding: If the GUI becomes unresponsive, try restarting the application. If the issue persists, check the console for any error messages.
- Web Scraping Errors: Ensure that you have the correct ChromeDriver version installed and that the path is correctly set in the configuration file.
- File Attachment Issues: Verify that the file you're trying to attach is in a supported format (.txt, .csv, .xlsx, .xls) and is not corrupted.
- Whisper or Kokoro Not Found: If you encounter errors related to Whisper or Kokoro not being found, ensure that these dependencies are properly installed and their paths are correctly set in your system's environment variables.
- Browser/Web-Search Related Issues: Double-check that your ChromeDriver version matches your local Chrome or Chromium version. Make sure the user data path and profile name in .voiceconfig are correct, and that no Chrome/Chromium instance is running in your Task Manager before starting a fresh test.
If you encounter any other issues, please check the console output for error messages and refer to the project's issue tracker or documentation for further assistance.