ControlUi Documentation
ControlUi is a powerful, cross-platform open-source application that integrates various AI models, speech recognition, and web scraping capabilities. It provides a seamless interface for voice and text interactions, file attachments, property lookups, and web searches.
Preliminary Information
Before diving into the setup and usage of the AI Assistant, it's important to understand its core components and dependencies.
Modes of Operation
The AI Assistant operates in two primary modes:
- Voice Mode: Allows users to interact with the AI using voice commands and receive spoken responses.
- Chat Mode: Provides a text-based interface for typing queries and receiving written responses.
Key Dependencies
The AI Assistant relies on two critical dependencies that must be installed and loaded into the global scope before the program can run:
- Whisper: An automatic speech recognition (ASR) system used for transcribing voice input.
- Kokoro: A text-to-speech engine used for generating spoken responses in Voice Mode.
Warning:
The Whisper and Kokoro models are loaded into the global scope. They must be installed and properly configured before running the AI Assistant. Failure to do so will result in runtime errors.
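For orientation, the global setup looks roughly like the following. This is a minimal sketch assuming the pip packages openai-whisper and kokoro are installed; the exact loading code in controlui.py may differ:
import whisper                    # openai-whisper (ASR)
from kokoro import KPipeline      # Kokoro text-to-speech pipeline

# Loaded once at module (global) scope so every handler can reuse them.
whisper_model = whisper.load_model("base")
kokoro_pipeline = KPipeline(lang_code="a")   # "a" = American English voices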
Setup
Installation
1. Install Anaconda/Conda
https://www.anaconda.com/download/success
Anaconda allows for easier environment management and Python setup. Install it system-wide, add it to your PATH, and accept the recommended options.
2. Create a new Conda environment
Run
conda -h
in your terminal to see if you have Conda installed properly.
Open a Command Prompt. Let's create a new Conda environment called cuda with Python version 3.11:
conda create -n cuda python==3.11
This will create a new Conda environment called cuda; Python and any libraries you need for this program will live under this environment.
To access the environment, open a command prompt and type:
conda activate cuda
You'll now see the environment name at the left of your terminal prompt. Now we can install the required libraries.
3. Install CUDA toolkit (for Kokoro & Whisper)
The CUDA toolkit is not required for chat-based functionality, but Voice Mode benefits from it: without an NVIDIA GPU, voice transcription and voice generation will be slower. With a 2080 Ti, Whisper (base) transcription is essentially real-time and Kokoro voice generation runs at 10-20x real-time (very fast). With CPU only, expect larger delays when transcribing and slight delays when generating, but it will still work.
Install cudatoolkit v11.8.0 - https://anaconda.org/conda-forge/cudatoolkit
conda install -c conda-forge cudatoolkit=11.8.0
4. Install cuDNN
Not required for chat-based functionality
Install cudnn v8.9.7 - https://anaconda.org/conda-forge/cudnn
conda install -c conda-forge cudnn=8.9.7
5. Install PyTorch
Not required for chat-based functionality
Install PyTorch - https://pytorch.org/
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
6. Install TensorFlow
Not required for chat-based functionality
Install TensorFlow 2.14.0, as this is the last TensorFlow version compatible with CUDA 11.8. Reference: https://www.tensorflow.org/install/source#gpu
conda install -c conda-forge tensorflow=2.14.0=cuda118py311heb1bdc4_0
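Optionally, verify that both frameworks can see your GPU before continuing. A quick sanity check you can run inside the cuda environment (assuming the installs above succeeded):
import torch
import tensorflow as tf

print("PyTorch CUDA available:", torch.cuda.is_available())
print("TensorFlow GPUs:", tf.config.list_physical_devices("GPU"))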
7. Other libraries
You should try running
python controlui.py
and see if you get import errors. If you have any missing libraries, install them with pip:
pip install kokoro
or pip install pyperclip
or pip install keyboard
etc. With all the imported libraries installed, you should be able to run the program.
8. Start the Program
With your command prompt in the Conda environment that has everything installed, navigate to the directory containing the .py file. If you enter
dir
or ls
you should be able to list the contents of your current working directory and confirm that the .py file is present. Run
python controlui.py
Once you see Ready!..., you can press Ctrl+k to bring up ControlUi 😎
Configuration
Configure ControlUi by editing the .voiceconfig file in the root directory. Here are the key settings:
{
  "use_sonos": false,
  "use_conversation_history": true,
  "BROWSER_TYPE": "chrome",
  "CHROME_USER_DATA": "C:\\Users\\PC\\AppData\\Local\\Google\\Chrome\\User Data",
  "CHROME_DRIVER_PATH": "C:\\Users\\PC\\Downloads\\chromedriver.exe",
  "CHROME_PROFILE": "Profile 10",
  "ENGINE": "OpenAI",
  "MODEL_ENGINE": "gpt-4o",
  "OPENAI_API_KEY": "your-api-key-here",
  "GOOGLE_API_KEY": "your-google-api-key-here",
  "days_back_to_load": 15,
  "HOTKEY_LAUNCH": "ctrl+k"
}
Adjust these settings according to your preferences and API keys.
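For reference, the application presumably parses this file as plain JSON. A minimal, illustrative loading sketch (the variable names here are hypothetical, not necessarily those used by controlui.py):
import json

with open(".voiceconfig", "r", encoding="utf-8") as f:
    config = json.load(f)

ENGINE = config.get("ENGINE", "OpenAI")
MODEL_ENGINE = config.get("MODEL_ENGINE", "gpt-4o")
OPENAI_API_KEY = config.get("OPENAI_API_KEY", "")
HOTKEY_LAUNCH = config.get("HOTKEY_LAUNCH", "ctrl+k")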
Core Components
Speech Recognition
The AI Assistant uses the Whisper model for speech recognition. Here's how it's implemented:
import tempfile

import soundfile as sf
import whisper as openai_whisper

whisper_model = openai_whisper.load_model("base", device='cuda')

def record_and_transcribe_once() -> str:
    # ... recording logic ...

    def transcribe_audio(audio_data, samplerate):
        # Write the captured audio to a temporary WAV file and run Whisper on it.
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
            temp_wav_name = tmp.name
        sf.write(temp_wav_name, audio_data, samplerate)
        result = whisper_model.transcribe(temp_wav_name, fp16=False)
        return result["text"]

    # ... more recording and transcription logic ...
AI Models Integration
The application supports multiple AI models, including OpenAI, Google, Ollama, Claude, Groq, and OpenRouter. Here's an example of how the OpenAI model is integrated:
def call_openai(prompt: str, model_name: str, reasoning_effort: str) -> str:
    import openai
    import json
    global conversation_messages, OPENAI_API_KEY
    ensure_system_prompt()
    conversation_messages.append({"role": "user", "content": prompt})
    openai.api_key = OPENAI_API_KEY
    if not openai.api_key:
        stop_spinner()
        print(f"{RED}No OpenAI API key found.{RESET}")
        return ""
    # ... API call logic ...
    try:
        response = openai.chat.completions.create(**api_params)
    except Exception as e:
        print(f"{RED}Error connecting to OpenAI: {e}{RESET}")
        return ""
    # ... response handling ...
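The api_params dictionary is elided above. For reference, here is a self-contained sketch of the same Chat Completions call pattern with the openai 1.x SDK, using illustrative values rather than the project's exact parameters:
import openai

openai.api_key = "your-api-key-here"
conversation_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
api_params = {"model": "gpt-4o", "messages": conversation_messages}
response = openai.chat.completions.create(**api_params)
print(response.choices[0].message.content)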
Web Scraping and External Tools
The AI Assistant includes web scraping capabilities for Google searches and property lookups. Here's an example of the Google search function:
def google_search(query: str) -> str:
    global BROWSER_TYPE
    stop_spinner()
    print(f"{MAGENTA}Google search is: {query}{RESET}")
    encoded_query = quote_plus(query)
    url = f"https://www.google.com/search?q={encoded_query}"
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, args=["--disable-blink-features=AutomationControlled"])
        if BROWSER_TYPE == 'chrome':
            context = browser.new_context(
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ..."
            )
        # ... more browser setup ...
        page = context.new_page()
        page.goto(url)
        page.wait_for_load_state("networkidle")
        html = page.content()
        browser.close()
    soup = BeautifulSoup(html, 'html.parser')
    text = soup.get_text()
    cleaned_text = ' '.join(text.split())[0:5000]
    print(cleaned_text)
    return cleaned_text
GUI Implementation
The graphical user interface is implemented using PySide6 (Qt for Python). Here's an example of the main window class:
class BottomBubbleWindow(QWidget):
    global last_chat_geometry
    response_ready = Signal(str, object, object)

    def __init__(self):
        global last_main_geometry, last_chat_geometry
        super().__init__()
        self.setWindowFlags(Qt.FramelessWindowHint)
        self.setAttribute(Qt.WA_TranslucentBackground, True)
        self.setAttribute(Qt.WA_DeleteOnClose)
        self.response_ready.connect(self.update_ai_reply)
        # Initialize chat dialog with empty content
        self.chat_dialog = ChatDialog(host_window=self)
        if last_chat_geometry:
            self.chat_dialog.setGeometry(last_chat_geometry)
        self.chat_dialog.hide()
        # ... more initialization ...

    def on_message_sent(self, text):
        # ... message handling logic ...

    def process_ai_reply(self, text, container, lb, fresh):
        try:
            ai_reply = call_current_engine(text, fresh=fresh)
        except Exception as e:
            print(f"Error in AI thread: {e}")
            ai_reply = f"[Error: {e}]"
        self.response_ready.emit(ai_reply, container, lb)

    # ... more methods ...
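The window is brought up when the launch hotkey fires. Here is a hedged sketch of how the HOTKEY_LAUNCH setting could be wired up with the keyboard library installed earlier; this is illustrative bootstrap code, not controlui.py's exact startup logic:
import sys
import keyboard
from PySide6.QtCore import QObject, Signal
from PySide6.QtWidgets import QApplication

class HotkeyBridge(QObject):
    # Re-emit the hotkey on the GUI thread: keyboard fires callbacks from its own
    # listener thread, and Qt widgets must only be touched from the GUI thread.
    triggered = Signal()

app = QApplication(sys.argv)
window = BottomBubbleWindow()
bridge = HotkeyBridge()
bridge.triggered.connect(window.show)
keyboard.add_hotkey("ctrl+k", bridge.triggered.emit)   # HOTKEY_LAUNCH from .voiceconfig
print("Ready!...")
sys.exit(app.exec())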
Features
Voice Interaction
The AI Assistant supports voice interactions using the Whisper model for speech recognition and a text-to-speech engine for responses. Here's how voice recording is implemented:
def record_and_transcribe_once() -> str:
    global recording_flag, stop_chat_loop, whisper_model
    model = whisper_model
    if recording_flag:
        return ""
    recording_flag = True
    audio_q.queue.clear()
    samplerate = 24000
    blocksize = 1024
    silence_threshold = 70
    max_silence_seconds = 0.9
    MIN_RECORD_DURATION = 1.0
    recorded_frames = []
    speaking_detected = False
    silence_start_time = None
    with sd.InputStream(channels=1, samplerate=samplerate, blocksize=blocksize, callback=audio_callback):
        print(f"{YELLOW}Recording started. Waiting for speech...{RESET}")
        play_wav_file_blocking("recording_started.wav")
        while True:
            if stop_chat_loop:
                break
            # ... recording logic ...
    if stop_chat_loop:
        recording_flag = False
        return ""
    print(f"{GREEN}Recording ended. Transcribing...{RESET}")
    # ... transcription logic ...
    return text_result
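The audio_q queue and audio_callback referenced above are not shown in this excerpt. A minimal sounddevice-style sketch of what they might look like (assumed, not the project's exact code):
import queue
import sounddevice as sd

audio_q = queue.Queue()

def audio_callback(indata, frames, time_info, status):
    # Called by sounddevice for every captured block; push a copy onto the queue
    # so the recording loop can measure loudness and accumulate frames.
    if status:
        print(status)
    audio_q.put(indata.copy())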
Text Chat
Users can interact with the AI Assistant through text input. The chat interface is implemented in the GUI:
class ChatDialog(QWidget):
    global conversation_messages

    def __init__(self, host_window):
        global conversation_messages
        super().__init__()
        self.host_window = host_window
        self.setWindowFlags(Qt.FramelessWindowHint)
        self.setAttribute(Qt.WA_TranslucentBackground, True)
        self.setAttribute(Qt.WA_DeleteOnClose)
        # ... UI setup ...
        self.reply_line = QLineEdit()
        self.reply_line.setPlaceholderText("Type your reply...")
        reply_layout.addWidget(self.reply_line, stretch=1)
        self.reply_send_button = QToolButton()
        self.reply_send_button.setText("↑")
        self.reply_send_button.setToolTip("Send Reply")
        reply_layout.addWidget(self.reply_send_button)
        self.reply_send_button.clicked.connect(self.handle_reply_send)
        self.reply_line.returnPressed.connect(self.handle_reply_send)

    def handle_reply_send(self):
        text = self.reply_line.text().strip()
        if text:
            self.add_message(text, role="user")
            self.reply_line.clear()
            container, lb = self.add_loading_bubble()

            def do_ai_work():
                try:
                    ai_reply = call_current_engine(text, fresh=False)
                except Exception as e:
                    print("Error in AI thread:", e)
                    ai_reply = f"[Error: {e}]"
                self.host_window.response_ready.emit(ai_reply, container, lb)

            th = threading.Thread(target=do_ai_work, daemon=True)
            th.start()

    # ... more methods ...
File Attachments
The AI Assistant supports file attachments for text-based files. Here's how file handling is implemented:
class FileDropLineEdit(QLineEdit):
    file_attached = Signal(list)  # New signal to notify when a file is attached

    def __init__(self, parent=None):
        super().__init__(parent)
        self.setAcceptDrops(True)
        self.attachments = []  # Will hold dictionaries: {'filename': ..., 'content': ...}

    def dragEnterEvent(self, event):
        if event.mimeData().hasUrls():
            for url in event.mimeData().urls():
                file_path = url.toLocalFile()
                if os.path.splitext(file_path)[1].lower() in ['.txt', '.csv', '.xlsx', '.xls']:
                    event.acceptProposedAction()
                    return
            event.ignore()
        else:
            super().dragEnterEvent(event)

    def dropEvent(self, event):
        if event.mimeData().hasUrls():
            attachments = []
            for url in event.mimeData().urls():
                file_path = url.toLocalFile()
                ext = os.path.splitext(file_path)[1].lower()
                if ext in ['.txt', '.csv', '.xlsx', '.xls']:
                    file_name = os.path.basename(file_path)
                    try:
                        content = read_file_content(file_path)
                        attachments.append({'filename': file_name, 'content': content})
                    except Exception as e:
                        attachments.append({'filename': file_name, 'content': f"Error reading file: {str(e)}"})
            if attachments:
                self.attachments = attachments
                self.file_attached.emit(attachments)
            event.acceptProposedAction()
        else:
            super().dropEvent(event)
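The read_file_content() helper used in dropEvent is not shown in this excerpt. A plausible sketch for the supported extensions using pandas (an assumed implementation, keeping the name from the call above):
import os
import pandas as pd

def read_file_content(file_path: str) -> str:
    # Return the file's contents as text; spreadsheets are flattened to CSV text.
    ext = os.path.splitext(file_path)[1].lower()
    if ext in ('.xlsx', '.xls'):
        return pd.read_excel(file_path).to_csv(index=False)
    if ext == '.csv':
        return pd.read_csv(file_path).to_csv(index=False)
    with open(file_path, 'r', encoding='utf-8', errors='replace') as f:
        return f.read()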
Property Lookup
The AI Assistant can fetch property value estimates from Zillow and Redfin. Here's how it's implemented:
def fetch_property_value(address: str) -> str:
    global driver
    # Kill any lingering Chromium instances before starting a new search.
    kill_chromium_instances()
    try:
        driver
    except NameError:
        # ... driver setup ...
    stop_spinner()
    print(f"{MAGENTA}Address for search: {address}{RESET}")
    stop_spinner()
    search_url = "https://www.google.com/search?q=" + address.replace(' ', '+')
    try:
        driver.get(search_url)
        time.sleep(3.5)
    except Exception as e:
        stop_spinner()
        print(f"{RED}[DEBUG] Exception during driver.get: {e}{RESET}")
        stop_spinner()
        return "Error performing Google search."
    # ... search for Zillow and Redfin links ...

    def open_in_new_tab(url):
        # ... open URL in new tab and return page HTML ...

    def parse_redfin_value(source):
        # ... parse Redfin value from HTML ...

    def parse_zillow_value(source):
        # ... parse Zillow value from HTML ...

    property_values = []
    for domain, link in links_found.items():
        if not link:
            continue
        page_html = open_in_new_tab(link)
        extracted_value = None
        if domain == 'Redfin':
            extracted_value = parse_redfin_value(page_html)
        elif domain == 'Zillow':
            extracted_value = parse_zillow_value(page_html)
        if extracted_value:
            property_values.append((domain, extracted_value))
    if not property_values:
        return "Could not retrieve property values."
    result_phrases = []
    for domain, value in property_values:
        result_phrases.append(f"{domain} estimates the home is worth {value}")
    return ", and ".join(result_phrases)
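The parse_redfin_value() and parse_zillow_value() helpers are elided above. As a rough illustration of the kind of extraction they perform, a generic sketch (assumed, not the project's actual selectors) might pull the first large dollar figure out of the page text:
import re
from bs4 import BeautifulSoup

def parse_estimate(source: str) -> str | None:
    # Find the first dollar amount with at least one thousands group, e.g. "$412,500".
    text = BeautifulSoup(source, "html.parser").get_text(" ")
    match = re.search(r"\$\d{1,3}(?:,\d{3})+", text)
    return match.group(0) if match else None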
Google Search Integration
The AI Assistant can perform Google searches to provide up-to-date information. Here's how it's implemented:
def google_search(query: str) -> str:
    global BROWSER_TYPE
    stop_spinner()
    print(f"{MAGENTA}Google search is: {query}{RESET}")
    encoded_query = quote_plus(query)
    url = f"https://www.google.com/search?q={encoded_query}"
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, args=["--disable-blink-features=AutomationControlled"])
        if BROWSER_TYPE == 'chrome':
            context = browser.new_context(
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ..."
            )
        if BROWSER_TYPE == 'chromium':
            context = browser.new_context(
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ..."
            )
        page = context.new_page()
        page.goto(url)
        page.wait_for_load_state("networkidle")
        html = page.content()
        browser.close()
    soup = BeautifulSoup(html, 'html.parser')
    text = soup.get_text()
    cleaned_text = ' '.join(text.split())[0:5000]
    print(cleaned_text)
    return cleaned_text
Advanced Usage
Custom AI Model Integration
To integrate a custom AI model, you need to add a new function to handle API calls and update the ENGINE_MODELS dictionary. Here's an example:
import requests

def call_custom_model(prompt: str, model_name: str) -> str:
    # Implement your custom model API call here
    # Example:
    response = requests.post(
        "https://api.custom-model.com/generate",
        json={"prompt": prompt, "model": model_name}
    )
    return response.json()["generated_text"]

# Add to ENGINE_MODELS
ENGINE_MODELS["CustomAI"] = ["custom-model-1", "custom-model-2"]

# Update call_current_engine
def call_current_engine(prompt: str, fresh: bool = False) -> str:
    global ENGINE, MODEL_ENGINE
    if ENGINE == "CustomAI":
        return call_custom_model(prompt, MODEL_ENGINE)
    elif ENGINE == "Ollama":
        return call_ollama(prompt, MODEL_ENGINE)
    # ... existing code for other engines ...
Extending Functionality
To add new features or tools to the AI Assistant, you can create new functions and integrate them into the existing workflow. Here's an example of how you might add a weather lookup feature:
import requests

def weather_lookup(city: str) -> str:
    api_key = "your_weather_api_key"
    url = f"https://api.openweathermap.org/data/2.5/weather?q={city}&appid={api_key}&units=metric"
    response = requests.get(url)
    data = response.json()
    if response.status_code == 200:
        temp = data['main']['temp']
        description = data['weather'][0]['description']
        return f"The weather in {city} is {description} with a temperature of {temp}°C."
    else:
        return f"Unable to fetch weather data for {city}."

# Integrate into call_current_engine
def call_current_engine(prompt: str, fresh: bool = False) -> str:
    global ENGINE, MODEL_ENGINE, conversation_messages
    # Check if the prompt is asking for weather
    if "weather in" in prompt.lower():
        city = prompt.lower().split("weather in")[-1].strip()
        weather_info = weather_lookup(city)
        conversation_messages.append({"role": "assistant", "content": weather_info})
        return weather_info
    # Existing engine calls...
API Reference
Key Functions and Classes
Here are some of the key functions and classes in the AI Assistant:
- record_and_transcribe_once() -> str: Records user speech and transcribes it using the Whisper model.
- call_current_engine(prompt: str, fresh: bool) -> str: Calls the selected AI model with the given prompt.
- google_search(query: str) -> str: Performs a Google search for the given query.
- fetch_property_value(address: str) -> str: Fetches property value estimates from Zillow and Redfin.
- class BottomBubbleWindow(QWidget): Main window class for the GUI.
- class ChatDialog(QWidget): Chat dialog window for displaying conversations.
- class FileDropLineEdit(QLineEdit): Custom QLineEdit that supports file drag and drop.
Troubleshooting
Here are some common issues and their solutions:
- API Key Issues: Ensure that you have set the correct API keys in the .voiceconfig file for the AI models you're using.
- Speech Recognition Problems: Make sure your microphone is properly connected and selected as the default input device in your system settings.
- GUI Not Responding: If the GUI becomes unresponsive, try restarting the application. If the issue persists, check the console for any error messages.
- Web Scraping Errors: Ensure that you have the correct ChromeDriver version installed and that the path is correctly set in the configuration file.
- File Attachment Issues: Verify that the file you're trying to attach is in a supported format (.txt, .csv, .xlsx, .xls) and is not corrupted.
- Whisper or Kokoro Not Found: If you encounter errors related to Whisper or Kokoro not being found, ensure that these dependencies are properly installed and their paths are correctly set in your system's environment variables.
If you encounter any other issues, please check the console output for error messages and refer to the project's issue tracker or documentation for further assistance.