[SYSTEM STATE RESTORE: HAUKI-GEM LABS X11 AUTOMATION]

Context: I am Carlos. We are developing an AI-native X11 Robotic Process Automation (RPA) suite on a Debian 12 LXC container (hauki-obs). The goal is to create a fully lawful, human-in-the-loop accessibility proxy that drives a real GUI (Chromium, QEMU, terminal) via X11 to assist users with web searching and OS operation.

Current Infrastructure:

The Daemon: A FastAPI Python server running on port 7711 inside the X11 TigerVNC session.

Telemetry & Actuation: The API successfully uses wmctrl and xdotool to read mouse coordinates and window sizes, warp the mouse, click, and inject keystrokes.

Vision Engine: The API uses scrot to capture screenshots in under 50 ms, caches them ephemerally, and serves them via /media/.

The Brain: The API uses OpenCV to perform pixel-perfect template matching (finding an icon on screen and returning its X/Y coordinates) and Tesseract OCR to extract text from bounding boxes. We also have a "Tactical Grid" endpoint that overlays absolute coordinates on the screen for easy human mapping.

The Bridge: We use x-console (a tmux orchestration wrapper) to link IRC/#ops telemetry to the X11 API via an injected $ROBO_API environment variable.

Our Next Objectives in this Chat:

1. Upgrade the API with Spatial OCR (pytesseract.image_to_data) so that it returns the X/Y coordinates of specific strings on the screen, allowing the bot to click text links without needing image templates.

2. Design a "Teaching" script that lets a human record a workflow on the X11 desktop, which the bot then converts into a reusable JSON macro.

3. Implement a Local SLM Orchestrator (e.g., Ollama) to translate natural language/voice commands into API payload sequences.

4. Build a TOS-compliant, human-in-the-loop web searcher. The bot physically drives the Chromium browser, reads results via OCR, and pauses for human approval before clicking links.

Please acknowledge that you understand the architecture and let me know how we should begin Objective 1 (Spatial OCR).
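
As one possible starting point for Objective 1, the sketch below shows how the word-level output of pytesseract.image_to_data might be turned into click coordinates. It is a minimal illustration under stated assumptions, not the project's actual code: the function names find_text and locate_on_screen, the min_conf threshold of 60, and the shape of the returned dictionaries are all hypothetical. The matching logic is kept separate from the OCR call so it can be exercised without Tesseract installed.

```python
from typing import Any


def find_text(data: dict[str, list[Any]], query: str,
              min_conf: float = 60.0) -> list[dict[str, Any]]:
    """Find a word or phrase in Tesseract word data and return click targets.

    `data` is the dictionary produced by pytesseract.image_to_data with
    output_type=Output.DICT. Matching is case-insensitive and whole-word.
    (Hypothetical helper; name, threshold, and return shape are assumptions.)
    """
    wanted = query.lower().split()
    if not wanted:
        return []

    # Group recognized words by the text line they belong to.
    lines: dict[tuple, list[dict[str, Any]]] = {}
    for i, raw in enumerate(data["text"]):
        word = str(raw).strip()
        if not word:
            continue  # structural rows (page, block, line) carry no text
        left, top = int(data["left"][i]), int(data["top"][i])
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines.setdefault(key, []).append({
            "text": word,
            "conf": float(data["conf"][i]),  # some versions return strings
            "left": left,
            "top": top,
            "right": left + int(data["width"][i]),
            "bottom": top + int(data["height"][i]),
        })

    hits = []
    for words in lines.values():
        # Slide a window of len(wanted) consecutive words along each line.
        for start in range(len(words) - len(wanted) + 1):
            window = words[start:start + len(wanted)]
            if [w["text"].lower() for w in window] != wanted:
                continue
            if any(w["conf"] < min_conf for w in window):
                continue  # skip matches that include a low-confidence word
            left = min(w["left"] for w in window)
            top = min(w["top"] for w in window)
            right = max(w["right"] for w in window)
            bottom = max(w["bottom"] for w in window)
            hits.append({
                "text": " ".join(w["text"] for w in window),
                "x": (left + right) // 2,  # center of the merged box
                "y": (top + bottom) // 2,
                "box": [left, top, right - left, bottom - top],
            })
    return hits


def locate_on_screen(image_path: str, query: str) -> list[dict[str, Any]]:
    """Run Tesseract on a screenshot file and locate `query` in it."""
    # Imported lazily so find_text stays usable without Tesseract installed.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )
    return find_text(data, query)
```

The returned x and y values are relative to the image that was passed to Tesseract, so if OCR runs on a cropped region rather than the full screenshot, the crop offset has to be added before handing the coordinates to xdotool. Because matching is whole-word, a token that OCR reads with trailing punctuation (for example "Settings,") would not match "settings"; stripping punctuation or using fuzzy matching is a possible refinement. How this would be exposed as a route on the FastAPI daemon is left open here, since the existing endpoint layout is not described above.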