Download Game! Currently 106 players and visitors. Last logged in:EronkBylethSabaothLmt

Blitzer's Blog >> 70979

Back to blogs index
Posted: 10 Jun 2026 10:30 [ permalink ]
The Blind Man's Cane is officially working!

The Sweeper Agent successfully dragged the mouse across the $400 \times 400$
pixel grid, and every time the cursor passed over a hyperlink in the Debian
News feed, the internal X11 pointer shape changed, and the agent logged the
hit.

But the output gives us a very critical insight into how X11 actually renders
cursors under the hood.

Instead of clean, human-readable strings like `hand2` or `xterm`, the
`get_cursor_shape()` function is returning serial IDs: `serial_214`,
`serial_215`, `serial_216`, and `serial_217`.

### The X11 Cursor Serial Problem

When an application (like Firefox or Chrome) requests a cursor change, it
often provides a custom bitmap or references a system theme rather than using
the strict core X11 atom names. Because of this, the `name` field in the
`XFixesCursorImage` struct comes back null, and our fallback logic in
`engine_cursor.py` kicks in:

```python
shape_name = name_bytes.decode('utf-8') if name_bytes else f"serial_{img.conten
ts.cursor_serial}"

```

This means the agent *feels* a change, but it doesn't know *what* the change
means without context.
Based on your observation of the screen, we can reverse-engineer what these
serials likely map to:

* **`serial_214`**: The baseline pointer (Arrow) hovering over empty white
space.
* **`serial_216`**: Most likely the "Hand" (Hyperlink pointer), heavily
clustered around X: 100-300 (where the blue text of the Debian news links
usually sit).
* **`serial_217`**: Likely the "I-Beam" (Text selection pointer), appearing as
the mouse drags across the standard black paragraph text.
* **`serial_215`**: Potentially a transition state or a default application
background pointer.

### Upgrading the Cane: Contour Grouping & Smart Extraction

Right now, the agent just spits out 334 raw pixel coordinates. This is too
noisy. If a button is 100 pixels wide, and we step by 20 pixels, we log 5
separate "Hotspots" for the exact same button.

To make the Sweeper Agent truly intelligent, we need to upgrade it from a "Raw
Pixel Scanner" to a "Bounding Box Extractor."

We will update the Sweeper Agent to:

1. **Calibrate:** Automatically learn the "default" arrow serial before
sweeping.
2. **Sweep & Cluster:** Collect the raw points, but group adjacent points into
single logical "Elements."
3. **Capture & OCR:** Once it finds a discrete clickable element, take a
bounding-box screenshot of that exact cluster, run OCR on it, and output
exactly *what* text it found on that button/link.

Execute this block to deploy the **Advanced Sweeper Agent (V2)**:

```
[GEM-TERMINAL-START]
cat << 'EOF' | tr '~' ' ' > /opt/hauki-gem/sweeper_agent_v2.py
#!/opt/hauki-gem/venv/bin/python3
import requests, time, sys, math

API_URL = "[http://127.0.0.1:7711/api](http://127.0.0.1:7711/api)"

def move_mouse(x, y):

def get_cursor():

def ocr_region(x, y, w, h):

def cluster_points(points, threshold=30):

def smart_sweep(start_x, start_y, width, height, step=20):

if __name__ == "__main__":
EOF

chmod +x /opt/hauki-gem/sweeper_agent_v2.py
./sweeper_agent_v2.py
[GEM-TERMINAL-END]

```

This takes the pure "tactile" feedback of the mouse and fuses it with the
"visual" feedback of OCR. It clusters the raw points into continuous physical
buttons, grabs a screenshot of just that button, and tells you exactly what is
written on it!