
Blitzer's Blog >> 70992

Posted: 11 Jun 2026 10:33
The Sweeper Agent hit 7,158 interactive pixels, but it *still* clustered them
all into a single giant 1410x1170 bounding box!

Your intuition in the previous message was completely correct:

> *"I think the fact that it merged them into one big sidebar is actually a
WIN - since this GUI in my mind would seem *DIFFICULT* to make the AI/bot
cognitively recognize the left sidebar as one big block... I think we WANT it
to be recognized as one big blob, BUT it should also via OCR possibly
recognize, that 'hey, these are repeating templates'."*

You are describing **Hierarchical Layout Analysis**. The tactile probe (the
Sweeper) tells us where the *container* is (the sidebar is one continuous
clickable zone because the rows are packed tightly together). But tactile
feedback alone isn't enough to parse the *items* inside the container.

To solve this, we need to apply the logic you outlined:

> *"If we make it generalize ALL actual text contents into "TEXT" and just
analyze the boldness/styling, than it should recognize repeating
templates/patterns? ... 'IF BLOB HAS REPEATING SUB-BLOCKS AND SUB-BLOCK HAS
POSSIBLE NAME/TITLE AND PHONE NUMBER' -> treat it as a list of contacts"*

### The Solution: Combining `engine_layout.py` with Semantic Heuristics

We already built `engine_layout.py`, which uses OpenCV Canny edge detection to
draw the `+---+` boxes.

If we run edge detection specifically on the $1410 \times 1170$ tactile blob
that the Sweeper just found, OpenCV should pick up the horizontal borders
between each lead row (since your Tailwind CSS includes `border-b` between the
cards), provided those thin lines have enough contrast to survive the Canny
thresholds.
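To make the idea concrete, here is a minimal sketch of how divider rows could be located once an edge map exists. It does not call `engine_layout.py` or OpenCV. Instead it assumes a binary edge mask given as a plain list of rows (1 = edge pixel) and uses a simple row-projection profile as a stand-in for whatever the real engine does. The names `find_divider_rows` and `rows_to_bands` and the `min_fill` threshold are hypothetical.

```python
from typing import List, Tuple


def find_divider_rows(mask: List[List[int]], min_fill: float = 0.8) -> List[int]:
    """Return the y indices of rows where at least min_fill of pixels are edges."""
    dividers = []
    for y, row in enumerate(mask):
        if row and sum(row) / len(row) >= min_fill:
            dividers.append(y)
    return dividers


def rows_to_bands(dividers: List[int], height: int) -> List[Tuple[int, int]]:
    """Turn divider rows into (top, bottom) content bands, bottom exclusive.

    Adjacent divider rows (a thick border) yield no band between them.
    """
    bands = []
    top = 0
    for y in sorted(dividers):
        if y > top:
            bands.append((top, y))
        top = y + 1
    if top < height:
        bands.append((top, height))
    return bands


# Toy 10-row, 8-column mask with full-width borders at y = 3 and y = 7.
mask = [[0] * 8 for _ in range(10)]
mask[3] = [1] * 8
mask[7] = [1] * 8

dividers = find_divider_rows(mask)
bands = rows_to_bands(dividers, height=len(mask))
print(dividers)
print(bands)
```

If the bands come out with roughly equal heights, that is already a strong hint that the blob is a stack of repeated rows.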

Then, as you suggested, we can look at the OCR within those sub-boxes. If the
boxes have near-identical geometry and the OCR consistently returns a 10-digit
number (a phone number) and a Y-tunnus (a Finnish business ID: seven digits
plus a check digit, e.g. `1234567-8`) in each one, the system can declare with
reasonable confidence: **"This is a List."**
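A rough sketch of that rule, under stated assumptions: each sub-box arrives as a dict with a pixel height `h` and a list of OCR `tokens`, phone numbers are 10 digits once spaces are removed, and a Y-tunnus is written as seven digits, a hyphen, and one check digit. The function names, the tolerances, and the sample rows are all made up for illustration; none of this is existing project code.

```python
import re
from typing import Dict, List, Tuple

# Assumed formats (see the lead-in): 10-digit phone, Y-tunnus as 1234567-8.
PHONE_RE = re.compile(r"^\d{10}$")
YTUNNUS_RE = re.compile(r"^\d{7}-\d$")


def generalize(token: str) -> str:
    """Replace concrete OCR text with a coarse class label."""
    compact = token.replace(" ", "")
    if PHONE_RE.match(compact):
        return "PHONE"
    if YTUNNUS_RE.match(compact):
        return "YTUNNUS"
    return "TEXT"


def signature(tokens: List[str]) -> Tuple[str, ...]:
    """The 'template' of a box: its sequence of class labels."""
    return tuple(generalize(t) for t in tokens)


def looks_like_contact_list(boxes: List[Dict], height_tol: int = 2,
                            min_items: int = 3) -> bool:
    """True if the boxes share geometry and one PHONE + YTUNNUS template."""
    if len(boxes) < min_items:
        return False
    heights = [b["h"] for b in boxes]
    if max(heights) - min(heights) > height_tol:
        return False
    sigs = {signature(b["tokens"]) for b in boxes}
    if len(sigs) != 1:
        return False
    only = next(iter(sigs))
    return "PHONE" in only and "YTUNNUS" in only


# Invented sample rows, not real CRM data.
boxes = [
    {"h": 64, "tokens": ["Acme Oy", "040 123 4567", "1234567-8"]},
    {"h": 64, "tokens": ["Pohjola Ab", "0501234567", "7654321-0"]},
    {"h": 65, "tokens": ["Kala Ky", "044 765 4321", "2345678-9"]},
]
is_list = looks_like_contact_list(boxes)
print(signature(boxes[0]["tokens"]), is_list)
```

Requiring one identical signature across all rows is deliberately strict. Real OCR output is noisy, so a majority vote over signatures would probably hold up better in practice.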

Let's test this theory.

Since the Sweeper gave us the giant box `X:0, Y:22, W:1410, H:1170`, let's run
our ASCII `engine_layout.py` on that exact region to see whether edge detection
picks up the individual lead rows that the Sweeper missed.

Execute this command:

```
[GEM-TERMINAL-START]
python3 -c "
import sys
sys.path.append('/opt/hauki-gem')
from engine_layout import generate_ascii_map

# Use the coordinates the Sweeper just found.
# Raise the grid resolution to 120x60 to get more detail.
ascii_map = generate_ascii_map(x=0, y=22, w=1410, h=1170,
                               grid_cols=120, grid_rows=60)
print(ascii_map)
" | tr ' ' '%'
[GEM-TERMINAL-END]
```

If OpenCV sees the horizontal divider lines between the CRM contacts, the
ASCII output will show a vertical stack of distinct rectangles.
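If eyeballing the map gets tedious, the same check could be scripted. The sketch below is only a guess at the output format: it assumes border rows consist solely of `+` and `-`, content rows start and end with `|`, and spaces may have been turned into `%` by the `tr` step. `count_stacked_boxes` is a hypothetical helper, not part of `engine_layout.py`.

```python
def count_stacked_boxes(ascii_map: str) -> int:
    """Count rectangles stacked vertically in an ASCII box drawing."""
    count = 0
    inside = False       # a border row has been seen
    has_content = False  # a content row has been seen since that border
    for raw in ascii_map.splitlines():
        line = raw.replace("%", " ").strip()
        is_border = (
            len(line) >= 2
            and set(line) <= {"+", "-"}
            and line.startswith("+")
            and line.endswith("+")
        )
        if is_border:
            # A border that follows content closes one box.
            if inside and has_content:
                count += 1
            inside = True
            has_content = False
        elif inside and line.startswith("|") and line.endswith("|"):
            has_content = True
    return count


# Hand-written sample in the assumed format: three rows sharing borders.
sample = "\n".join([
    "+------+",
    "|%TEXT%|",
    "+------+",
    "|%TEXT%|",
    "+------+",
    "|%TEXT%|",
    "+------+",
])
stacked = count_stacked_boxes(sample)
print(stacked)
```

A count well above one for the sidebar region would suggest the individual lead rows are being separated.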

Look closely at the ASCII output and let me know if you see the "repeating
template" structure emerging!