How I actually use it
AI can search the web.
But it can't log into your tools.
Your work lives behind logins. AI can't get there. Browser automation fixes that.
The problem
Screenshot. Analyze. Click. Screenshot. Analyze. Click.
Current AI browser tools take a picture of your screen at every step. A 30-second task becomes 3 minutes.
Capture: ~1,500 tokens
Analyze: Every pixel
Decide: Pattern match
Repeat: +1,500 tokens
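To make the cost of that loop concrete, here is a rough back-of-the-envelope sketch. The ~1,500 tokens per screenshot is the figure above, and the ~100 tokens per text read is the accessibility-tree figure quoted in the walkthrough further down. The step count is an illustrative assumption, not a measurement.

```python
# Rough token cost of observing the page during a multi-step task.
# Per-step figures are this page's own estimates; the step count is assumed.
SCREENSHOT_TOKENS = 1_500  # per captured screenshot
TEXT_READ_TOKENS = 100     # per accessibility-tree read


def task_tokens(steps: int, tokens_per_step: int) -> int:
    """Total tokens spent looking at the page across all steps."""
    return steps * tokens_per_step


steps = 10  # hypothetical task length
screenshot_total = task_tokens(steps, SCREENSHOT_TOKENS)
text_total = task_tokens(steps, TEXT_READ_TOKENS)
print(f"screenshots: {screenshot_total} tokens, text: {text_total} tokens")
```

The gap grows linearly with the number of steps, which is why longer tasks feel the screenshot cost most.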
The approach
Skip the screenshots.
Talk directly to Chrome.
CDP (Chrome DevTools Protocol) — the same tech that powers Chrome's developer tools. Read text, click elements, extract data. No images.
"Give me the text"
not
"Here's a photo — what does it say?"
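As a rough illustration of what "give me the text" looks like on the wire: CDP commands are JSON messages with an `id`, a `method`, and `params`. The sketch below builds a `Runtime.evaluate` request for the page's visible text and pulls the value out of a reply. The helper names and the sample reply are assumptions for illustration, not the script's actual code, and the WebSocket connection to Chrome that would carry the message is left out.

```python
import json


def build_text_request(message_id: int) -> str:
    """Build a CDP Runtime.evaluate message asking for the page's visible text."""
    return json.dumps({
        "id": message_id,
        "method": "Runtime.evaluate",
        "params": {
            "expression": "document.body.innerText",
            "returnByValue": True,
        },
    })


def extract_text(reply: str) -> str:
    """Pull the evaluated string out of a Runtime.evaluate reply."""
    payload = json.loads(reply)
    return payload["result"]["result"]["value"]


# Hypothetical reply, shaped like a Runtime.evaluate response.
sample_reply = json.dumps({
    "id": 1,
    "result": {"result": {"type": "string", "value": "Performance\n4,820 clicks"}},
})

request = build_text_request(1)
page_text = extract_text(sample_reply)
print(request)
print(page_text)
```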
Compare
Three ways AI touches the web
AI Web Search
- +Great for public information
- +Built into ChatGPT, Gemini, Perplexity
- −Can't access anything behind a login
- −Can't interact with pages
- −Can't fill forms or click buttons
Screenshot-Based
- +Can see any page
- +Works behind logins
- +Handles complex UIs
- −Slow — processes images each step
- −Expensive — high token usage
- −Fragile — visual changes break it
Direct Browser Control
- +Reads actual page data
- +Works behind logins
- +Fast — text is cheap to process
- +Precise — clicks exact elements
- −Needs a small setup script
Why it matters
Don't read the whole page.
Read only what you need.
Tokens = processing time = cost. Ask for exactly the data you need.
Same page. Six levels of precision. The accessibility tree gives you 90% of actionable info in 5% of the tokens.
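To show why the accessibility tree is so compact, here is a minimal sketch of flattening it into one line. It assumes node dictionaries shaped like those returned by CDP's `Accessibility.getFullAXTree` (each with `ignored`, `role`, and `name` fields). The sample nodes, the `compact_axtree` helper, and the set of skipped roles are hypothetical choices for illustration, not the script's actual logic.

```python
# Roles that rarely matter for deciding what to click (an assumed filter).
SKIP_ROLES = {"generic", "none", "StaticText", "InlineTextBox"}


def compact_axtree(nodes: list[dict]) -> str:
    """Flatten accessibility nodes into a one-line `[role "name"]` summary."""
    parts = []
    for node in nodes:
        if node.get("ignored"):
            continue  # node is hidden from assistive technology
        role = node.get("role", {}).get("value", "")
        name = node.get("name", {}).get("value", "")
        if not role or role in SKIP_ROLES:
            continue
        parts.append(f'[{role} "{name}"]' if name else f"[{role}]")
    return " ".join(parts)


# Hypothetical nodes, shaped like entries in the `nodes` array of the reply.
sample_nodes = [
    {"ignored": False, "role": {"value": "navigation"}, "name": {"value": ""}},
    {"ignored": False, "role": {"value": "generic"}, "name": {"value": ""}},
    {"ignored": False, "role": {"value": "button"}, "name": {"value": "7d"}},
    {"ignored": False, "role": {"value": "button"}, "name": {"value": "28d"}},
    {"ignored": True},
    {"ignored": False, "role": {"value": "heading"}, "name": {"value": "Performance"}},
]

summary = compact_axtree(sample_nodes)
print(summary)
```

Dropping ignored and purely structural nodes is what keeps the summary short: only elements with a meaningful role survive.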
Extraction
- axtree: Accessibility tree (most compact)
- readable: Main article content only
- content: All visible text on the page
- links: All clickable links and buttons
- forms: All input fields with labels
Interaction
- click: Click by selector or text content
- type: Type into an input field
- scroll: Scroll the page or element
- wait: Wait for element to appear
- navigate: Go to a URL
Speed
- block: Skip images, CSS, fonts (60-80% faster)
- cookies_save: Save login sessions to file
- cookies_load: Restore sessions (skip re-login)
- unblock: Clear resource blocks
How it works
One script between AI and Chrome
1,200 lines of Python. No framework.
You ask
'Check my Search Console for this week's clicks'
AI plans
Decides which pages to visit and what data to extract
Script executes
A small Python script sends commands to Chrome via CDP
Chrome acts
Navigates, clicks, reads — using your existing login session
Text returns
Clean data comes back. AI processes text, not pictures.
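One way the "script executes" step could work is a lookup table from command names to CDP methods. The sketch below is an assumption about structure, not the actual script: the `COMMANDS` table, the `to_cdp` helper, and the blocked URL patterns are hypothetical. `Page.navigate`, `Accessibility.getFullAXTree`, and `Network.setBlockedURLs` are real protocol methods, though parameter details can vary between Chrome versions.

```python
import json
from itertools import count

# Hypothetical URL patterns for the `block` command.
BLOCK_PATTERNS = ["*.png", "*.jpg", "*.gif", "*.css", "*.woff2"]

# Command name -> function returning (CDP method, params).
# Blocking assumes the Network domain was enabled first; the `urls`
# parameter shape may differ by Chrome version.
COMMANDS = {
    "navigate": lambda url: ("Page.navigate", {"url": url}),
    "axtree": lambda: ("Accessibility.getFullAXTree", {}),
    "block": lambda: ("Network.setBlockedURLs", {"urls": BLOCK_PATTERNS}),
    "unblock": lambda: ("Network.setBlockedURLs", {"urls": []}),
}

_ids = count(1)  # CDP replies are matched to requests by id


def to_cdp(command: str, *args: str) -> str:
    """Translate a CLI-style command into a CDP JSON message."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command}")
    method, params = COMMANDS[command](*args)
    return json.dumps({"id": next(_ids), "method": method, "params": params})


message = to_cdp("navigate", "https://search.google.com/search-console")
print(message)
```

Keeping the mapping in one table means a new command is one more entry rather than new control flow.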
What the script actually does
# 1. Launch Chrome + block images/CSS (60-80% faster loads)
cdp.py ensure
cdp.py block
# 2. Restore yesterday's login session
cdp.py cookies_load ~/.cookies/google.json
# 3. Navigate (smart wait — no fixed sleep)
cdp.py navigate "https://search.google.com/search-console"
# 4. What's on this page? (~100 tokens, not 1,500)
cdp.py axtree
→ [navigation] [button "7d"] [button "28d"] [heading "Performance"]
# 5. Click the 28-day range
cdp.py click "28d"
# 6. Read the data
cdp.py readable
→ "Performance: 4,820 clicks | 72,100 impressions | 6.7% CTR"
# 7. Save session for next time
cdp.py cookies_save ~/.cookies/google.json
If it's slower than your hands, you won't use it.
3 minutes for what takes you 30 seconds? You'll just do it yourself. Every time. The bottleneck is speed, not intelligence.
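Fixed sleeps are one place automation loses that time. The walkthrough above mentions a smart wait instead; a minimal sketch of the idea, assuming a simple polling approach, might look like this. The `wait_until` helper and the stand-in condition are hypothetical, not the script's code; a real condition would ask Chrome whether the element exists.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 10.0,
    interval: float = 0.05,
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()  # one last check at the deadline


# Stand-in for "is the element present yet?": succeeds on the third poll.
attempts = {"n": 0}


def element_ready() -> bool:
    attempts["n"] += 1
    return attempts["n"] >= 3


ready = wait_until(element_ready, timeout=2.0)
print(ready, attempts["n"])
```

The call returns as soon as the condition holds, so a fast page costs milliseconds while a slow page still gets the full timeout.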
27x fewer tokens via accessibility tree
60-80% faster loads with resource blocking
0 re-logins with session persistence
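Session persistence can be as simple as writing the browser's cookies to a JSON file and reading them back later. The sketch below shows only that file round trip; the helper names and the sample cookie are hypothetical. In a real run the list would come from and go back to Chrome through CDP calls such as `Storage.getCookies` and `Storage.setCookies`.

```python
import json
import tempfile
from pathlib import Path


def save_cookies(cookies: list[dict], path: Path) -> None:
    """Write cookies to a JSON file, creating parent directories as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(cookies, indent=2))
    path.chmod(0o600)  # cookies are credentials; keep the file private


def load_cookies(path: Path) -> list[dict]:
    """Read cookies back, or return an empty list if nothing was saved."""
    if not path.exists():
        return []
    return json.loads(path.read_text())


# Hypothetical cookie, using field names CDP cookie objects also carry.
cookies = [{"name": "session", "value": "abc123", "domain": ".example.com", "path": "/"}]

with tempfile.TemporaryDirectory() as tmp:
    cookie_file = Path(tmp) / "cookies" / "example.json"
    save_cookies(cookies, cookie_file)
    restored = load_cookies(cookie_file)
    missing = load_cookies(Path(tmp) / "absent.json")

print(restored == cookies, missing)
```

Since a saved cookie file is equivalent to a logged-in session, it deserves the same care as a password file.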