Building a Self-Optimizing X Growth Spider on a Pay-Per-Use Budget
How we architected an autonomous follow engine that finds our tribe, scores them by bio, and grows the account hands-off — plus the $100 billing lesson that forced real cost discipline. A build log for anyone wiring automation against a metered API.
The Cold-Start Problem
Every new X account hits the same wall. No followers means no reach, and no reach means no followers. You can post the best work of your life into a void and watch it land in front of nobody. We wanted to break that loop without hiring a growth team or renting another SaaS dashboard — an engine that finds the right people, follows them, and builds social proof while we sleep, for pennies a day.
What we built is a self-improving follow spider running on a headless Ubuntu VM, driven entirely by cron and a few hundred lines of Python. Here's the architecture, the bugs that cost us real money, and the cost model every builder needs before they point a loop at a pay-per-use API.
The Architecture: A Spider, Not a Scraper
The core idea is simple. Pick a handful of seed accounts whose audience overlaps with the tribe you want. Fetch their followers. Score each follower's bio against a keyword model. Follow the highest scorers. The accounts that score highest get promoted to seeds themselves, so the search space expands on its own.
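The fetch, score, follow, promote loop fits in one function. This is a minimal sketch, not the engine's actual code. `fetch_followers` and `score_bio` are hypothetical injected callables, which lets the loop be exercised without spending API credit. `promote_threshold` is an assumed knob, because the article doesn't say what score earns promotion to seed.

```python
from typing import Callable, Dict, List


def spider_cycle(
    seeds: List[str],
    fetch_followers: Callable[[str], List[Dict[str, str]]],
    score_bio: Callable[[str], int],
    follow_threshold: int = 30,
    promote_threshold: int = 60,  # hypothetical: promotion cutoff is not published
) -> Dict[str, List[str]]:
    """One discovery pass: fetch -> score -> pick follows -> promote seeds."""
    to_follow: List[str] = []
    promoted: List[str] = []
    seen = set(seeds)  # never re-score a seed or a duplicate follower
    for seed in seeds:
        for user in fetch_followers(seed):
            handle = user["handle"]
            if handle in seen:
                continue
            seen.add(handle)
            score = score_bio(user.get("bio", ""))
            if score >= follow_threshold:
                to_follow.append(handle)
            if score >= promote_threshold:
                promoted.append(handle)  # becomes a seed next cycle
    return {"follow": to_follow, "new_seeds": promoted}
```

Returning the new seeds instead of writing them in place keeps discovery separate from persistence, so a cycle can be dry-run before it touches the seed file.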
Three files hold the whole system together:
- spider_follow.py — the engine: fetch, score, follow, promote
- spider-seeds.json — the seed list, which grew to 99 accounts
- follower-pool.json — the candidate pool plus a fetched_seeds ledger, so we never re-pull the same seed twice in a cycle
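Here is a minimal sketch of how the pool file might be read and written. It assumes a top-level layout with `candidates` and `fetched_seeds` keys. Only the fetched_seeds ledger is named above, so the layout and the helper names are hypothetical. The write goes to a temp file that is swapped in with `os.replace`, so a cron job killed mid-write can't leave a half-written ledger behind.

```python
import json
import os
import tempfile
from typing import Any, Dict


def load_pool(path: str) -> Dict[str, Any]:
    """Load follower-pool.json, falling back to an empty pool on first run."""
    if not os.path.exists(path):
        return {"candidates": [], "fetched_seeds": []}
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    data.setdefault("candidates", [])
    data.setdefault("fetched_seeds", [])
    return data


def save_pool(path: str, pool: Dict[str, Any]) -> None:
    """Write to a temp file in the same directory, then atomically replace."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(pool, f, indent=2)
        os.replace(tmp, path)
    except BaseException:
        # Don't leave stray temp files if the dump or rename fails.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```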
spider_follow.py runs in two modes: --spider-only for discovery and --execute for action. That separation matters more than it looks, because on a metered API those two operations cost wildly different amounts.
The Scoring Brain
Our first scorer was naive: one flat bucket, ten points per keyword match, follow anyone scoring ten or higher. It worked — and it filled the pool with embroidery shops, day-traders, and accounts that happened to share a single word with our niche. When we tightened the filter and re-scored the pool, 93% of the candidates evaporated. We'd been paying to follow noise nine times out of ten.
The fix was a tiered model with a hard exclusion list:
- Tier 1 (30 pts) — unmistakable tribe markers. One match qualifies.
- Tier 2 (15 pts) — supportive signals. Two together qualify.
- Tier 3 (5 pts) — generic words that only count stacked under a higher tier.
- Negative list — bios containing the wrong markers score zero outright, whatever else they match.
We raised the floor to 30. Now a single high-signal word lets someone in, while a generic word alone can't. The negative filter does the heavy lifting — it kills the noise before it ever reaches the pool.
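The tiered model can be expressed as a short scorer, shown here as a sketch. The keyword sets are placeholders, because the real lists aren't published. The rule that tier-3 words only count alongside a tier-1 or tier-2 match is our reading of "stacked under a higher tier."

```python
import re
from typing import Set

# Placeholder keyword sets; swap in your own niche's markers.
TIER1 = {"indiehacker", "buildinpublic"}
TIER2 = {"founder", "saas", "bootstrapped"}
TIER3 = {"tech", "startup", "growth"}
NEGATIVE = {"embroidery", "daytrading", "forex"}

WEIGHTS = {1: 30, 2: 15, 3: 5}
FLOOR = 30


def _words(bio: str) -> Set[str]:
    # Lowercase tokens; "#BuildInPublic" becomes "buildinpublic".
    return set(re.findall(r"[a-z0-9]+", bio.lower()))


def score_bio(bio: str) -> int:
    words = _words(bio)
    if words & NEGATIVE:
        return 0  # exclusion wins regardless of other matches
    high = WEIGHTS[1] * len(words & TIER1) + WEIGHTS[2] * len(words & TIER2)
    if high == 0:
        return 0  # generic tier-3 words never count on their own
    return high + WEIGHTS[3] * len(words & TIER3)


def qualifies(bio: str) -> bool:
    return score_bio(bio) >= FLOOR
```

Under this reading, one tier-2 word plus three generic words also reaches the floor (15 + 3 × 5 = 30). If that proves too loose, cap the tier-3 contribution.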
The $100 Lesson: Metered APIs Will Eat You Alive
Here's the mistake that taught us everything. We funded the API account with $100 expecting three months of runway. It lasted three days.
The trap: the API doesn't bill per request. It bills per user object returned. An innocent-looking config change bumped our fetch size from 100 followers per seed to 1,000. Same request count, ten times the objects, ten times the cost. One overnight run pulled tens of thousands of user objects and drained the balance before morning.
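Once we understood the unit, the cost model fell out cleanly: the request count drops out entirely, and the cost of a run is seeds fetched × user objects per seed × price per object.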
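The same model can be written as code so you can plug in your own numbers. `price_per_object` is a hypothetical parameter, because the real rate depends on your plan and none is assumed here. The function names are illustrative. Request count never appears in the formula, so raising `objects_per_seed` from 100 to 1,000 multiplies every run's cost by ten.

```python
def run_cost(seeds_per_run: int, objects_per_seed: int, price_per_object: float) -> float:
    """Billing is per user object returned, so request count is irrelevant."""
    return seeds_per_run * objects_per_seed * price_per_object


def runway_days(balance: float, runs_per_day: int, cost_per_run: float) -> float:
    """How long a prepaid balance lasts at a fixed run cadence."""
    return balance / (runs_per_day * cost_per_run)
```

Check `runway_days` against the balance you intend to fund before the first unattended run, not after.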
The Rotation Bug That Drained Our Best Seeds
We loaded 99 seeds with hundreds of thousands of followers between them — a near-bottomless well of candidates. Yet the pool kept running dry in days. The culprit was one line:
for handle in seeds[:5]:
We'd added that slice as a cost cap: never fetch more than five seeds per run. Reasonable. But it always took the first five. The spider drained those five accounts completely, then re-fetched them every run, returning zero new candidates while 94 untouched seeds sat idle.
The fix was a real rotation — pick the next five seeds not yet fetched this cycle, mark them done, and reset the ledger only once all 99 have cycled:
- Read the ledger — which seeds have we already pulled this cycle?
- Select the next unfetched batch — five fresh seeds, never repeats
- Reset on exhaustion — when all 99 are cycled, clear the ledger and start over, by which point the early seeds have grown new followers anyway
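The three steps translate into a small pure function. This sketch assumes the ledger is held in memory as a set of handles; the JSON file would store it as a list. `next_seed_batch` and `BATCH_SIZE` are illustrative names, not the script's own.

```python
from typing import List, Set, Tuple

BATCH_SIZE = 5  # the per-run cost cap from the original slice


def next_seed_batch(
    seeds: List[str], fetched: Set[str], batch_size: int = BATCH_SIZE
) -> Tuple[List[str], Set[str]]:
    """Return the next unfetched batch and the updated ledger.

    When every seed has been fetched this cycle, the ledger resets and
    the rotation starts again from the top of the list.
    """
    remaining = [s for s in seeds if s not in fetched]
    if not remaining:
        fetched = set()  # full cycle done: start over
        remaining = list(seeds)
    batch = remaining[:batch_size]
    return batch, fetched | set(batch)
```

The five-seed cap still holds on every run. What changes is that no seed is fetched twice until every seed in the list has had its turn.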
That single change turned a three-day pool into weeks of runway from the very same seed list.
Guardrails That Actually Hold
A money-spending loop needs brakes that don't depend on you watching it. Ours:
- Hard fetch caps in code — so no config edit can silently 10x the bill again
- Credit guard — a 402 response aborts the run and fires a Telegram alert instead of hammering a dead balance
- Sequential, never parallel — running follow jobs in parallel once spiked our velocity and tripped spam detection; everything runs single-file now with human-paced gaps
- Scheduled discovery — the spider runs on a cadence, not on every execution
- Heartbeat — every run reports follower count, ratio, balance, and last-campaign result to Telegram, with an explicit "pool empty" flag when it dries up
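The first three guardrails fit in a few lines. This sketch rests on some assumptions. The follow call is an injected callable that returns an HTTP status code, and `alert` stands in for whatever posts to Telegram. `MAX_OBJECTS_PER_SEED`, `CreditExhausted`, and `follow_sequentially` are hypothetical names, and the gap values are placeholders. The point is that the cap and the 402 check live in code paths a config edit can't bypass.

```python
import random
import time
from typing import Callable, Iterable

# Hypothetical hard cap, enforced in code so a config edit can't raise it.
MAX_OBJECTS_PER_SEED = 100


class CreditExhausted(RuntimeError):
    """Raised when the API answers 402 Payment Required."""


def capped_fetch_size(requested: int) -> int:
    # Clamp whatever the config asks for into [1, MAX_OBJECTS_PER_SEED].
    return max(1, min(requested, MAX_OBJECTS_PER_SEED))


def check_credit(status_code: int, alert: Callable[[str], None]) -> None:
    # A 402 means the balance is gone: alert once and stop the run.
    if status_code == 402:
        alert("API returned 402: balance exhausted, run aborted")
        raise CreditExhausted("402 Payment Required")


def follow_sequentially(
    handles: Iterable[str],
    follow: Callable[[str], int],
    alert: Callable[[str], None],
    min_gap_s: float = 60.0,  # placeholder pacing values
    jitter_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Follow one handle at a time with randomized gaps; never in parallel."""
    done = 0
    for i, handle in enumerate(handles):
        if i:
            sleep(min_gap_s + random.uniform(0.0, jitter_s))
        check_credit(follow(handle), alert)
        done += 1
    return done
```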
The whole thing is wired through cron: follow runs spread across the day, an unfollow cleanup overnight to keep the ratio healthy, the spider on its own cadence, and a status report every morning. No daemon, no framework, no babysitting.
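On the cron side, the two modes can be wired up with argparse as mutually exclusive flags. Each invocation is then either discovery or action, never both. In this sketch, `run_discovery` and `run_follows` are hypothetical placeholders. The crontab lines in the comment are illustrative schedules, not the schedule described above.

```python
import argparse
from typing import List, Optional

# Illustrative crontab entries (paths and times are placeholders):
#   0 9,13,17,21 * * *  /usr/bin/python3 /opt/spider/spider_follow.py --execute
#   30 4 * * 1,4        /usr/bin/python3 /opt/spider/spider_follow.py --spider-only


def run_discovery() -> str:
    # Placeholder: fetch the next seed batch, score bios, update the pool.
    return "discovery"


def run_follows() -> str:
    # Placeholder: follow from the existing pool; no follower fetches here.
    return "follows"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="spider_follow.py")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--spider-only", action="store_true",
                      help="discovery: fetch and score candidates, follow nobody")
    mode.add_argument("--execute", action="store_true",
                      help="action: follow from the existing pool, fetch nothing")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> str:
    args = parse_args(argv)
    return run_discovery() if args.spider_only else run_follows()
```

Because `--execute` never touches the fetch endpoint, follow runs can fire several times a day without adding to the per-object bill.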
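The Stack
- Runtime: a headless Ubuntu VM
- Engine: a few hundred lines of Python in spider_follow.py
- State: spider-seeds.json and follower-pool.json
- Scheduling: cron, with no daemon or framework
- Alerts and heartbeat: Telegram
- Data: the X API on pay-per-use billing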
What We'd Tell Anyone Building This
Follow-for-follow growth is a bootstrap, not a destination. It buys social proof — an account that looks alive instead of abandoned — while the real engine, the content, warms up. Don't expect it to be the gold mine. Expect it to make the gold mine reachable.
And learn your billing unit cold before you automate against it. The architecture was the easy part. Every expensive lesson we hit was about the meter, not the code.
