I went looking for AI‑built code on GitHub.
I found a farm.
The two top Co-Authored-By: Claude repos on GitHub are byte‑identical except for one byte in README.md. Pulling that thread exposes a four‑cluster repo‑laundering operation, a 3,112-page brand‑squat of an Indian AI startup, and a paid bot‑star service inflating real Chinese AI startups alongside it.
Of 156 files in each repo, exactly one byte differs.
kyasbalme/Scrapbox and luliguyu/cmbd-book, the two top results when GitHub commit‑search is sorted by author‑date descending, claim 101 commits each from Co-Authored-By: Claude. Their poetry.lock files are byte‑identical at 256,748 bytes. Their LICENSE, CLAUDE.md, AGENTS.md, every .py file: identical. Only README.md differs, and only by a single byte.
The Co-Authored-By: Claude trailer that puts these repos at the top of every search? Not generated by the operator. Inherited from the upstream commits they cloned. The real authors used Claude. The operator preserved their commit messages while rewriting author identity to a sock‑puppet. Anthropic's trailer becomes free SEO for content the actual humans wrote.
diff -rq Scrapbox cmbd-book · tree SHAs in evidence/git_tree_sha/all_repos.tsv · 625 commit records in evidence/commit_history/kyasbalme_Scrapbox.tsv
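For readers without diff to hand, the same comparison can be sketched in Python. This is a minimal sketch, assuming both repos are cloned into local directories named as in the command above; the function names are mine, not part of the published scripts.

```python
import hashlib
from pathlib import Path


def tree_digests(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    base = Path(root)
    digests = {}
    for path in sorted(base.rglob("*")):
        rel = path.relative_to(base)
        # Skip directories and Git's own metadata; only working-tree content is compared.
        if path.is_file() and ".git" not in rel.parts:
            digests[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_files(left: str, right: str) -> list[str]:
    """Return relative paths that are missing on one side or differ in content."""
    a, b = tree_digests(left), tree_digests(right)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

If the clones match the description above, differing_files("Scrapbox", "cmbd-book") should list README.md and nothing else.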
Author dates extend to June 2037. That's how they sit at the top.
Git happily accepts any Unix timestamp. Want a commit from 2037? Set the author date to 2127428926 and push. The farm rewrites all 625 of kyasbalme/Scrapbox's commit timestamps to drift uniformly across 2024 → 2037. Anyone running gh search commits "Co-Authored-By: Claude" sort:author-date-desc sees these repos first. Forever. Or at least until 2037.
Each red mark is a fabricated commit. Most are seeded into 2027–2037 — far enough into the future that any descending sort puts them first. The grey marks are real activity from the upstream project. Below the line, the visible commit count: 625 in Scrapbox · 623 in cmbd-book · 609 in dimatura · 496 in ssaavedrad · 286 in statbox2. Each repo has hundreds of commits sleeping in the future.
git -C Scrapbox log --format='%ai' | sort | sed -n '1p;$p' → 2024-04-08 · 2037-06-01
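To check the arithmetic yourself, here is a small sketch that decodes the timestamp quoted above and flags author dates that sit in the future. The helper names and the reference date passed to is_future_dated are my own choices for illustration.

```python
from datetime import datetime, timedelta, timezone

UTC_PLUS_8 = timezone(timedelta(hours=8))  # the +0800 offset seen on these commits


def author_date(ts: int, tz: timezone = timezone.utc) -> datetime:
    """Interpret a Unix timestamp as a timezone-aware datetime."""
    return datetime.fromtimestamp(ts, tz=tz)


def is_future_dated(ts: int, now: datetime) -> bool:
    """True when an author date is later than the reference time."""
    return author_date(ts) > now


print(author_date(2127428926).isoformat())              # 2037-06-01T00:28:46+00:00
print(author_date(2127428926, UTC_PLUS_8).isoformat())  # 2037-06-01T08:28:46+08:00
```

Running is_future_dated over every author timestamp in a repo, with now set to the day you sample, is one simple way to separate plausible history from commits that claim to come from the future.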
Four accounts. Nine sock‑puppet emails. Sixteen repos. One operator.
The smoking gun for cross-account attribution is mundane: the email bmqx9295@163.com appears as the sole author in both luliguyu/crewrktabletsn and tusmart-grouptt/crewrktabletsn. Same email, same repo name, two different "owner" accounts. Same operator. The same logic links naobingdz407945@163.com to luliguyu and countneurooman.
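The linking rule is simple enough to sketch: group author emails by the accounts they commit under and keep any email that shows up under more than one owner. The first two rows below are the pair named above; the third is a hypothetical control row, and a real run would load rows from commit history instead of hard-coding them.

```python
from collections import defaultdict

# (owner, repo, author_email) rows.
ROWS = [
    ("luliguyu", "crewrktabletsn", "bmqx9295@163.com"),
    ("tusmart-grouptt", "crewrktabletsn", "bmqx9295@163.com"),
    ("example-owner", "example-repo", "someone@example.com"),  # hypothetical control
]


def shared_emails(rows):
    """Return {email: sorted owners} for emails that author commits under 2+ owners."""
    owners = defaultdict(set)
    for owner, _repo, email in rows:
        owners[email.lower()].add(owner)
    return {email: sorted(o) for email, o in owners.items() if len(o) > 1}


print(shared_emails(ROWS))
# {'bmqx9295@163.com': ['luliguyu', 'tusmart-grouptt']}
```

A shared email is suggestive rather than conclusive on its own, which is why it is read here alongside the identical repo name.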
The sock-puppet pool
Five email-provider patterns, all Chinese except for the Anglo-name outlook.com sock-puppets. The numeric reach here is GitHub commit-search count for each email — only emails that GitHub's index surfaces are listed.
All sixteen Operator A repos commit with timezone offset +0800. Even the fabricated 2037 timestamps. The operator's local clock leaked into the fakes.
data/enumeration/sockpuppet_reach.txt · data/kraken/{kyasbalme,luliguyu}.json · kraken behavioural fingerprint match: rhythm_period: 13.0 in both accounts to four decimals
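The offset check can be reproduced from plain git log output. A minimal sketch, assuming one `%ai`-formatted line per commit (date, time, offset); the two sample lines are illustrative, not copied from the evidence files.

```python
from collections import Counter


def offset_counts(lines):
    """Count the UTC offsets found in `git log --format=%ai` output lines."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) == 3:  # date, time, offset
            counts[parts[2]] += 1
    return counts


sample = [
    "2024-04-08 10:15:00 +0800",  # illustrative
    "2037-06-01 08:28:46 +0800",  # illustrative
]
print(offset_counts(sample))  # Counter({'+0800': 2})
```

A repo whose every commit, real or fabricated, lands in a single bucket is the pattern described above.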
A separate operator forges claude‑code@anthropic.local as the author identity.
anthropic.local is not a real domain. Anthropic's actual Claude Code does not author commits this way. Someone is forging the identity as a stamp of "AI did this" — 23 commits across 8 distinct GitHub repos, each owned by a different account. The accounts split into two sub-techniques.
- esrfdev · 2026-04-12 · 1 repo
- mctils12-arch · 2026-04-03 · 1 repo
- gdhughey · 2025-11-28 · 5 repos
- vpneoterra · 2024-12-21 · 15 repos · fake fusion startup
- jun564 · 2021-08 · dormant 5y
- CaMaGuee · 2021-08 · dormant 5y
- rvadapally · 2021-10 · dormant 4.5y
- mearley24 · 2023-02 · dormant 3y
gh api -X GET search/commits -f q='author-email:claude-code@anthropic.local' → 23 commits · 8 repos · 8 owner accounts · data/enumeration/farm_b_c.txt
An organization called DelbyIntelligence published 3,112 fake AI-product pages in 22 days.
Created: 2026-04-03. By 2026-04-25: 3,112 public repos. The org itself has zero followers, no description, and no website. Every repo is a static GitHub Pages landing page whose own description starts with "Delby AI Product:". The real delby.ai is an Indian Physical-AI startup with 500+ vehicles in a delivery sensor-fusion network. Its website doesn't link to any GitHub org. The GitHub org is brand-squatting them.
Hour-of-day distribution is uniform across all 24 hours — automation runs around the clock. But day-of-week shows a clear human rhythm: Mon 292 → Thu 681 → Fri 647 → Sun 207. A 3.3× peak/trough ratio. Sunday off. Saturday half-day. A Chinese six-day work week.
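The weekday rhythm is easy to recompute from repo creation timestamps. The sketch below assumes ISO-8601 timestamps bucketed in UTC (bucketing in another zone would shift counts near midnight), and it checks the ratio using only the four weekday totals quoted above, since the other three are not given here.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_counts(timestamps):
    """Count ISO-8601 timestamps (e.g. '2026-04-03T09:30:00Z') per weekday name."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        counts[DAYS[dt.weekday()]] += 1
    return counts


def peak_trough_ratio(counts):
    """Ratio of the busiest bucket to the quietest non-empty bucket."""
    values = [v for v in counts.values() if v > 0]
    return max(values) / min(values)


quoted = {"Mon": 292, "Thu": 681, "Fri": 647, "Sun": 207}
print(round(peak_trough_ratio(quoted), 1))  # 3.3
```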
The naming generator runs straight into GitHub's 100-character repo-name limit. The truncated names tell the story:
The repo demo-sensor-fusion-puzzle-hunt-viral-recruitm[ent] quietly tells you the motive: this is a viral-recruitment apparatus dressed up as a polymath AI lab. demo-ai-lab-seed-agents-challenge-entry is the same playbook — fake "AI accelerator seed challenge" entries. The named-target showcase repos (Cardinal Health, IIT Bombay) are automated business-development bait pages.
The org has no description. The operator account delby-ai was created 92 seconds before the org. 0 followers. 0 listed repos. A pure handle.
data/enumeration/delby_full.jsonl (3,112 entries) · data/enumeration/delby_analysis.txt · scripts/05_delby_full_enumeration.sh
A paid bot-star service inflates the farm — and at least four real Chinese AI startups.
Six accounts (a mapped sample of a much larger pool) reliably star both farm repos AND a roster of real AI startups paying for visibility. Combined inflated stars across just the four largest customers: ~19,000. The accounts mix aged organic-looking handles (with bios and names for legitimacy) with fresh bulk-star accounts created together in the same week, twelve months before the current farm wave.
The smoking gun for the cross-customer connection: bot account 8888x82 forks both tusmart-grouptt/crewrktabletsn (Operator A farm) AND Customer B's 5,681-star repo — a real Chinese AI startup, name withheld pending direct disclosure to that vendor. Same star service, two customer types.
The StarScout paper from CMU + NCSU + Socket (ICSE 2026) puts global numbers on this market: 6 million suspected fake stars across 18,617 repos by ~301,000 accounts. After researchers reported, GitHub removed 90.4% of flagged repos but only 57% of accounts. Bot infrastructure rotates faster than account suspensions.
data/enumeration/star_network.txt · data/enumeration/star_net_tier2.txt · data/enumeration/bot_expansion.txt · github.com/hehao98/StarScout
43% of GitHub's top results for AI-built code trace to a single operator's two repos.
Sample: the top 10 pages of gh search commits "Co-Authored-By: Claude" sort:author-date-desc on 2026-04-25. 232 commits captured. 100 of them — 43% — come from just luliguyu/cmbd-book (61) and kyasbalme/Scrapbox (39). Every sampled commit dated past 2027 traces to those same two repos.
Top 232 results by author-date-desc
Anyone scraping GitHub commit-search for an AI-built-code corpus — for academic study, training data, market intelligence, recruitment screening — is being fed adversarial output as the dominant signal.
- luliguyu/cmbd-book · 61 · 26%
- kyasbalme/Scrapbox · 39 · 17%
- aiandwebservices-cyber/aiandwebservices-next · 13 · 6%
- JasonScottSF/npm-traffic-dashboard · 8 · 3%
- 43 other repos · 111 · 48%
scripts/04_pollution_sample.sh reproduces this sample · data/enumeration/pollution_sample.jsonl
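The percentages follow directly from the counts. Here is a minimal sketch that rebuilds the tallies from one repo name per sampled commit; the "other" label is my stand-in for the 43 remaining repos, and this is not the published script.

```python
from collections import Counter


def repo_shares(repos):
    """Return {repo: (count, rounded percent)} given one repo name per commit."""
    counts = Counter(repos)
    total = sum(counts.values())
    return {repo: (n, round(100 * n / total)) for repo, n in counts.items()}


sample = (
    ["luliguyu/cmbd-book"] * 61
    + ["kyasbalme/Scrapbox"] * 39
    + ["aiandwebservices-cyber/aiandwebservices-next"] * 13
    + ["JasonScottSF/npm-traffic-dashboard"] * 8
    + ["other"] * 111  # stand-in for the 43 remaining repos
)

shares = repo_shares(sample)
print(len(sample))                              # 232
print(shares["luliguyu/cmbd-book"])             # (61, 26)
print(round(100 * (61 + 39) / len(sample)))     # 43
```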
Operator A is China-based. Operator C runs on a night-owl Asian schedule.
The fabricated commit dates leaked the operator's timezone. The activity hours of the delby-ai account leak the operator's sleep schedule. Both signals point to mainland China.
delby-ai · activity by UTC hour
The 100 most recent events of the delby-ai operator account peak at UTC 17:00 – 23:00 and UTC 00:00 – 02:00 — that is 01:00 – 07:00 China time (deep night) plus 08:00 – 10:00 (early morning). Negligible activity during China daytime.
git -C Scrapbox log --format='%ai' | awk '{print $3}' | sort -u → +0800 · gh api 'users/delby-ai/events/public?per_page=100'
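The UTC-to-China-time conversion above is a fixed eight-hour shift. A small sketch, assuming the events are reduced to their created_at strings in the ISO-8601 "Z" form that the GitHub events API returns; the sample timestamps are illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

CHINA = timezone(timedelta(hours=8))  # UTC+8, no daylight saving


def local_hour_counts(created_at, tz=CHINA):
    """Bucket ISO-8601 UTC timestamps by hour of day in the given zone."""
    counts = Counter()
    for ts in created_at:
        dt = datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(tz)
        counts[dt.hour] += 1
    return counts


events = ["2026-04-24T17:05:12Z", "2026-04-25T02:10:00Z"]  # illustrative
print(sorted(local_hour_counts(events)))  # [1, 10]
```

UTC 17:00 lands at 01:00 and UTC 02:00 at 10:00, matching the two ends of the window described above.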
The laundered crypto wallets are not actively malicious — today.
Operator A has cloned two real cryptocurrency wallets: theQRL/zond-web3-wallet (Quantum Resistant Ledger) as luliguyu/dimatura, and Narwallets/narwallets-extension (NEAR Protocol) as luliguyu/ssaavedrad. We diffed every 0x address and every .near identifier against the real upstreams.
scripts/06_wallet_integrity_diff.sh · tools/wallet_watcher.py · evidence/wallet_addresses/baseline.txt
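The idea behind the address diff can be sketched in a few lines: extract every address-like token from the clone and from the upstream, then look for anything the clone adds. The patterns here are my assumptions (a 40-hex-digit 0x form and a loose .near account shape), not the ones in the published script, and they will produce some false positives.

```python
import re

# Assumed shapes: 0x plus exactly 40 hex digits, and dot-separated names ending in .near.
HEX_ADDRESS = re.compile(r"\b0x[0-9a-fA-F]{40}\b")
NEAR_ACCOUNT = re.compile(r"\b[a-z0-9_-]+(?:\.[a-z0-9_-]+)*\.near\b")


def identifiers(text: str) -> set[str]:
    """Collect address-like tokens; hex addresses are lower-cased before comparison."""
    hex_ids = {m.lower() for m in HEX_ADDRESS.findall(text)}
    return hex_ids | set(NEAR_ACCOUNT.findall(text))


def injected(upstream_text: str, clone_text: str) -> set[str]:
    """Identifiers present in the clone but absent from the upstream."""
    return identifiers(clone_text) - identifiers(upstream_text)
```

An empty result from injected across every file pair is what "not actively malicious" means here; any new identifier is the signal a watcher would alert on.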
Every claim has a hash. Every command runs. The watcher is still watching.
All raw data, all reproduction commands, all forensic evidence, and the three watcher tools (wallet integrity / farm growth / bot-star activity) live in a single public repository. SHA-256-hashed for tamper-evidence. Deployable as one Docker container.
Disclosure status: drafted notification emails to Allen AI, Anthropic, the Quantum Resistant Ledger, Narwallets, the real Delby Intelligence, the European Synchrotron Radiation Facility, the impersonated researcher blu3mo, the four Chinese AI startups whose stars are inflated, and GitHub Trust & Safety live in disclosures/CONTACTS.md. The publication of this repository is itself the disclosure.