LATEST · 2026·04·26 · v3 · The market · 12 operators · 2 real names · methodology generalized to 7 verticals in 1 workday
Operation Long Shadow / Forensic notebook · 2026·04·25
Operators still active
Investigation No. 01 · 10 chapters · 2,150 words · ≈ 12 min

I went looking for AI‑built code on GitHub.
I found a farm.

The two top Co-Authored-By: Claude repos on GitHub are byte‑identical except for one byte in README.md. Pulling that thread exposes a four‑cluster repo‑laundering operation, a 3,112-page brand‑squat of an Indian AI startup, and a paid bot‑star service inflating real Chinese AI startups alongside it.

Operators identified
4 distinct clusters · 17+ accounts
Repos laundered
3,150+ confirmed (1 org alone: 3,112)
Real victims
Allen AI · Anthropic · 2 crypto wallets · 1 Indian startup
Pollution rate
43% of the top 232 results in GitHub's Claude-trailer commit search
00 · The smoking gun

Of 156 files in each repo, exactly one byte differs.

kyasbalme/Scrapbox and luliguyu/cmbd-book, the two top results when GitHub commit‑search is sorted by author‑date descending, claim 101 commits each from Co-Authored-By: Claude. Their poetry.lock files are byte‑identical at 256,748 bytes. Their LICENSE, CLAUDE.md, AGENTS.md, every .py file: identical. Only README.md differs, and only by a single byte.

Figure: file-by-file grid of kyasbalme/Scrapbox (tree ea78b434…) against luliguyu/cmbd-book (tree f2282434…), 156 files per repo; the one highlighted cell is the byte that differs.

The Co-Authored-By: Claude trailer that puts these repos at the top of every search? Not generated by the operator. Inherited from the upstream commits they cloned. The real authors used Claude. The operator preserved their commit messages while rewriting author identity to a sock‑puppet. Anthropic's trailer becomes free SEO for content the actual humans wrote.

diff -rq Scrapbox cmbd-book · tree SHAs in evidence/git_tree_sha/all_repos.tsv · 625 commit records in evidence/commit_history/kyasbalme_Scrapbox.tsv
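The comparison behind `diff -rq` can be sketched in a few lines of Python. This is a minimal illustration, not the investigation's scripts/02_clone_and_diff_top2.sh: the helper names `relative_files` and `byte_diff` are hypothetical, it assumes both repos are already cloned side by side, and unlike `diff -rq` it counts differing byte positions instead of only flagging differing files.

```python
import os
from pathlib import Path


def relative_files(root: Path) -> set[str]:
    """Every file under root as a POSIX-style relative path, skipping .git internals."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # prune the object store
        for name in filenames:
            found.add((Path(dirpath) / name).relative_to(root).as_posix())
    return found


def byte_diff(a: Path, b: Path) -> dict[str, int]:
    """Map each differing relative path to its number of differing bytes.

    Files present on only one side map to -1. Positions past the shorter
    file's end each count as one differing byte.
    """
    files_a, files_b = relative_files(a), relative_files(b)
    result = {}
    for rel in sorted(files_a | files_b):
        if rel not in files_a or rel not in files_b:
            result[rel] = -1
            continue
        x, y = (a / rel).read_bytes(), (b / rel).read_bytes()
        if x != y:
            mismatched = sum(1 for p, q in zip(x, y) if p != q)
            result[rel] = mismatched + abs(len(x) - len(y))
    return result
```

If the finding above holds, `byte_diff(Path("Scrapbox"), Path("cmbd-book"))` should return a single `README.md` entry with a count of 1.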

01 · The trick

Author dates extend to June 2037. That's how they sit at the top.

Git happily accepts any Unix timestamp. Want a commit from 2037? Set the author date to 2127428926 and push. The farm spreads kyasbalme/Scrapbox's 625 commit timestamps uniformly across 2024 → 2037. Anyone running gh search commits "Co-Authored-By: Claude" sort:author-date-desc sees these repos first. Forever. Or at least until 2037.

Figure: commit timelines for five farm repos. Red marks are fabricated commits, most seeded into 2027–2037, far enough into the future that any descending sort puts them first; grey marks are real activity from the upstream project. Visible commit counts: 625 in Scrapbox · 623 in cmbd-book · 609 in dimatura · 496 in ssaavedrad · 286 in statbox2. Each repo has hundreds of commits sleeping in the future.

git -C kyasbalme/Scrapbox log --format='%ai' | sort | sed -n '1p;$p' → 2024-04-08 · 2037-06-01
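The 2037 date is plain arithmetic on the epoch value. The sketch below converts a raw author timestamp to UTC and flags any that sit past a chosen "now". The helper names are hypothetical, and it assumes the timestamps were captured with `git log --format=%at`, which prints the author date as Unix seconds, one per line. (Git's documented internal date format is `<unix-timestamp> <offset>`, which is why a value such as `2127428926 +0800` in `GIT_AUTHOR_DATE` goes through unchallenged.)

```python
from datetime import datetime, timedelta, timezone


def to_utc(epoch_seconds: int) -> datetime:
    """Interpret a git author timestamp (Unix seconds) as an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def parse_git_at(output: str) -> list[int]:
    """Parse `git log --format=%at` output: one integer timestamp per line."""
    return [int(token) for token in output.split()]


def future_dated(author_epochs: list[int], now: datetime,
                 slack: timedelta = timedelta(days=1)) -> list[int]:
    """Return timestamps later than `now` plus a small allowance for clock skew."""
    cutoff = now + slack
    return [t for t in author_epochs if to_utc(t) > cutoff]
```

`to_utc(2127428926)` is 2037-06-01 00:28:46 UTC, which matches the last date in the range above.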

02 · Operator A · the network

Four accounts. Nine sock‑puppet emails. Sixteen repos. One operator.

The smoking gun for cross-account attribution is mundane: the email bmqx9295@163.com appears as the sole author in both luliguyu/crewrktabletsn and tusmart-grouptt/crewrktabletsn. Same email, same repo name, two different "owner" accounts. Same operator. The same logic links naobingdz407945@163.com to luliguyu and countneurooman.

Figure: account ↔ email network, marking which account uses which email and which emails are shared across accounts (the cross-account proof).
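The cross-account logic is a grouping problem: collect (owner account, author email) pairs, keep the emails seen under more than one owner, then merge accounts that are transitively linked. A minimal sketch with hypothetical helper names; it assumes the pairs have already been scraped (for example from commit-search results), and the union-find merge is a standard technique chosen here for illustration, not necessarily what the investigation's scripts do.

```python
from collections import defaultdict


def shared_emails(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Return each author email that appears under two or more owner accounts."""
    owners_by_email: dict[str, set[str]] = defaultdict(set)
    for owner, email in pairs:
        owners_by_email[email.lower()].add(owner)
    return {email: owners for email, owners in owners_by_email.items() if len(owners) > 1}


def clusters(pairs: list[tuple[str, str]]) -> list[set[str]]:
    """Group owner accounts transitively linked by shared emails (union-find)."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for owner, _ in pairs:
        find(owner)  # register every account, including unlinked singletons
    for owners in shared_emails(pairs).values():
        first, *rest = sorted(owners)
        for other in rest:
            parent[find(other)] = find(first)  # merge the two roots

    groups: dict[str, set[str]] = defaultdict(set)
    for owner in parent:
        groups[find(owner)].add(owner)
    return list(groups.values())
```

Fed only the two shared emails named above, it merges luliguyu, tusmart-grouptt and countneurooman into one cluster.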

The sock-puppet pool

Five email-provider patterns, all Chinese except for the Anglo-name outlook.com sock-puppets. The numeric reach here is GitHub commit-search count for each email — only emails that GitHub's index surfaces are listed.

Email provider · Commits
outlook.com · 1,120
outlook.com · 782
163.com (NetEase CN) · 777+
163.com (NetEase CN) · 597+
yeah.net (NetEase CN) · 609 (local)
yeah.net (NetEase CN) · 324 (local)
yeah.net (NetEase CN) · 90 (local)
126.com (NetEase CN) · 28 (local)
Operator A's repos commit with timezone offset +0800, even on the fabricated 2037 timestamps; the one exception, WildDet3D, keeps its Allen AI upstream's -0700. The operator's local clock leaked into the fakes.
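The timezone leak is easy to check from `git log --format=%ai`, whose last field is the author's UTC offset. A minimal sketch with hypothetical helper names, assuming the log output has already been captured as text:

```python
from collections import Counter
from typing import Optional


def offset_tally(git_log_ai: str) -> Counter:
    """Count author UTC offsets in `git log --format=%ai` output.

    Each line looks like '2037-06-01 08:28:46 +0800'; the offset is the last field.
    """
    return Counter(line.split()[-1] for line in git_log_ai.splitlines() if line.strip())


def uniform_offset(tally: Counter) -> Optional[str]:
    """Return the single offset shared by every commit, or None if they differ."""
    return next(iter(tally)) if len(tally) == 1 else None
```

A repo whose future-dated commits still report `+0800` is showing the operator's machine clock, not the upstream author's.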

data/enumeration/sockpuppet_reach.txt · data/kraken/{kyasbalme,luliguyu}.json · kraken behavioural fingerprint match: rhythm_period 13.0 in both accounts, identical to four decimal places

03 · Operator B · forging Anthropic

A separate operator forges claude‑code@anthropic.local as the author identity.

anthropic.local is not a real domain. Anthropic's actual Claude Code does not author commits this way. Someone is forging the identity as a stamp of "AI did this" — 23 commits across 8 distinct GitHub repos, each owned by a different account. The accounts split into two sub-techniques.

Fresh attacker accounts
Created within days of their first laundered repo. esrfdev impersonates the developer org name of the real European Synchrotron Radiation Facility (ESRF), and was created the same day as its first push.
  • esrfdev · 2026-04-12 · 1 repo
  • mctils12-arch · 2026-04-03 · 1 repo
  • gdhughey · 2025-11-28 · 5 repos
  • vpneoterra · 2024-12-21 · 15 repos · fake fusion startup
Compromised dormant accounts
Real accounts from 2021–2023 that suddenly become active in 2026 publishing forged-Anthropic commits. The takeover pattern is consistent with credential reuse from older breaches.
  • jun564 · 2021-08 · dormant 5y
  • CaMaGuee · 2021-08 · dormant 5y
  • rvadapally · 2021-10 · dormant 4.5y
  • mearley24 · 2023-02 · dormant 3y

gh api -X GET search/commits -f q='author-email:claude-code@anthropic.local' → 23 commits · 8 repos · 8 owner accounts · data/enumeration/farm_b_c.txt
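The same query can be issued against the REST endpoint that `gh api` wraps. The sketch below builds the URL and summarizes one page of results. It assumes the documented response shape (`items[].repository.full_name` and `items[].repository.owner.login`), leaves out the HTTP call and authentication that `gh` normally handles, and uses hypothetical helper names.

```python
import json
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/commits"


def search_url(author_email: str, page: int = 1, per_page: int = 100) -> str:
    """Build a commit-search URL for one author email (the API caps per_page at 100)."""
    params = {"q": f"author-email:{author_email}", "per_page": per_page, "page": page}
    return f"{SEARCH_URL}?{urlencode(params)}"


def summarize(response_body: str) -> dict[str, int]:
    """Count commits, distinct repos and distinct owner accounts on one results page."""
    items = json.loads(response_body)["items"]
    repos = {item["repository"]["full_name"] for item in items}
    owners = {item["repository"]["owner"]["login"] for item in items}
    return {"commits": len(items), "repos": len(repos), "owners": len(owners)}
```

When a query spans several pages, commit counts can be summed, but distinct repos and owners have to be unioned across pages rather than added.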

04 · Operator C · industrial scale

An organization called DelbyIntelligence published 3,112 fake AI-product pages in 22 days.

Created: 2026-04-03. By 2026-04-25: 3,112 public repos. Zero followers. No description. No website. Every repo is a static GitHub Pages landing page with a description starting "Delby AI Product:". The real delby.ai is an Indian Physical-AI startup with 500+ vehicles in a delivery sensor-fusion network. Its website doesn't link to any GitHub org. The GitHub org is brand-squatting them.

Figure: daily creation curve (automation ramp) and day-of-week histogram for all 3,112 repos, split into demo-* pages and product-* showcases (46). Creation averages ≈ 140 repos / day (3,112 in 22 days).

The hour-of-day distribution is flat across all 24 hours: automation runs around the clock. The day-of-week distribution, though, shows a clear human rhythm: Mon 292 → Thu 681 → Fri 647 → Sun 207, a 3.3× peak/trough ratio. Sunday is the trough. Saturday is a half-day. A Chinese six-day work week.
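The day-of-week signal falls out of the repos' `created_at` timestamps. A minimal sketch with hypothetical helper names; it assumes ISO-8601 UTC timestamps such as GitHub's API returns, and optionally shifts them by a fixed number of hours before bucketing, since a UTC day boundary is not the operator's day boundary.

```python
from collections import Counter
from datetime import datetime, timedelta

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def weekday_histogram(created_at: list[str], offset_hours: int = 0) -> Counter:
    """Bucket ISO-8601 UTC timestamps (e.g. '2026-04-09T10:00:00Z') by weekday."""
    counts = Counter()
    for ts in created_at:
        moment = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        # Shift the wall-clock time; only the resulting weekday is read below.
        moment += timedelta(hours=offset_hours)
        counts[DAYS[moment.weekday()]] += 1  # weekday(): Monday == 0
    return counts


def peak_trough_ratio(hist: Counter) -> float:
    """Busiest weekday divided by quietest weekday (days with zero repos are absent)."""
    return max(hist.values()) / min(hist.values())
```

With the four counts quoted above, the ratio is 681 / 207 ≈ 3.3.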

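The naming generator truncates its own slugs: every demo-* name below is cut at exactly 45 characters and every product-* name at exactly 58, well short of GitHub's 100-character repo-name limit. The truncated names tell the story: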

demo-sensor-fusion-puzzle-hunt-viral-recruitm
demo-ai-lab-seed-agents-challenge-entry-long-
product-multi-vendor-fleet-orchestration-demo-for-cardinal
product-federated-physical-ai-training-sandbox-live-demo-f
demo-delby-glove-grasp-predictor-real-time-de
demo-delby-cortex-safety-layer-geometric-hall
demo-delby-marl-sentinel-real-time-interactiv
demo-delby-voxelsurgeon-real-time-interactive
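Measuring the eight names listed above shows where the cut falls for each prefix. A minimal sketch; `lengths_by_prefix` is a hypothetical helper that groups name lengths by the segment before the first hyphen.

```python
from collections import defaultdict


def lengths_by_prefix(names: list[str]) -> dict[str, set[int]]:
    """Map each name's leading segment ('demo', 'product', ...) to the set of lengths seen."""
    out: dict[str, set[int]] = defaultdict(set)
    for name in names:
        out[name.split("-", 1)[0]].add(len(name))
    return dict(out)


NAMES = [
    "demo-sensor-fusion-puzzle-hunt-viral-recruitm",
    "demo-ai-lab-seed-agents-challenge-entry-long-",
    "product-multi-vendor-fleet-orchestration-demo-for-cardinal",
    "product-federated-physical-ai-training-sandbox-live-demo-f",
    "demo-delby-glove-grasp-predictor-real-time-de",
    "demo-delby-cortex-safety-layer-geometric-hall",
    "demo-delby-marl-sentinel-real-time-interactiv",
    "demo-delby-voxelsurgeon-real-time-interactive",
]

print(lengths_by_prefix(NAMES))  # {'demo': {45}, 'product': {58}}
```

Two prefixes, two fixed lengths, both well below GitHub's 100-character cap. That is consistent with a generator capping the slug after its prefix (40 characters after `demo-`, 50 after `product-`), though the rule is an inference from eight names, not a confirmed implementation.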

The repo demo-sensor-fusion-puzzle-hunt-viral-recruitm[ent] quietly tells you the motive: this is a viral-recruitment apparatus dressed up as a polymath AI lab. demo-ai-lab-seed-agents-challenge-entry is the same playbook — fake "AI accelerator seed challenge" entries. The named-target showcase repos (Cardinal Health, IIT Bombay) are automated business-development bait pages.

The org has no description. The operator account delby-ai was created 92 seconds before the org. 0 followers. 0 listed repos. A pure handle.

data/enumeration/delby_full.jsonl (3,112 entries) · data/enumeration/delby_analysis.txt · scripts/05_delby_full_enumeration.sh

05 · Operator D · the star service

A paid bot-star service inflates the farm — and at least four real Chinese AI startups.

Six accounts (a mapped sample of a much larger pool) reliably star both farm repos AND a roster of real AI startups paying for visibility. Combined inflated stars across just the four largest customers: ~19,000. The accounts mix aged, organic-looking handles (with bios and names for legitimacy) with fresh bulk-star accounts created together in the same week, twelve months before the current farm wave.

The smoking gun for the cross-customer connection: bot account 8888x82 forks both tusmart-grouptt/crewrktabletsn (Operator A farm) AND Customer B's 5,681-star repo — a real Chinese AI startup, name withheld pending direct disclosure to that vendor. Same star service, two customer types.

The StarScout paper from CMU + NCSU + Socket (ICSE 2026) puts global numbers on this market: 6 million suspected fake stars across 18,617 repos by ~301,000 accounts. After the researchers reported their findings, GitHub removed 90.4% of the flagged repos but only 57% of the flagged accounts. Bot infrastructure rotates faster than account suspensions.

data/enumeration/star_network.txt · data/enumeration/star_net_tier2.txt · data/enumeration/bot_expansion.txt · github.com/hehao98/StarScout
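The cross-customer test is a set intersection: an account is interesting if it has starred or forked at least one known farm repo and at least one repo outside the farm. A minimal sketch; the helper names and the input mapping (account login to the repos it touched, assumed already collected from stargazer and fork lists) are hypothetical.

```python
from collections import Counter


def cross_customer_accounts(touched: dict[str, set[str]], farm: set[str]) -> dict[str, set[str]]:
    """Accounts that touch a farm repo AND a non-farm repo, mapped to the non-farm repos."""
    hits = {}
    for login, repos in touched.items():
        outside = repos - farm
        if repos & farm and outside:
            hits[login] = outside
    return hits


def customer_ranking(hits: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank non-farm repos by how many cross-customer accounts touch them."""
    return Counter(repo for repos in hits.values() for repo in repos).most_common()
```

Organic stargazers of a customer repo drop out because they never touch the farm, which is what makes the overlap a useful filter.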

06 · Pollution rate

43% of GitHub's top results for AI-built code trace to a single operator's two repos.

Sample: the top 10 pages of gh search commits "Co-Authored-By: Claude" sort:author-date-desc on 2026-04-25. 232 commits captured. 100 of them — 43% — come from just luliguyu/cmbd-book (61) and kyasbalme/Scrapbox (39). Every sampled commit dated past 2027 traces to those same two repos.

Anyone scraping GitHub commit-search for an AI-built-code corpus (for academic study, training data, market intelligence, recruitment screening) is being fed adversarial output as the dominant signal.

Top 232 results by author-date-desc:
  • luliguyu/cmbd-book · 61 · 26%
  • kyasbalme/Scrapbox · 39 · 17%
  • aiandwebservices-cyber/aiandwebservices-next · 13 · 6%
  • JasonScottSF/npm-traffic-dashboard · 8 · 3%
  • 43 other repos · 111 · 48%

scripts/04_pollution_sample.sh reproduces this sample · data/enumeration/pollution_sample.jsonl
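The pollution figure is a frequency count over the sampled commits' repositories. A minimal sketch with a hypothetical helper name, assuming each sampled commit has already been reduced to its repo's full name:

```python
from collections import Counter


def top_share(sample_repos: list[str], top_n: int = 2) -> tuple[list[tuple[str, int]], float]:
    """Return the top_n repos by commit count and the fraction of the sample they cover."""
    counts = Counter(sample_repos)
    top = counts.most_common(top_n)
    return top, sum(n for _, n in top) / len(sample_repos)
```

With the counts listed above, (61 + 39) / 232 ≈ 0.431, the 43% headline.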

07 · Geo-attribution

Operator A is China-based. Operator C runs on a night-owl Asian schedule.

The fabricated commit dates leaked the operator's timezone. The activity hours of the delby-ai account leak the operator's sleep schedule. Both signals point to mainland China.

Figure: delby-ai · activity by UTC hour

The 100 most recent events of the delby-ai operator account fall in one continuous block, UTC 17:00 through 02:00. In China time (UTC+8) that is 01:00 through 10:00: deep night into mid-morning. Activity during the rest of the China day is negligible.
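The conversion is a fixed +8 hour shift: China Standard Time is UTC+8 year-round with no daylight saving, so a fixed-offset timezone is exact. A minimal sketch with a hypothetical helper name; it assumes the event timestamps are ISO-8601 UTC strings like the `created_at` field in GitHub's events API.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

CHINA_TIME = timezone(timedelta(hours=8), "UTC+8")  # no DST, so a fixed offset is exact


def local_hour_histogram(event_times: list[str], tz: timezone = CHINA_TIME) -> Counter:
    """Count events per local hour of day after converting from UTC."""
    return Counter(
        datetime.fromisoformat(t.replace("Z", "+00:00")).astimezone(tz).hour
        for t in event_times
    )
```

Applied to the UTC buckets 17 through 23 and 0 through 2, the shift yields local hours 01 through 10.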

Operator A timezone
+0800 on every repo except WildDet3D (-0700, preserved from the Allen AI upstream's Pacific time)
Email providers used
yeah.net · 163.com · 126.com · qq.com · all Chinese (NetEase + Tencent), alongside the Anglo-name outlook.com sock-puppets
Operator C work week
Mon→Thu peak, Sat half-day, Sun trough · 3.3× ratio
Operator A scout activity
luliguyu watches MemMachine, Tuya, zhoushisheng001b · all Chinese AI ecosystem

git -C kyasbalme/Scrapbox log --format='%ai' | awk '{print $3}' | sort -u → +0800 · gh api 'users/delby-ai/events/public?per_page=100'

08 · The wallets

The laundered crypto wallets are not actively malicious — today.

Operator A has cloned two real cryptocurrency wallets: theQRL/zond-web3-wallet (Quantum Resistant Ledger) as luliguyu/dimatura, and Narwallets/narwallets-extension (NEAR Protocol) as luliguyu/ssaavedrad. We diffed every 0x address and every .near identifier against the real upstreams.

luliguyu/dimatura · 13 ETH 0x-strings
  • 0x0db3981cb93db985e4e3a62ff695f7a1b242dd7c · test fixture
  • 0x205046e6A6E159eD6ACedE46A36CAD6D449C80A1 · test fixture
  • 0x20D20b8026B8F02540246f58120ddAAf35AECD9B · test fixture
  • 0x20EE9760786AD48aB90E326c5cd78c6269Ba10AB · test fixture
  • 0x20fB08fF1f1376A14C055E9F56df80563E16722b · test fixture
  • 0x28c4113a9d3a2e836f28c23ed8e3c1e7c243f566 · test fixture
  • 0x5e4c1bd1e00d229fe4d72d64df0b2f20b7649a9e · test fixture
  • 0x6080604052348015600e575f5ffd5b5061012980 · EVM bytecode
  • 0x641dcb99dfcd2ad3c3e7c3d30090b274b788a0f2 · test fixture
  • 0x669e3a48fa068514e89bc2be248be964d22672cc · mnemonic seed test
  • 0x7819dc0205e6a5c286796886ce16e637b99e1838 · test fixture
  • 0x978918b7b544ad491d0b294cc6ac4d7bb0ef7112 · test fixture
  • 0xd6921377489c736691d06ad610f105a5207f3d47 · hex-seed test
luliguyu/ssaavedrad · 15 NEAR addresses
  • a0b86991…cd19d4a2e9eb0ce3606eb48.factory.bridge.near · USDC bridged
  • dac17f95…2e523a2206206994597c13d831ec7.factory.bridge.near · USDT bridged
  • 2260fac5…a44fbcfedf7c193bc2c599.factory.bridge.near · WBTC bridged
  • 6b175474…94c44da98b954eedeac495271d0f.factory.bridge.near · DAI bridged
  • 514910771…6af840dff83e8264ecf986ca.factory.bridge.near · LINK bridged
  • 1f9840a8…f1d1762f925bdaddc4201f984.factory.bridge.near · UNI bridged
  • f5cfbc74…610c8ef151a439252680ac68c6dc.factory.bridge.near · POND bridged
  • wrap.near · wNEAR
  • meta-pool.near · Meta Pool staking
  • meta-token.near · META gov token
  • token.v2.ref-finance.near · Ref Finance
  • xtoken.ref-finance.near · xREF staking
  • token.paras.near · Paras NFT
  • berryclub.ek.near · Berry Club
  • dbio.near · DBIO native
Verdict · 2026-04-25
All 13 ETH 0x-strings in the laundered Zond wallet are upstream-original test fixtures. All 15 NEAR addresses in the laundered Narwallets are well-known mainnet token contracts identical to the real upstream. Zero operator-injected addresses. But — the operators control these repos. Today's snapshot is benign. Tomorrow's might not be. The tools/wallet_watcher.py in this investigation re-clones both wallets every 30 minutes, diffs every address, and alerts on anything that doesn't match this baseline. The watcher has been running since the repository was published.

scripts/06_wallet_integrity_diff.sh · tools/wallet_watcher.py · evidence/wallet_addresses/baseline.txt
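The integrity check reduces to "extract every address-like string, subtract the baseline." A minimal sketch, not the investigation's tools/wallet_watcher.py: the helper names are hypothetical, the ETH pattern is the usual 0x plus 40 hex digits compared case-insensitively, and the NEAR pattern is a loose approximation of NEAR account IDs (lowercase alphanumerics with `-`, `_` and `.` separators, ending in `.near`).

```python
import re
from pathlib import Path

# No trailing boundary: long hex blobs (e.g. bytecode) also yield their first
# 40 hex digits. Deliberately over-inclusive, since a miss is worse than noise.
ETH_RE = re.compile(r"0x[0-9a-fA-F]{40}")
# Loose NEAR account-ID pattern ending in ".near".
NEAR_RE = re.compile(r"\b[a-z0-9_\-.]+\.near\b")


def extract_addresses(text: str) -> set[str]:
    """Collect ETH addresses (lowercased) and NEAR account IDs from text."""
    return {m.lower() for m in ETH_RE.findall(text)} | set(NEAR_RE.findall(text))


def scan_tree(root: Path) -> set[str]:
    """Extract addresses from every file under root, skipping .git."""
    found: set[str] = set()
    for path in root.rglob("*"):
        if path.is_file() and ".git" not in path.parts:
            found |= extract_addresses(path.read_text(errors="ignore"))
    return found


def injected(fork_root: Path, baseline: set[str]) -> set[str]:
    """Addresses in the laundered fork that the upstream baseline does not contain."""
    return scan_tree(fork_root) - baseline
```

Against a baseline built from the upstream wallets, an empty result corresponds to the "today: clean" verdict; any non-empty result is worth an alert.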

09 · The evidence

Every claim has a hash. Every command runs. The watcher is still watching.

All raw data, all reproduction commands, all forensic evidence, and the three watcher tools (wallet integrity / farm growth / bot-star activity) live in a single public repository. SHA-256-hashed for tamper-evidence. Deployable as one Docker container.
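SHA-256 tamper-evidence amounts to recording a digest per file and re-checking it later. A minimal sketch with hypothetical helper names; the repository's actual manifest format is not described here, so this one simply maps relative POSIX paths to hex digests.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks so large evidence files stay cheap."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map every file under root (relative POSIX path) to its SHA-256."""
    return {p.relative_to(root).as_posix(): sha256_file(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def tampered(root: Path, recorded: dict[str, str]) -> list[str]:
    """Paths whose digest changed, plus files added or removed since recording."""
    current = manifest(root)
    return sorted(k for k in recorded.keys() | current.keys()
                  if recorded.get(k) != current.get(k))
```

A deleted file also shows up, because its missing digest compares as None against the recorded one.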

Reproduce the byte-diff
Clone both farm repos, run diff -rq. Result: only README differs.
scripts/02_clone_and_diff_top2.sh
Reproduce the 43% pollution
Sample 232 commits across 10 pages of GitHub commit-search. Top two trace to one operator.
scripts/04_pollution_sample.sh
Reproduce the cross-account proof
Each sock-puppet email's GitHub reach. The shared email links luliguyu + tusmart-grouptt.
scripts/03_sockpuppet_reach.sh
Reproduce the wallet diff
Clone the real upstream and the laundered fork. Diff every address. Today: clean.
scripts/06_wallet_integrity_diff.sh
Run the watchers continuously
Single docker-compose stack. Wallet integrity every 30 min, farm growth every 15 min, bot-star activity every 6 h.
cp .env.example .env && docker compose up -d
Vet a candidate repo yourself
Seven heuristics. Score 0–7. The detector is validated against luliguyu/cmbd-book, which scores 4/7.
python3 tools/vet.py owner/repo
github.com/copyleftdev/long-shadow

Disclosure status: notification emails are drafted in disclosures/CONTACTS.md for Allen AI, Anthropic, the Quantum Resistant Ledger, Narwallets, the real Delby Intelligence, the European Synchrotron Radiation Facility, the impersonated researcher blu3mo, the four Chinese AI startups whose stars are inflated, and GitHub Trust & Safety. The publication of this repository is itself the disclosure.