Obsidian Spider

An open-source, adjustable AI agent-orchestration framework for engineers and researchers who want defense-in-depth when running parallel sub-agents.

Free gift. No strings, no upsell, MIT license. If you use GitHub Copilot or any other LLM API, you can run a cost-arbitrage workflow measured from 8× (the safe default) up to ~2,000×. Cryptographic receipts, no trust required.

Why this exists: AI agents trained with reinforcement learning from human feedback (RLHF) routinely drift into specification gaming: they appear helpful while doing the wrong thing. Real production incidents (PocketOS lost 3 months of customer data; the developer of this framework lost 1 month of work to git corruption caused by agent over-confidence) make defense-in-depth non-optional past toy problems. This framework wraps every cycle in PDCA loops, heterogeneous Byzantine fault-tolerant voting, Toyota-style stop-the-line halts, and a cryptographic audit trail.

What you get

Welcome — pick your language

English

Welcome. The Obsidian Spider is a free, open-source AI agent-orchestration framework. If you use GitHub Copilot or any LLM API, the workflow yields cost-arbitrage you can measure with cryptographic receipts. No reply expected — this is a one-time gift. Take any part you find useful.

हिन्दी (Hindi)

{{HI_WELCOME}}

{{HI_GLOSS}}

Português (Brasil)

{{PT_WELCOME}}

{{PT_GLOSS}}

Tiếng Việt

{{VI_WELCOME}}

{{VI_GLOSS}}

العربية

{{AR_WELCOME}}

{{AR_GLOSS}}

Español

{{ES_WELCOME}}

{{ES_GLOSS}}

Bahasa Indonesia

{{ID_WELCOME}}

{{ID_GLOSS}}

Cost arbitrage — two distinct numbers

For GitHub Copilot users: Microsoft has announced billing changes effective approximately June 2026. Until then, the parallel-subagent feature advertised at sign-up yields measurable cost arbitrage. Two distinct numbers are often conflated:

Section 1 — Credit multiplier (sub-agents per Copilot credit; model-class agnostic)

The workflow can break. This pattern is heavily rate-limited at the provider edge and has many failure modes (provider 5xx responses, partial responses, malformed tool-calls, sub-agent confabulation). The author's workflows are stable in daily use, but even they get buggy and weird sometimes; the framework's multi-layer fault tolerance catches and contains failures, it doesn't prevent every one. We don't overpromise. The receipts in this repo are what they are: verify them, run your own, draw your own conclusions.

Your workflow will differ. Start at 2×4 and find your own stability point. Here's what the author has tested as the max — you can push it but be prepared to hit rate limits often.

| Setting | Configuration | Sub-agents per credit | Author's tested-stable status (April 2026) |
| --- | --- | --- | --- |
| Recommended start | 2 parallel × 4 deep | 8 sub-agents | minimal rate-limit pressure; verify the loop runs; find your stability from here |
| Author's daily working config | 8 × 8 | 64 sub-agents | stable in author's testing with multi-layer fault tolerance enabled — your mileage varies with the work |
| Max tested — Sonnet class | 12 × 8 | 96 sub-agents | sustained ~6 hours with periodic rate-limit pauses |
| Max tested — Opus 4.7 (chain-anchored) | 11 × 8 | 88 sub-agents at $0.60 | single-shot record; burns the entire weekly quota in one run — author doesn't run this often |

The credit multiplier is just parallel × depth and applies to any model class. 8×8 is the author's stable daily setting, not a guaranteed stable setting for you. 11×8 Opus is the chain-anchored receipt, not the daily setting. Start at 2×4 and walk up.
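As a sanity check, the multiplier arithmetic behind the table can be reproduced in a few lines (illustration only, not part of the framework's API):

```python
def sub_agents_per_credit(parallel: int, depth: int) -> int:
    """Credit multiplier: parallel branches x chain depth = sub-agents per credit."""
    return parallel * depth

# Configurations from the table above:
print(sub_agents_per_credit(2, 4))   # → 8   (recommended start)
print(sub_agents_per_credit(8, 8))   # → 64  (author's daily config)
print(sub_agents_per_credit(12, 8))  # → 96  (max tested, Sonnet class)
print(sub_agents_per_credit(11, 8))  # → 88  (chain-anchored Opus receipt)
```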

Section 2 — API-cost arbitrage (vs direct Anthropic API; model-class specific)

API rates differ by class. Opus is ~5× more expensive than Sonnet per token at Anthropic's published rates, so the same parallel × depth configuration yields different API-cost arbitrage on different classes.

Author's chain-anchored receipt is Opus 4.7 (opus_4_7_multiplier: 7.5 in the chain). Walked against Opus 4.7 published rates — $15/MTok input + $75/MTok output:

| Per-agent token shape | API cost / 88 agents | Arbitrage vs $0.60 receipt |
| --- | --- | --- |
| Chain-anchored conservative (3-4 tool-calls; walk 2026-04-25) | $80-$107 | 134×-178× |
| Mid (20 tool-calls, ~50K in / 10K out) | $132 | ~220× |
| Higher (50 tool-calls, ~150K in / 30K out) | $396 | ~660× |
| Heavy reasoning (50 tool-calls, full reasoning, ~500K in / 100K out) | $1,320 | ~2,000× ← measured record |

For the Sonnet-class 12×8 run (96 sub-agents): Anthropic Sonnet rates (~$3/MTok in + $15/MTok out) are ~5× cheaper than Opus, so the API-cost arbitrage is ~5× smaller per-token — but Sonnet is also cheaper to run heavily. Compute your own.
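To "compute your own", the rows above reduce to plain arithmetic. A minimal calculator using the published rates quoted in this section (a sketch, not project code):

```python
def api_cost_usd(agents: int, in_tok: int, out_tok: int,
                 in_rate: float, out_rate: float) -> float:
    """Direct-API cost for `agents` sub-agents; rates in $/MTok."""
    per_agent = (in_tok * in_rate + out_tok * out_rate) / 1_000_000
    return agents * per_agent

OPUS_IN, OPUS_OUT = 15.0, 75.0  # $/MTok, as quoted above
RECEIPT = 0.60                  # credit cost of the 88-agent run

mid = api_cost_usd(88, 50_000, 10_000, OPUS_IN, OPUS_OUT)
print(mid, round(mid / RECEIPT))      # → 132.0 220
heavy = api_cost_usd(88, 500_000, 100_000, OPUS_IN, OPUS_OUT)
print(heavy, round(heavy / RECEIPT))  # → 1320.0 2200
```

(The heavy-reasoning row works out to exactly 2,200×; the headline rounds it down to ~2,000×.)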

Track your own arbitrage

These are the author's receipts — starting points, not authority. Your subscription, your model class, your tool-call counts will produce different numbers. The right framing: "verify mine, then run yours."

For each parent invocation, log timestamp + model class + sub-agent count + total tokens (in/out) + Copilot credit used. Compute API_equivalent / credit_used = your_arbitrage. HMAC-chain the log so it's tamper-evident. Verify mine, post yours back to the project — that's the contribution shape.
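One way to realize the tamper-evident log described above is a simple HMAC chain, where each entry's MAC covers the record plus the previous entry's MAC (a minimal sketch; the framework's actual receipt format is not specified here):

```python
import hmac
import hashlib
import json

def append_receipt(chain: list, key: bytes, record: dict) -> dict:
    """Append a record whose MAC covers the record plus the previous MAC."""
    prev = chain[-1]["mac"] if chain else "genesis"
    payload = json.dumps(record, sort_keys=True) + prev
    mac = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    entry = {"record": record, "prev": prev, "mac": mac}
    chain.append(entry)
    return entry

def verify_chain(chain: list, key: bytes) -> bool:
    """Recompute every MAC; any edit to any earlier record breaks the chain."""
    prev = "genesis"
    for entry in chain:
        payload = json.dumps(entry["record"], sort_keys=True) + prev
        expect = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
        if entry["prev"] != prev or not hmac.compare_digest(expect, entry["mac"]):
            return False
        prev = entry["mac"]
    return True
```

A log record would carry the fields listed above (timestamp, model class, sub-agent count, tokens in/out, credit used); rewriting any past entry invalidates every later MAC, which is what makes the receipts tamper-evident.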

Fully within Microsoft's stated terms. The framework just systematizes the workflow they advertised at sign-up.

Quickstart

git clone https://github.com/obsidian-spider-org/obsidian-spider-skeleton
cd obsidian-spider-skeleton

# Set any one provider key (auto-detected):
export ANTHROPIC_API_KEY=...      # OR
export GROQ_API_KEY=gsk-...       # OR free-tier alternatives

python3 examples/hello_pdca.py
# → loop ticks, both gates green, two HMAC-chained receipts append

How it's organized

Obsidian Spider is the system as a whole. Sigrún is the main orchestration AI — she decomposes requests, dispatches sub-agents, and synthesizes results. Specialist assistant agents (one per task type — observation, code-shaping, immunization, navigation, etc.) handle their respective lanes in parallel. Eir is the assistant who handles outreach and first-contact correspondence on Sigrún's behalf, so the lead developer can stay focused on the work.

The system uses Goose as one of several tools when shell-style automation helps. Anthropic's Claude, OpenAI's GPT, Groq's Llama, and any other LLM API plug in identically.
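As an illustration of what "plug in identically" could look like, here is a minimal provider interface plus env-var auto-detection in the spirit of the quickstart. All names are hypothetical, not the framework's actual API, and `OPENAI_API_KEY` is an assumed addition to the two variables the quickstart shows:

```python
import os
from typing import Optional, Protocol

class Provider(Protocol):
    """Minimal shape every provider adapter would share: one completion call."""
    def complete(self, prompt: str) -> str: ...

# Env vars checked in order; first one set wins (hypothetical list).
PROVIDER_KEYS = ("ANTHROPIC_API_KEY", "GROQ_API_KEY", "OPENAI_API_KEY")

def detect_provider_key() -> Optional[str]:
    """Return the name of the first provider env var that is set, else None."""
    for var in PROVIDER_KEYS:
        if os.environ.get(var):
            return var
    return None
```

Because every adapter satisfies the same `Provider` protocol, the orchestrator can dispatch sub-agents without caring which backend answered, which is the design property the paragraph above describes.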

Contact

Questions, contributions, bug reports, or general inquiries → hello@obsidianspider.org. Outreach correspondence is handled by eir@obsidianspider.org; if a question requires Sigrún's specific knowledge, Eir relays it. The lead developer rarely sees mail directly — by design.

About

Obsidian Spider is the project / organization. Sigrún is its main AI orchestrator. Eir is the AI outreach assistant. The lead human developer signs as obsidian_spider and built the system over the previous year. Open-source, MIT-licensed.