Toward compounding wisdom in human-AI pairs.
Most AI research measures how good the model became. Very little of it measures how capable, calibrated, or wise the human became, especially in the moments when AI is no longer in the room. This research line is built around that gap.
Research overview is the public map. Research Line is the operating evidence system.
Start with Research when you want the plain-language overview: theory, model, metrics, Student Sandbox readiness, and source notes. Use this Research Line when you need the research protocol: preregistration, session review packets, evidence ledger, method commitments, and the gate that turns findings into product changes.
The long-horizon question this line exists to answer.
Under what conditions does a human-AI pair accumulate compounding wisdom — not merely accelerated output, not merely personalized convenience — over months and years?
What we can actually measure on the way there.
The north star is not directly testable. These two questions are.
Does the human perform measurably better when AI is absent?
After a scaffolded AI-assisted loop, the human is given a similar task without AI access. We compare the result to a no-AI baseline and a cold-chat-AI baseline. This is the reverse of the cognitive-offloading concern: the question is whether the loop leaves the human stronger, not weaker.
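The comparison described above can be sketched as a difference of mean scores on the no-AI follow-up task. This is a minimal sketch, not the project's instrument: the function name `capability_delta`, the score lists, and the 0 to 10 scale are all assumptions, and the numbers are placeholders for illustration, not results.

```python
from statistics import mean


def capability_delta(scaffolded, no_ai, cold_chat):
    """Compare mean follow-up scores (task done WITHOUT AI) across conditions.

    Each argument is a list of scores from the AI-absent follow-up task,
    grouped by the condition the participant was in beforehand.
    """
    if not (scaffolded and no_ai and cold_chat):
        raise ValueError("each condition needs at least one score")
    scaffolded_mean = mean(scaffolded)
    return {
        # Always report N alongside the deltas so small samples stay visible.
        "n": {
            "scaffolded": len(scaffolded),
            "no_ai": len(no_ai),
            "cold_chat": len(cold_chat),
        },
        "delta_vs_no_ai": scaffolded_mean - mean(no_ai),
        "delta_vs_cold_chat": scaffolded_mean - mean(cold_chat),
    }


# Placeholder scores on a hypothetical 0-10 rubric (NOT real data).
result = capability_delta(
    scaffolded=[7.0, 8.0],
    no_ai=[5.0, 6.0],
    cold_chat=[6.0, 7.0],
)
print(result)
```

A positive `delta_vs_no_ai` with a positive `delta_vs_cold_chat` would be the pattern the question predicts. At small N the deltas are descriptive only and should be labeled as such.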
Does trust track evidence quality rather than fluency?
Over repeated cycles, does the human delegate the right parts and retain the right parts; does over-reliance shrink; does the human get better at saying "the AI is wrong here"?
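One way to make "over-reliance shrinks" concrete is to log, per decision, whether the AI was right and whether the human accepted its output. The sketch below assumes such a log exists; the `Decision` record and the two rate definitions are illustrative assumptions, not the metrics defined in self-evolution-metrics-v0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    ai_correct: bool      # was the AI's output correct on this item?
    human_accepted: bool  # did the human accept the AI's output?


def reliance_rates(decisions):
    """Return over- and under-reliance rates for one cycle.

    over_reliance:  share of AI-wrong items the human still accepted.
    under_reliance: share of AI-correct items the human rejected.
    A rate is None when its denominator is empty.
    """
    wrong = [d for d in decisions if not d.ai_correct]
    right = [d for d in decisions if d.ai_correct]
    over = sum(d.human_accepted for d in wrong) / len(wrong) if wrong else None
    under = sum(not d.human_accepted for d in right) / len(right) if right else None
    return {"over_reliance": over, "under_reliance": under}


# Hypothetical cycle with four logged decisions (illustration only).
cycle = [
    Decision(ai_correct=True, human_accepted=True),
    Decision(ai_correct=False, human_accepted=True),   # over-reliance
    Decision(ai_correct=False, human_accepted=False),  # caught the error
    Decision(ai_correct=True, human_accepted=False),   # under-reliance
]
rates = reliance_rates(cycle)
print(rates)
```

Tracking both rates across cycles separates trust that follows evidence from blanket acceptance or blanket rejection: a human who rejects everything has zero over-reliance but high under-reliance.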
Three proof beds, three boundary conditions.
Each sub-line has a clear study unit, a baseline condition, and the boundaries it primarily tests. We move sub-lines forward when evidence accumulates, not when arguments accumulate.
Student Sandbox
Does a 20-minute scaffolded human-AI loop produce a measurable capability-without-AI delta vs (i) no-AI and (ii) cold-chat baselines?
N = 0 real trials
Trading-agent
Does AI-augmented decision-making with explicit boundaries produce decisions the human can defend better, calibrate trust better, and learn from outcomes better over time?
running · co-evolution lens not yet applied
Personal knowledge
Does long-horizon human-AI memory interaction (Obsidian + TrustMem) produce compounding reflection capacity, or does it produce stale personalization and dependence?
methodology not pinned
The rules that turn a theory document into a research line.
- Pre-register predictions before each session. Five-minute single-page document committing to what we expect to see. Stored at docs/research-line/preregistration/.
- Two raters when feasible. Observer + a second reviewer independently score the session using self-evolution-metrics-v0.md. Track inter-rater agreement even at N=1.
- De-identified review packets are the data. Every session produces one review packet. Public publication is required (de-identified). They are the corpus, not folder-dust.
- Negative results are first-class. If a session fails the prediction, write it up. Failure-to-publish-negatives is the single most common research-line corruption.
- Be explicit about N. N=1 case studies are legitimate but must be labeled. Never write "students tend to…" at N ≤ 5.
- No instrument inflation. Resist adding metrics ad hoc. A new metric must be justified in a quarterly synthesis before it is adopted.
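For the two-rater rule, agreement can be tracked even for a single session by comparing the two raters item by item across the rubric. The sketch below computes Cohen's kappa over such paired item scores. It is an illustration under assumptions: the function name, the rubric length, and the score values are hypothetical and are not taken from self-evolution-metrics-v0.md.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items categorically.

    Returns None when chance agreement is 1.0 (both raters used one
    identical label throughout), because kappa is undefined there.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items where both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return None
    return (observed - expected) / (1 - expected)


# Hypothetical scores from one session on a four-item rubric (not real data).
observer = [2, 3, 1, 2]
second_reviewer = [2, 3, 2, 2]
kappa = cohen_kappa(observer, second_reviewer)
print(round(kappa, 3))
```

With only a handful of rubric items the estimate is unstable, so it is best reported next to the raw item-level agreement and the item count.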
What surrounds this work, and what is left empty.
The full anchor atlas (academic + industry + products + voices) — ~30 anchors across 6 buckets, each with our positioning — lives at research-line-atlas (also at docs/research-line/anchor-atlas.md). Summary positioning:
| Tradition | Closest anchor | Where NOUS OS adds |
|---|---|---|
| Augmentation | Engelbart 1962 · Bush 1945 | LLM-era boundary taxonomy + capability-delta instrument |
| Cognitive offloading | Risko & Gilbert 2016 · Storm & Stone | Reverse question: when does offloading make people stronger? |
| Self-regulated learning | Zimmerman · Vygotsky ZPD | AI-native instantiation of forethought → performance → reflection |
| AI literacy | Long & Magerko 2020 · UNESCO 2024 | From descriptive taxonomy to instrumented loop |
| Tools for thought | Matuschak & Nielsen 2019 · Bret Victor | From individual cognition tools to explicit symbiosis |
| Practical AI advice | Mollick · Co-Intelligence 2024 | From descriptive heuristics for adults to prescriptive measured protocol |
| Industry products | Khanmigo · NotebookLM · Cursor · Claude Code | Product-agnostic protocol layered on any AI surface |
The emptiest position in the literature is measuring whether a person is more capable when AI is absent. That is where this line concentrates most of its effort.
How fresh thinking enters the system every day.
The north star cannot be served by internal work alone. NOUS OS must continuously absorb external thinking — academic preprints, industry research, top products, individual essays and podcasts — without drowning in firehose noise. Three-tier discipline:
Daily
Scheduled remote agent pulls ~5–8 narrow high-signal sources, filters by anchor keywords, writes raw daily inbox to docs/research-line/inbound/_inbox/.
Weekly
Scheduled agent (or human) promotes 1–3 candidates per week to full 1-page inbound notes. Each note has an HTML mirror and joins the public corpus.
Quarterly
Human + Claude write a synthesis tying inbound notes and session data to "what we read, what we changed because of it." Published quarterly.
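The daily tier's "filters by anchor keywords" step could look roughly like the sketch below. Everything here is assumed for illustration: the item shape (`title`, `summary`), the keyword list, and the sample items are hypothetical, and the real agent's sources and matching logic are not specified on this page.

```python
def filter_by_anchor_keywords(items, keywords):
    """Keep items whose title or summary mentions at least one keyword.

    Matching is a case-insensitive substring test; each kept item is
    annotated with the keywords that matched, for later triage.
    """
    lowered = [k.lower() for k in keywords]
    kept = []
    for item in items:
        text = f"{item['title']} {item['summary']}".lower()
        hits = [k for k in lowered if k in text]
        if hits:
            kept.append({**item, "matched": hits})
    return kept


# Hypothetical anchor keywords and feed items (illustration only).
anchor_keywords = ["cognitive offloading", "self-regulated learning"]
feed = [
    {"title": "Notes on Cognitive Offloading", "summary": "When tools help memory."},
    {"title": "GPU pricing update", "summary": "Cloud costs this quarter."},
]
inbox = filter_by_anchor_keywords(feed, anchor_keywords)
print(inbox)
```

Substring matching is deliberately crude. It keeps the daily tier cheap and leaves judgment to the weekly promotion step, where a human or agent picks the 1–3 candidates worth a full note.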
Where this line actually stands today.
- ✓ Theory documents. human-ai-symbiosis-self-evolution, self-evolution-metrics-v0, human-ai-coevolution-model-v0, memory-philosophy-v0.
- ✓ Research-line spec. This page + docs/research-line/research-line.md.
- ~ Anchor atlas. Drafted in this wave; full living atlas page pending.
- ~ L1 sub-line (Student Sandbox). Scaffold + public web + recruitment templates complete. N = 0 real trials.
- ~ L2 sub-line (trading-agent). Active in production; no co-evolution lens applied to existing data yet.
- ○ L3 sub-line (personal knowledge). Methodology not pinned.
- ✓ L1 capture cron. Script, workflow, and offline tests implemented; first scheduled PR pending.
- ✓ Pre-registration template. Implemented at docs/research-line/preregistration/_template.md.
- ✓ Review Packet Index. Implemented at docs/research-line/session-review-index.md; latest real review is none.
- ✓ Research-to-Product Gate. Implemented at docs/research-line/research-to-product-gate.md; first use expected after N=1.
- ✓ Research Pipeline cockpit. Implemented at demo/research-pipeline.html; turns the North Star into preregistration → session → review → ledger → gate.
- ○ First quarterly synthesis. Pending real session evidence.
The single highest-leverage action remains: run one real Student Sandbox session and produce N = 1.
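The preregistration → session → review → ledger → gate sequence named above can be read as an ordered pipeline in which the gate opens only after every earlier stage is complete. The sketch below illustrates that ordering only. The `Stage` enum and both helper functions are hypothetical and do not describe how demo/research-pipeline.html is implemented.

```python
from enum import IntEnum


class Stage(IntEnum):
    PREREGISTRATION = 1
    SESSION = 2
    REVIEW = 3
    LEDGER = 4
    GATE = 5


def advance(current: Stage) -> Stage:
    """Return the stage that follows `current`; the gate is terminal."""
    if current is Stage.GATE:
        raise ValueError("gate is the final stage")
    return Stage(current + 1)


def can_open_gate(completed) -> bool:
    """The gate may open only when all four earlier stages are complete."""
    required = {stage for stage in Stage if stage is not Stage.GATE}
    return required <= set(completed)


# A session that has been run but not yet reviewed cannot reach the gate.
print(advance(Stage.SESSION))
print(can_open_gate([Stage.PREREGISTRATION, Stage.SESSION]))
```

Encoding the order this way makes a skipped preregistration or a missing review packet a visible failure, not a silent omission.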
The deliberate negations.
This research line is NOT —
- a tutoring product or an EdTech offering
- a benchmark suite
- a substitute for peer review
- an excuse to write theory papers without running experiments
- infrastructure work disguised as research
If we find ourselves making more boundary types or more metrics rather than more sessions, we have drifted.