NOUS OS Research Line · HAICES · v0

Toward compounding wisdom in human-AI pairs.

Most AI research measures how good the model became. Very little measures how much more capable, calibrated, or wise the human became, especially in the moments when AI is no longer in the room. This research line is built around that gap.

The Research overview is the public map; this Research Line is the operating evidence system.

Reader path · operator path

Start with Research when you want the plain-language overview: theory, model, metrics, Student Sandbox readiness, and source notes. Use this Research Line when you need the research protocol: preregistration, session review packets, evidence ledger, method commitments, and the gate that turns findings into product changes.

The long-horizon question this line exists to answer.

North Star · decade-horizon · philosophical
Under what conditions does a human-AI pair accumulate compounding wisdom — not merely accelerated output, not merely personalized convenience — over months and years?
This question is intentionally not testable in a single session. The two near-term instruments below feed it. If those instruments trend upward over many cycles and across many people, that is evidence for the north star. If they do not, the north star is wishful thinking.

What we can actually measure on the way there.

The north star is not directly testable. These two questions are.

(a) capability-without-AI delta

Does the human perform measurably better when AI is absent?

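After a scaffolded AI-assisted loop, the human is given a similar task without AI access. We compare the result against a no-AI baseline and a cold-chat-AI baseline. This inverts the cognitive-offloading concern: instead of asking whether AI use weakens unaided performance, it asks when the right kind of AI use strengthens it.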

surface · Student Sandbox · horizon · per-session, weeks
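A minimal sketch of the comparison in (a), assuming each arm yields a list of no-AI transfer-task scores on a common 0 to 1 scale. The function name, the scale, and the plain difference of means are illustrative assumptions, not a pinned analysis plan; a real analysis would likely need per-person pairing and uncertainty intervals, and with N = 0 real trials any inputs are placeholders.

```python
from statistics import mean


def capability_delta(scaffolded, no_ai, cold_chat):
    """Capability-without-AI delta (hypothetical helper).

    Each argument is a list of transfer-task scores collected with AI access
    removed: after the scaffolded loop, after no AI at all, and after a
    cold-chat AI session. Returns the scaffolded arm's mean minus each baseline.
    """
    treated = mean(scaffolded)
    return {
        "vs_no_ai": treated - mean(no_ai),
        "vs_cold_chat": treated - mean(cold_chat),
    }
```

A positive `vs_cold_chat` is the interesting case: it would mean the scaffolding, not merely AI exposure, left the person stronger once AI was gone.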
(b) calibrated trust + responsibility retention

Does trust track evidence quality rather than fluency?

Over repeated cycles, does the human delegate the right parts and retain the right parts; does over-reliance shrink; does the human get better at saying "the AI is wrong here"?

surface · trading-agent / cohort · horizon · per-cycle, months
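One way to operationalize part of (b), assuming each cycle records, per AI suggestion, whether it turned out correct and whether the human acted on it. The `Judgment` record, the two rates, and the first-versus-last trend check are hypothetical; they cover delegation and over-reliance, but not whether trust tracks evidence quality rather than fluency, which would need a separate evidence-quality rating per suggestion.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    ai_correct: bool      # was the AI's suggestion right, judged after the outcome?
    human_accepted: bool  # did the human act on the suggestion?


def reliance_rates(judgments):
    """Return (over_reliance, under_reliance) for one cycle.

    over_reliance: share of wrong AI suggestions the human accepted.
    under_reliance: share of correct AI suggestions the human rejected.
    A rate is None when its denominator is empty.
    """
    wrong = [j for j in judgments if not j.ai_correct]
    right = [j for j in judgments if j.ai_correct]
    over = sum(j.human_accepted for j in wrong) / len(wrong) if wrong else None
    under = sum(not j.human_accepted for j in right) / len(right) if right else None
    return over, under


def over_reliance_shrinking(cycles):
    """Crude trend check: last measurable cycle has lower over-reliance than the first."""
    rates = [reliance_rates(c)[0] for c in cycles]
    rates = [r for r in rates if r is not None]
    return len(rates) >= 2 and rates[-1] < rates[0]
```

Saying "the AI is wrong here" shows up as a rejected wrong suggestion, so a falling over-reliance rate is the signal (b) looks for; a rising under-reliance rate would warn that the human is overcorrecting.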

Three proof beds, three boundary conditions.

Each sub-line has a clear study unit, a baseline condition, and the boundaries it primarily tests. We move sub-lines forward when evidence accumulates, not when arguments accumulate.

L1 · Learning loop

Student Sandbox

Does a 20-minute scaffolded human-AI loop produce a measurable capability-without-AI delta vs (i) no-AI and (ii) cold-chat baselines?

Unit
one 20-min session + delayed transfer task without AI
Baselines
no-AI · cold-chat AI
Boundaries
learning, fact, privacy, taste
N = 0 real trials
L2 · Decision loop

Trading-agent

Does AI-augmented decision-making with explicit boundaries produce decisions the human can defend better, calibrate trust better, and learn from outcomes better over time?

Unit
one promoted candidate + post-outcome review
Baseline
retrospective binning by boundary discipline
Boundaries
decision, responsibility, value
running · co-evolution lens not yet applied
L3 · Knowledge loop

Personal knowledge

Does long-horizon human-AI memory interaction (Obsidian + TrustMem) produce compounding reflection capacity, or does it produce stale personalization and dependence?

Unit
90-day Obsidian section + retrospective task
Baseline
pre-NOUS journaling segment, if available
Boundaries
identity, taste, responsibility
methodology not pinned
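The L2 baseline can be sketched as a retrospective grouping of already-reviewed candidates, assuming each post-outcome review yields a boundary-discipline score and a defensibility rating, both on a 0 to 1 scale. Both scores, the three-bin cut points, and the function name are hypothetical placeholders; the co-evolution lens has not yet been applied to real trading-agent data.

```python
from collections import defaultdict
from statistics import mean


def bin_by_discipline(reviews, edges=(0.34, 0.67)):
    """Group post-outcome reviews into low/mid/high boundary-discipline bins.

    reviews: iterable of (discipline, defensibility) pairs.
    edges: ascending cut points; a score at or above an edge moves up a bin.
    Returns {bin_label: mean defensibility} for the bins that received data.
    """
    labels = ("low", "mid", "high")
    groups = defaultdict(list)
    for discipline, defensibility in reviews:
        # The number of edges the score reaches is its bin index.
        idx = sum(discipline >= e for e in edges)
        groups[labels[idx]].append(defensibility)
    return {label: mean(vals) for label, vals in groups.items()}
```

If decisions made with tighter boundary discipline are consistently easier to defend after the outcome, that is the retrospective contrast the L2 question starts from.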

The rules that turn a theory document into a research line.

What surrounds this work, and what is left empty.

The full anchor atlas (academic, industry, products, voices), roughly 30 anchors across 6 buckets, each with our positioning, lives at research-line-atlas (also at docs/research-line/anchor-atlas.md). Summary positioning:

Tradition | Closest anchor | Where NOUS OS adds
--- | --- | ---
Augmentation | Engelbart 1962 · Bush 1945 | LLM-era boundary taxonomy + capability-delta instrument
Cognitive offloading | Risko & Gilbert 2016 · Storm & Stone | Reverse question: when does offloading make people stronger?
Self-regulated learning | Zimmerman · Vygotsky ZPD | AI-native instantiation of forethought → performance → reflection
AI literacy | Long & Magerko 2020 · UNESCO 2024 | From descriptive taxonomy to instrumented loop
Tools for thought | Matuschak & Nielsen 2019 · Bret Victor | From individual cognition tools to explicit symbiosis
Practical AI advice | Mollick · Co-Intelligence 2024 | From descriptive heuristics for adults to a prescriptive, measured protocol
Industry products | Khanmigo · NotebookLM · Cursor · Claude Code | Product-agnostic protocol layered on any AI surface

The literature is emptiest at one position: measuring whether a person is more capable when AI is absent. That is where this line concentrates its effort.

How fresh thinking enters the system every day.

The north star cannot be served by internal work alone. NOUS OS must continuously absorb external thinking — academic preprints, industry research, top products, individual essays and podcasts — without drowning in firehose noise. Three-tier discipline:

L1 · Capture

Daily

A scheduled remote agent pulls ~5–8 narrow, high-signal sources, filters them by anchor keywords, and writes a raw daily inbox to docs/research-line/inbound/_inbox/.

status · cron not yet live
L2 · Triage

Weekly

Scheduled agent (or human) promotes 1–3 candidates per week to full 1-page inbound notes. Each note has an HTML mirror and joins the public corpus.

status · manual until L1 is live
L3 · Synthesis

Quarterly

Human + Claude write a synthesis tying inbound notes and session data to "what we read, what we changed because of it." Published quarterly.

status · template pending
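A minimal sketch of the capture and triage tiers, assuming inbound items arrive as dicts with a `title` and an optional `summary`. The keyword list, the hit-count ranking, and the one-file-per-day naming inside docs/research-line/inbound/_inbox/ are assumptions for illustration; this is not the implemented capture script, whose filtering rules are not described on this page.

```python
from datetime import date
from pathlib import Path

# Hypothetical subset of anchor keywords; the real list would come from the atlas.
ANCHOR_KEYWORDS = ("cognitive offloading", "self-regulated learning", "ai literacy", "tools for thought")


def capture(items, keywords=ANCHOR_KEYWORDS):
    """L1 capture: keep items whose title or summary mentions an anchor keyword."""
    kept = []
    for item in items:
        text = f"{item['title']} {item.get('summary', '')}".lower()
        hits = [k for k in keywords if k in text]
        if hits:
            kept.append({**item, "hits": hits})
    return kept


def triage(kept, limit=3):
    """L2 triage: promote at most `limit` candidates, most keyword hits first."""
    # sorted() is stable, so ties keep their capture order.
    return sorted(kept, key=lambda it: len(it["hits"]), reverse=True)[:limit]


def inbox_path(root, day: date) -> Path:
    """Daily inbox file for one capture run (file naming is an assumption)."""
    return Path(root) / "docs/research-line/inbound/_inbox" / f"{day.isoformat()}.md"
```

Capping triage at a small `limit` is what keeps the weekly tier from turning back into a firehose; the synthesis tier then reads only promoted notes.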

Where this line actually stands today.

N = 0 · real Student Sandbox sessions
1 · AI-simulated dry-run audit, not counted as real evidence
none · latest real review packet
N = 1 · next planned trial: first student-adjacent session
  • Theory documents. human-ai-symbiosis-self-evolution, self-evolution-metrics-v0, human-ai-coevolution-model-v0, memory-philosophy-v0.
  • Research-line spec. This page + docs/research-line/research-line.md.
  • ~Anchor atlas. Drafted in this wave; full living atlas page pending.
  • ~L1 sub-line (Student Sandbox). Scaffold + public web + recruitment templates complete. N = 0 real trials.
  • ~L2 sub-line (trading-agent). Active in production; no co-evolution lens applied to existing data yet.
  • L3 sub-line (personal knowledge). Methodology not pinned.
  • L1 capture cron. Script, workflow, and offline tests implemented; first scheduled PR pending.
  • Preregistration template. Implemented at docs/research-line/preregistration/_template.md.
  • Review Packet Index. Implemented at docs/research-line/session-review-index.md; latest real review is none.
  • Research-to-Product Gate. Implemented at docs/research-line/research-to-product-gate.md; first use expected after N=1.
  • Research Pipeline cockpit. Implemented at demo/research-pipeline.html; turns the North Star into preregistration → session → review → ledger → gate.
  • First quarterly synthesis. Pending real session evidence.
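The pipeline flow (preregistration → session → review → ledger → gate) can be read as an ordered state machine. The sketch assumes each trial moves through the five stages in order without skipping, that AI-simulated dry runs are tracked but excluded from N, and that the gate opens once N reaches 1. The class and function names are hypothetical and do not describe demo/research-pipeline.html.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    PREREGISTRATION = 1
    SESSION = 2
    REVIEW = 3
    LEDGER = 4
    GATE = 5


@dataclass
class Trial:
    trial_id: str
    simulated: bool = False  # AI-simulated dry runs never count as evidence
    stage: Stage = Stage.PREREGISTRATION
    history: list = field(default_factory=list)

    def advance(self):
        """Move to the next stage in order; stages cannot be skipped."""
        if self.stage is Stage.GATE:
            raise ValueError(f"{self.trial_id} is already at the gate")
        self.history.append(self.stage)
        self.stage = Stage(self.stage.value + 1)


def real_n(trials):
    """Count real (non-simulated) trials that have reached review or later."""
    return sum(1 for t in trials if not t.simulated and t.stage.value >= Stage.REVIEW.value)


def gate_open(trials, min_n=1):
    """The product gate only considers findings once real evidence reaches min_n."""
    return real_n(trials) >= min_n
```

Under these assumptions, today's state (one simulated dry run, no real sessions) leaves the gate closed, which matches the status above.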

The single highest-leverage action remains: run one real Student Sandbox session and produce N = 1.

The deliberate negations.

This research line is NOT —

  • a tutoring product or an EdTech offering
  • a benchmark suite
  • a substitute for peer review
  • an excuse to write theory papers without running experiments
  • infrastructure work disguised as research

If we find ourselves making more boundary types or more metrics rather than more sessions, we have drifted.