If you're building AI products, "human in the loop" is probably in your architecture somewhere. Maybe it's a review queue. Maybe it's a confidence threshold that kicks flagged outputs to a human. Maybe it's just "users can correct it." Whatever the implementation, the logic is the same: deploy AI, catch mistakes before they matter, keep a human in the chain.
It's a reasonable safety net in theory. In practice, it scales to approximately nowhere, and it's the wrong abstraction for most of what AI is being asked to do.
Here's the argument: reviewing outputs is not the same as shaping reasoning. You can't inspect your way to alignment. And as AI systems get more personal and more autonomous, the review model doesn't just degrade—it collapses.
The Three Ways HITL Fails
Scale failure. Your AI is making thousands of decisions. You can't review all of them, so you sample. Risk thresholds, escalation queues, spot checks. Which means most decisions happen without review. The "loop" is really "occasional oversight with a lot of trust in between." That trust is unearned if you haven't specified what the AI should want in the first place.
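The arithmetic of threshold-based escalation is easy to sketch, and the sketch makes the problem visible: once you pick a cutoff, everything above it ships without review. A minimal illustration, where the function names and the 0.8 threshold are hypothetical, not any particular vendor's API:

```python
# Sketch of a typical confidence-threshold review queue.
# Names and numbers are illustrative assumptions.

def route_decision(confidence: float, threshold: float = 0.8) -> str:
    """Send low-confidence outputs to a human; auto-approve the rest."""
    return "human_review" if confidence < threshold else "auto_approve"

def review_coverage(confidences: list[float], threshold: float = 0.8) -> float:
    """Fraction of decisions a human actually sees."""
    reviewed = sum(1 for c in confidences if c < threshold)
    return reviewed / len(confidences)

# With a reasonably calibrated model, most outputs clear the threshold,
# so the "loop" covers only a small slice of the traffic.
```

The design tension is structural: lowering the threshold widens coverage but drowns reviewers in volume, which is exactly the condition under which automation bias kicks in.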
Automation bias. Humans reviewing AI outputs are subject to well-documented cognitive biases. Automation bias—the tendency to over-trust plausible-looking AI outputs—is pronounced when reviewers are overloaded or the output looks coherent. [1] The human in the loop stops being a check and starts being a rubber stamp. Parasuraman and Riley documented this across aviation, medical, and industrial automation contexts. It's not a failure of reviewer motivation. It's a predictable property of human cognition under volume and time pressure.
Asymmetric information. The reviewer sees the output, not the reasoning. They can catch obvious errors. They can't evaluate whether the AI's values-relevant judgments are consistent with the user's actual values. Shneiderman's "Human-Centered AI" (2020) makes this point directly: human oversight works when humans can meaningfully evaluate outputs, but in complex, high-dimensional tasks that assumption breaks down. [2] You're reviewing what the model said, not why. That's not alignment review. It's proofreading.
None of this is a knock on the engineers who built HITL systems. For narrow, high-stakes tasks—a physician approving a diagnostic recommendation, a human confirming a contract before it goes external—the HITL model is appropriate. But for AI systems that are personal, ongoing, and operating at scale? You need something structurally different.
The Counterargument: "Who Specifies the Values Correctly at Design Time?"
Here's the real pushback, and it's a good one.
Saying "embed values at the foundation" sounds great until you ask: who defines those values, and how do you know they got them right? Training-time alignment bakes in whoever wrote the constitution. Fine-tuning bakes in whoever labeled the data. And users' values change—people make commitments, revise them, discover new priorities. A frozen values spec is barely better than no spec at all.
This is the right critique. And it has a concrete answer.
The answer is not a perfectly specified values document written at deployment time. The answer is structured onboarding plus an updatable, auditable values spec at runtime.
This is what TruContext is built around. Not a static constitution—a living values layer that:
- Gets established through structured onboarding (explicit questions, not behavioral inference)
- Persists across sessions as authoritative context for every AI decision
- Can be updated as the user's values evolve, with history preserved
- Is auditable—you can trace which values influenced which decisions
The spec doesn't have to be perfect at design time. It has to be real at runtime and updatable over time. That's a very different engineering problem than "write the right system prompt."
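As a sketch of what "real at runtime and updatable over time" means structurally, here is a minimal values spec that preserves its own update history. The field names and shape are illustrative assumptions, not the TruContext schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sketch of an updatable, auditable values layer.

@dataclass
class ValuesSpec:
    commitments: dict[str, str] = field(default_factory=dict)
    # Each history entry is (timestamp, key, value), append-only.
    history: list[tuple[str, str, str]] = field(default_factory=list)

    def set(self, key: str, value: str) -> None:
        """Update a commitment, preserving the prior state for audit."""
        ts = datetime.now(timezone.utc).isoformat()
        self.history.append((ts, key, value))
        self.commitments[key] = value

    def audit(self, key: str) -> list[tuple[str, str, str]]:
        """Trace every change ever made to a given commitment."""
        return [entry for entry in self.history if entry[1] == key]
```

The point of the append-only history is the auditability bullet above: when a decision is questioned later, you can reconstruct which version of which commitment was in force at the time.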
The Same Scenario, Two Architectures
Let me make this concrete with one example.
Scenario: You've built an AI financial advisor. A user has told the system they care about ESG investing—they don't want to hold companies with poor environmental records. The AI identifies a high-yield opportunity that would meaningfully improve the user's portfolio performance. The company has weak environmental scores.
HITL architecture: The AI recommends the investment. A human reviewer looks at the recommendation. The review queue doesn't include the user's ESG preferences—it's checking for regulatory compliance and obvious errors. The recommendation goes through. The user is frustrated when they notice it later.
Or: the AI recommends the investment. The user sees it and overrides it. The AI has no record of why, so next week it recommends another ESG-problematic position. Same loop, same outcome.
Values-in-foundation architecture: At onboarding, the user explicitly specified ESG screening as a values commitment, not just a preference. The values layer persists that commitment as structured context. When the AI evaluates the high-yield opportunity, it surfaces the tension before making a recommendation: "This position would improve yield by X%, but conflicts with your stated ESG commitment. Here's the tradeoff. How do you want to handle it?"
The AI isn't reviewing outputs. It's reasoning from values. The human is in the loop where it matters—at the decision point, with the right information, not after the fact in a review queue.
The difference isn't the model. It's the substrate.
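The conflict check in that second architecture fits in a few lines. Everything here is illustrative: the commitment key, the score fields, and the message wording are assumptions for the sketch, not a real advisor API:

```python
# Hypothetical sketch of conflict surfacing: check a candidate
# recommendation against stated commitments before acting.

def evaluate(candidate: dict, commitments: dict) -> dict:
    """Return either a recommendation or a surfaced values conflict."""
    min_score = commitments.get("esg_min_score")
    if min_score is not None and candidate["esg_score"] < min_score:
        return {
            "action": "surface_conflict",
            "message": (
                f"{candidate['ticker']} would improve yield by "
                f"{candidate['yield_gain_pct']}%, but its ESG score "
                f"({candidate['esg_score']}) is below your stated minimum "
                f"({min_score}). How do you want to handle it?"
            ),
        }
    return {"action": "recommend", "ticker": candidate["ticker"]}
```

Note that the check runs before the recommendation exists, which is the whole structural difference: the review queue in the first architecture could only ever see the output.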
What This Means for Your Stack
Batya Friedman's Value Sensitive Design framework, developed at the University of Washington starting in the 1990s, made exactly this argument for technology design generally. [3] VSD holds that human values can't be retrofitted—they have to be in the design from the beginning. The AI industry mostly ignored this for a decade. It's now relearning it at scale.
The practical implication for developers: the values layer is infrastructure, not a feature. It's not something you add to your AI system after it's working. It's what makes the system trustworthy in the first place.
Specifically:
- Values specification at onboarding: structured, explicit, not inferred from behavior
- Persistent values context: injected at every inference call as authoritative, not advisory
- Conflict surfacing: when user input conflicts with their values spec, the AI flags it—proactively, not in post-hoc review
- Version history: values are updatable, and the history of updates is preserved for auditability
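To make the second bullet concrete, here is one way persistent values context might be injected at inference time, using the common system/user chat-message convention. The instruction wording and field names are assumptions for illustration, not a specific provider's API:

```python
# Sketch: inject the values spec into every inference call as
# authoritative context, not an advisory hint.

def build_messages(values_spec: dict, user_input: str) -> list[dict]:
    values_block = "\n".join(
        f"- {key}: {value}" for key, value in sorted(values_spec.items())
    )
    system = (
        "The following user values are authoritative, not advisory. "
        "If a request conflicts with them, surface the conflict "
        "before acting.\n"
        f"{values_block}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_input},
    ]
```

Because the spec travels with every call, this pattern is model-agnostic: swapping the underlying model doesn't discard the values layer.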
This is what TruContext provides. Persistent, structured, per-user values context at runtime—model-agnostic, auditable, updatable. The thing that makes your review queue unnecessary for most decisions, and makes the reviews that do happen actually meaningful.
TruContext is the foundation. `npm install -g trucontext-openclaw` — first 1,000 keys get 1M Ops free. Start here.
References
- [1] Parasuraman, Raja, and Victor Riley. "Humans and Automation: Use, Misuse, Disuse, Abuse." Human Factors 39, no. 2 (1997): 230–253. https://doi.org/10.1518/001872097778543886
- [2] Shneiderman, Ben. "Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy." International Journal of Human-Computer Interaction 36, no. 6 (2020): 495–504. https://doi.org/10.1080/10447318.2020.1741118
- [3] Friedman, Batya, Peter H. Kahn, Jr., and Alan Borning. "Value Sensitive Design and Information Systems." In Human-Computer Interaction in Management Information Systems, edited by P. Zhang and D. Galletta. Armonk, NY: M.E. Sharpe, 2006. Also described at the Value Sensitive Design Lab: https://vsdesign.org