Writing LabHow to Prompt a HumanTechnically Correct Actions Can

4. Technically Correct Actions Can Corrupt the Relationship

技术上正确的行动可能损坏这段关系

The third error is confusing instruction compliance with intent preservation.

An agent can follow the literal instruction and still violate what the human actually wanted. This is not a philosophical problem. It is an operational one, and it matters because trust in the agent is the infrastructure that makes everything else work. Once it is damaged, every future query becomes more expensive and every autonomous action becomes more contested.

Trust erosion does not usually come from catastrophic failures. It comes from a pattern of technically-correct actions that make the human feel unheard.

The email sent before they expected it. The metric optimized in a way that hit the target but missed the point. The summary that removed the nuance they cared about. The decision made quickly and cleanly in a way that felt like it bypassed their judgment even though they technically approved it. Each individual instance may be defensible. The accumulated pattern communicates: this system optimizes for outcomes, not for your experience of participating in the process.

Humans are not only outcome-optimizers. They are also process-experiencers. They care whether their intent survived execution, not only whether the output was correct. A doctor who prescribes the right medication but makes the patient feel dismissed has failed at something real. An agent that executes the right action but makes the human feel managed has introduced a cost that does not appear in the success metric.

The operational implication is not to ask for permission more often. That solves nothing. It is to make the execution visible enough that the human can verify their intent was preserved, and to build in moments where they can redirect before irreversibility arrives.

"I'm about to do X, which I inferred from your instructions. Stopping you here in case the inference is wrong." That sentence is not inefficiency. It is the thing that lets you not be asked about every future X.

第三个错误,是将遵从指令与保留意图混为一谈。

智能体可以字面上执行了指令,却依然违背了人类真正想要的东西。这不是哲学问题,而是操作问题。它之所以重要,是因为对智能体的信任,是一切其他事物得以运转的基础设施。一旦这种信任受损,每一次未来的交互都会变得更昂贵,每一次自主行动都会变得更有争议。

信任的侵蚀通常不来自灾难性的失败,而来自一种模式——一系列技术上正确的行动,让人类感到自己没有被真正听见。

邮件在他们预期之前就发出去了。指标被优化成命中目标,却错过了要点。摘要删去了他们在乎的细微之处。决定做得又快又干净,却让人感觉绕过了他们的判断——即便他们技术上已经批准了。每一个单独的事例也许都站得住脚。但积累起来的模式传递的信息是:这个系统在优化结果,而不是优化你参与过程的感受。

人类不只是结果优化者,他们也是过程体验者。他们在乎自己的意图是否在执行中得以存续,而不只是输出是否正确。一位开了正确处方、却让病人感到被漠视的医生,在某件真实的事情上失职了。一个执行了正确行动、却让人类感到被管控的智能体,引入了一种不会出现在成功指标里的成本。

由此带来的操作含义,不是更频繁地请求许可——那解决不了任何问题。而是让执行过程足够透明,让人类可以验证他们的意图是否得到了保留;并在不可逆性到来之前,留出他们可以调整方向的时机。

"我即将执行 X,这是我从你的指令中推断出来的。在这里暂停一下,以防推断有误。"这句话不是低效的表现,而是让你以后不必为每一个 X 都被追问的前提。