LLMs Survey: Translation and Commentary on "The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook"
Overview

This survey redefines latent space as the machine-native computational substrate of language models and, more broadly, of multimodal intelligent systems, and it systematically maps out the concept's boundaries, developmental trajectory, implementation mechanisms, capability limits, and future challenges. The paper's core claim is that future intelligent systems will likely rely increasingly on latent rather than purely textual internal computation, but that to truly become a general-purpose substrate, four problems must be solved in parallel: evaluation, control, interpretation, and theorization.

Background and Pain Points

● Unclear concepts and boundaries: existing research on latent space is scattered across different tasks, modalities, and mechanisms, and is often equated with "latent reasoning". This survey points out that in language models the latent space is actually a broader continuous internal computational space, which should not be conflated with the explicit token space, nor with the latent spaces of visual generative models.
● The traditional token-centric view is no longer sufficient: the paper argues that although modern LLM/VLM/VLA/agent systems are still understood through explicit token generation, many key internal processes are more naturally carried out in a continuous latent space; the explicit space has structural limitations such as redundancy, discretization bottlenecks, sequential decoding overhead, and semantic loss.
● Severe fragmentation of the literature: existing work is highly split by object, mechanism, and scenario (some focuses on reasoning, some on visual understanding, memory, action, or multi-agent collaboration), and there is no unified framework that places these methods on one map for comparison.
● Immature evaluation, control, and interpretation: unlike explicit CoT, a latent trajectory cannot be read out directly, which weakens evaluability, controllability, and interpretability; many methods can only inspect the final answer and cannot easily verify whether the intermediate process is actually correct, complete, and relevant.

The Proposed Solution

● A unified survey framework: instead of treating latent space as a single technical point, the paper offers a five-part narrative structure (Foundation, Evolution, Mechanism, Ability, Outlook) to systematically answer what it is, how it developed, how it works, what it enables, and what comes next.
● A two-dimensional taxonomy: the authors organize the research along two orthogonal axes, Mechanism (Architecture, Representation, Computation, Optimization) and Ability (Reasoning, Planning, Modeling, Perception, Memory, Collaboration, Embodiment), unifying previously scattered methods under one classification system.
● Redefining latent space, from "explicit language" to "internal workspace": the paper positions latent space as a machine-native computational substrate, stressing that it is not merely a tool for compressing CoT but may become the model's primary internal workspace for thinking, simulation, memory, and collaboration.
● Decomposing existing methods by mechanism: at the mechanism level, the authors divide existing work into four major classes (architectural, representational, computational, and optimization-based) and further refine them into subcategories such as backbone/component/auxiliary model, internal/external/learnable/hybrid, and compressed/expanded/adaptive/interleaved.

Core Approach, Step by Step

● Step 1, delimit the concept and its comparisons: first distinguish latent space from the explicit/verbal space, then from the latent space of visual generative models, to avoid mixing different categories.
● Step 2, trace the evolution: the paper divides the field's development into four phases (Prototype, Formation, Expansion, Outbreak), showing how latent space grew from early explorations of "compressing CoT into continuous states" into a broader system-level paradigm.
● Step 3, analyze implementations by mechanism: the authors examine, along the architecture, representation, computation, and optimization dimensions, how latent space is embedded into models, and explain how different designs trade off efficiency, expressiveness, and controllability.
● Step 4, summarize the observable gains by ability: the paper systematically discusses the role of latent space across seven abilities (reasoning, planning, modeling, perception, memory, collaboration, and embodiment), showing that it does not merely make models "better at reasoning" but extends their overall intelligence boundary.
● Step 5, rise to future issues: the paper finally converges on theory, evaluation, control, interpretability, and a unified multimodal workspace, stressing that research should not chase higher accuracy alone but must build a verifiable, governable methodology for latent computation.

Strengths

● Strong unification: the survey's biggest strength is turning latent-space work from a fragmented collection of papers into a knowledge framework that is comparable, searchable, and extensible, helping follow-up research position itself.
● Broader vision: it extends latent space from latent reasoning alone to language, vision, memory, collaboration, and embodied intelligence, highlighting the potential of latent computation as a general-purpose substrate.
● Finer mechanism-level explanation: by decomposing architecture / representation / computation / optimization, the paper lets the differences between methods be understood at the level of structure and training logic rather than merely different names.
● Clear guidance for future research: beyond reviewing existing results, the paper explicitly names what is most lacking: standardized evaluation, controllable latent interfaces, process-level supervision, interpretability frameworks, and trustworthy benchmarks.

Conclusions and Viewpoints (Lessons and Recommendations)

● Latent space is turning from a "trick" into a "principle": the authors see the field moving from heuristic usage to systematic principle, from external insertion to internal nativeness, and from static and fixed to dynamic and adaptive.
● Future models are more likely to use latent space as their internal workspace: explicit language will keep serving as the external interface for instructions, generation, and verification, but genuine thinking, simulation, memory, and planning will increasingly happen inside the latent space.
● Capability expansion comes from "information that is hard to verbalize": what reasoning, planning, memory, collaboration, embodiment, and related abilities share is that they all depend on structure that is difficult, expensive, or lossy to externalize into natural language.
● Future research should fill in theory before stacking performance: the paper explicitly calls for foundational theory on why latent computation works, when it outperforms explicit reasoning, and how to organize latent geometry and optimization dynamics.
● Build standardized evaluation and supervision protocols: evaluation of latent reasoning is still fragmented, lacking unified benchmarks and faithfulness/robustness/internal-consistency metrics, which blocks fair comparison and cumulative progress.
● Make controllability and interpretability system-level capabilities: future work should not stop at local steering but should build mechanisms that map explicit goals, safety constraints, and resource budgets onto the internal computation, together with interpretability frameworks that locate semantic structure, causal pathways, and the sources of failures.
● The point of multimodality is not "more modalities" but a "unified latent workbench": what really matters, the authors argue, is letting language, vision, action, memory, and multi-agent communication share one continuous latent substrate, rather than continuing to stack tokenized cross-modal interfaces.

Table of Contents

"The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook": Translation and Commentary
Abstract
Figure 1
Figure 2
1. Introduction
6 Conclusion

"The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook": Translation and Commentary

Link: Paper: https://arxiv.org/abs/2604.02029
Date: April 2, 2026
Authors: National University of Singapore, Fudan University, Tsinghua University, Zhejiang University, Shanghai AI Laboratory, Renmin University of China, The Chinese University of Hong Kong, The Hong Kong University of Science and Technology, DeepWisdom, Nanjing University, Shanghai Jiao Tong University, Nanyang Technological University, Tencent Hunyuan, QuantaAlpha, Beijing University of Posts and Telecommunications, Zhejiang Lab, University of Chinese Academy of Sciences, The University of Hong Kong

Abstract

Latent space is rapidly emerging as a native substrate for language-based models. While modern systems are still commonly understood through explicit token-level generation, an increasing body of work shows that many critical internal processes are more naturally carried out in continuous latent space than in human-readable verbal traces. This shift is driven by the structural limitations of explicit-space computation, including linguistic redundancy, discretization bottlenecks, sequential inefficiency, and semantic loss. As a result, research on latent space has expanded from early latent reasoning into a broader landscape spanning planning, modeling, perception, memory, collaboration, and embodiment. However, the literature remains fragmented across mechanisms, modalities, and tasks, lacking a unified perspective on how latent space is defined, classified, and studied.

This survey aims to provide a unified and up-to-date landscape of latent space in language-based models. We organize the survey into five sequential perspectives: Foundation, Evolution, Mechanism, Ability, and Outlook. We begin by delineating the scope of latent space, distinguishing it from explicit or verbal space and from the latent spaces commonly studied in generative visual models. We then trace the field's evolution from early exploratory efforts to the current large-scale expansion. To organize the technical landscape, we examine existing work through the complementary lenses of mechanism and ability.
From the perspective of Mechanism, we identify four major lines of development: Architecture, Representation, Computation, and Optimization. From the perspective of Ability, we show how latent space supports a broad capability spectrum spanning Reasoning, Planning, Modeling, Perception, Memory, Collaboration, and Embodiment. Beyond consolidation, we discuss the key open challenges, and outline promising directions for future research. We hope this survey serves not only as a reference for existing work, but also as a foundation for understanding latent space as a general computational and systems paradigm for next-generation intelligence.

Figure 1: Overview of the latent space methods classified by two axes: four main Mechanisms (Section 4) and seven key Abilities (Section 5). Within our classification system, a single method may be affiliated with one or more mechanisms and capabilities. For the visualization in this figure, we adopt the most appropriate classification for each method; a comprehensive elaboration of these categories will be presented in the main text.

Figure 2: Outline of the survey, including five sections and sequential questions: Foundation: What is Latent Space? (Section 2), Evolution: How Did Latent Space Develop? (Section 3), Mechanism: How Does Latent Space Work? (Section 4), Ability: What Does Latent Space Enable? (Section 5), and Outlook: What is Next? (Section 6).

1. Introduction

Recent advances in language-based models, including Large Language Models (LLMs), Vision-Language Models (VLMs), Vision-Language-Action models (VLAs), and agentic systems built on language backbones, are still commonly understood through explicit token-level generation, where inputs, outputs, and even intermediate reasoning are expressed in human-readable form [vaswani2017attention, wei2022chain, yao2023react]. Yet this token-centric framing is increasingly insufficient [hao2024training, perception2025bigverdi, jihoon2025llm]. Because computation in such models fundamentally unfolds through continuous activations, latent space is increasingly being reconceived not as a hidden implementation detail but as a machine-native substrate for processes such as reasoning [hao2024training, zhu2025soft, xu2025softcot], perception [perception2025bigverdi, ahmed2025alignvlm], memory [zhang2025memgen, yu2025vismem], communication [zheng2025thought, zou2025latent], and action [huang2025thinkact, ni2025swiftvla]. This shift is driven in part by the structural limitations of explicit space: its redundancy, discretization bottleneck, sequential decoding cost, and potential loss of fine-grained information, especially in complex, multimodal, or long-horizon settings.
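To make the discretization bottleneck concrete, here is a toy numpy sketch; the tiny `step` network, the three-entry `vocab` table, and all sizes are made-up illustrations (not any real model or the paper's method). It compares an explicit loop, which decodes each intermediate state to a token and re-embeds it, with a latent loop that feeds the hidden state back directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: `step` plays the role of one forward pass
# (hidden state -> hidden state); `vocab` is a tiny embedding table.
# All values here are illustrative, not taken from any real model.
W = 0.5 * rng.normal(size=(4, 4))
vocab = rng.normal(size=(3, 4))

def step(h):
    return np.tanh(W @ h)

def decode(h):
    # Explicit-space bottleneck: snap the continuous state to the
    # nearest of 3 discrete tokens (argmax over similarity).
    return int(np.argmax(vocab @ h))

h0 = rng.normal(size=4)

# Explicit chain: every intermediate state is forced through the
# discrete vocabulary (decode to a token, then re-embed it).
h_explicit = h0
for _ in range(3):
    h_explicit = step(vocab[decode(h_explicit)])

# Latent chain: the hidden state is fed back directly, so nothing is
# lost to discretization at the intermediate steps.
h_latent = h0
for _ in range(3):
    h_latent = step(h_latent)

# The two trajectories diverge: the explicit path can only revisit a
# few discrete re-embedding points per step; the latent path stays
# continuous.
print(np.linalg.norm(h_explicit - h_latent))
```

The gap printed at the end is exactly the fine-grained information the explicit round-trips discard at each step, which is the structural limitation the paragraph above describes.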
By contrast, latent-space computation offers a more continuous, compact, and expressive medium that can support higher-fidelity representations and more flexible allocation of computation.

Research has therefore moved far beyond the initial framing of latent space as latent reasoning alone. What began as an attempt to internalize chain-of-thought into continuous states has rapidly expanded into a broader systems paradigm spanning new modalities, new interaction settings, and new design choices [reasoning2025chen, survey2025zhu, implicit2025li]. However, this growth has also fragmented the literature in at least three ways: by application object, e.g., latent reasoning, visual understanding, and embodied action; by mechanism, e.g., architecture design, representation choice, computation pattern, and optimization strategy; and by scenario, spanning text, vision, multi-agent systems, and embodied environments. Existing reviews mainly focus on latent reasoning or implicit reasoning as a reasoning-specific phenomenon.
What remains missing is a unified perspective that treats latent space as a broader computational and systems paradigm across modalities, paradigms, mechanisms, and capabilities.

To address this gap, we organize the survey around five sequential questions that move from conceptual grounding to future outlook, as illustrated in Figure 2: What is latent space? How did it develop? How does it work? What does it enable? What is next? These questions define the macro-level narrative of the paper: Foundation (Section 2) delineates the concept of latent space and clarifies its relation to explicit space and to latent space in generative visual models; Evolution (Section 3) traces how the field progressed from prototype exploration to rapid outbreak; Mechanism (Section 4) explains how latent space is instantiated and operationalized; Ability (Section 5) examines what latent computation enables across downstream capabilities; and Outlook (Section 6) synthesizes open challenges and future directions. This five-question narrative is intentionally sequential: it allows us to preserve a clear storyline while also comparing diverse methods through shared principles and capability outcomes, rather than through task-specific labels alone.

Within this sequential narrative, our technical synthesis is anchored by a two-dimensional taxonomy shown in Figure 1. The first axis, Mechanism, asks how latent space is built and used, and covers four major lines: Architecture, Representation, Computation, and Optimization. The second axis, Ability, asks what latent space enables, and covers seven major capability domains: Reasoning, Planning, Modeling, Perception, Memory, Collaboration, and Embodiment.
This design lets us preserve a clear survey-level storyline while also comparing diverse methods through shared design principles and shared capability outcomes, rather than through task-specific labels alone.

Contributions

• We clarify the conceptual scope of latent space in language-based models, distinguishing it from explicit or verbal space and from the latent spaces commonly studied in generative visual models.
• We provide a unified review of how latent space has evolved from early latent reasoning into a broader multimodal and systems-level research paradigm.
• We introduce a two-dimensional taxonomy across Mechanism and Ability, offering a common framework for organizing otherwise fragmented methods and applications.
• We provide a comprehensive collection of resources, including illustrative figures, structured tables, accessible links, and repositories, to facilitate further research and community engagement.

6 Conclusion

In this survey, we have presented a systematic
review of latent space in language-based models from five complementary perspectives: foundation, evolution, mechanism, ability, and outlook. Taken together, these perspectives suggest that latent space is a substrate that may fundamentally reshape how intelligent language models deal with diverse information. We further show that the development of this field has rapidly progressed from early explorations of latent reasoning to a broader and increasingly unified research paradigm spanning language, vision, memory, collaboration, and embodied action.

To systematically organize this promising landscape, we propose a taxonomy along two orthogonal axes: mechanism orientation and ability orientation. On the axis of Mechanism, we classify four key types: architecture, representation, computation, and optimization, which define how latent space is operationalized. On the axis of Ability, we expand the single type in previous surveys to seven main functional categories: reasoning, planning, modeling, perception, memory, collaboration, and embodiment. Across these dimensions, a consistent trend becomes visible: latent space brings a fundamental transformation to model mechanisms while pushing the boundaries of model capabilities.

At the same time, the promise of latent space must be considered together with its unresolved challenges. As increasingly more cognition is internalized into continuous hidden computation, the resulting processes become harder to evaluate, control, and interpret. Future progress will therefore depend not only on improving empirical performance, but also on establishing stronger theoretical foundations, more reliable benchmarks and supervision protocols, and more transparent as well as controllable latent mechanisms. Overall, the central conclusion of this survey is that latent space holds the potential to become a foundational principle for language-based models.
We hope that this survey offers a coherent foundation for future research and serves as a valuable reference for researchers in the field.
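To make the two-axis classification concrete, here is a minimal sketch of how the Mechanism × Ability taxonomy could be represented and queried. The axis labels are the survey's own categories; the method entries (`toy-latent-cot`, `toy-latent-memory`) are hypothetical placeholders, not the paper's actual assignments:

```python
# Axis labels from the survey; the method entries below are
# hypothetical placeholders, not the paper's actual assignments.
MECHANISMS = {"Architecture", "Representation", "Computation", "Optimization"}
ABILITIES = {"Reasoning", "Planning", "Modeling", "Perception",
             "Memory", "Collaboration", "Embodiment"}

# A single method may be affiliated with one or more mechanisms and
# one or more abilities, as the survey's Figure 1 notes.
methods = {
    "toy-latent-cot": {"mechanisms": {"Computation"},
                       "abilities": {"Reasoning"}},
    "toy-latent-memory": {"mechanisms": {"Representation", "Optimization"},
                          "abilities": {"Memory", "Collaboration"}},
}

def find(mechanism=None, ability=None):
    """Return method names matching the given mechanism and/or ability."""
    hits = []
    for name, tags in methods.items():
        if mechanism is not None and mechanism not in tags["mechanisms"]:
            continue
        if ability is not None and ability not in tags["abilities"]:
            continue
        hits.append(name)
    return hits

# Validate every entry against the two axes before querying.
for tags in methods.values():
    assert tags["mechanisms"] <= MECHANISMS
    assert tags["abilities"] <= ABILITIES

print(find(ability="Memory"))
```

Because the two axes are orthogonal tag sets rather than a tree, a method can sit in several cells at once, which is exactly why the survey can compare otherwise unrelated work through shared mechanisms or shared abilities.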