# config.json

This document is a model configuration file (typically named `config.json`). It defines the architecture parameters, quantization settings, and the text and vision sub-module configurations of a multimodal large language model whose architecture is `Qwen3_5ForConditionalGeneration`. The key contents:

```json
{
  "architectures": ["Qwen3_5ForConditionalGeneration"],
  "image_token_id": 248056,
  "model_type": "qwen3_5",
  "quantization_config": {
    "config_groups": {
      "group_0": {
        "format": "pack-quantized",
        "input_activations": null,
        "output_activations": null,
        "targets": ["Linear"],
        "weights": {
          "actorder": null,
          "block_structure": null,
          "dynamic": false,
          "group_size": 32,
          "num_bits": 4,
          "observer": "mse",
          "observer_kwargs": {},
          "scale_dtype": null,
          "strategy": "group",
          "symmetric": true,
          "type": "int",
          "zp_dtype": null
        }
      }
    },
    "format": "pack-quantized",
    "global_compression_ratio": null,
    "ignore": [
      "model.visual.blocks.0.attn.qkv",
      "model.visual.blocks.0.attn.proj",
      "model.visual.blocks.0.mlp.linear_fc1",
      "model.visual.blocks.0.mlp.linear_fc2",
      "model.visual.blocks.1.attn.qkv",
      "model.visual.blocks.1.attn.proj",
      "model.visual.blocks.1.mlp.linear_fc1",
      "model.visual.blocks.1.mlp.linear_fc2",
      "model.visual.blocks.2.attn.qkv",
      "model.visual.blocks.2.attn.proj",
      ...
      "model.visual.blocks.26.attn.qkv",
      "model.visual.blocks.26.attn.proj",
      "model.visual.blocks.26.mlp.linear_fc1",
      "model.visual.blocks.26.mlp.linear_fc2",
      "model.visual.merger.linear_fc1",
      "model.visual.merger.linear_fc2",
      "model.language_model.layers.0.linear_attn.in_proj_b",
      "model.language_model.layers.0.linear_attn.in_proj_a",
      "model.language_model.layers.1.linear_attn.in_proj_b",
      "model.language_model.layers.1.linear_attn.in_proj_a",
      ...
      "model.language_model.layers.30.linear_attn.in_proj_b",
      "model.language_model.layers.30.linear_attn.in_proj_a",
      "lm_head",
      "mtp.fc"
    ],
    "kv_cache_scheme": null,
    "quant_method": "compressed-tensors",
    "quantization_status": "compressed",
    "sparsity_config": {},
    "transform_config": {},
    "version": "0.14.1.dev20g5d2f568"
  },
  "text_config": {
    "attention_bias": false,
    "attention_dropout": 0.0,
    "attn_output_gate": true,
    "dtype": "bfloat16",
    "eos_token_id": 248044,
    "full_attention_interval": 4,
    "head_dim": 256,
    "hidden_act": "silu",
    "hidden_size": 4096,
    "initializer_range": 0.02,
    "intermediate_size": 12288,
    "layer_types": [
      "linear_attention", "linear_attention", "linear_attention", "full_attention",
      "linear_attention", "linear_attention", "linear_attention", "full_attention",
      "linear_attention", "linear_attention", "linear_attention", "full_attention",
      "linear_attention", "linear_attention", "linear_attention", "full_attention",
      "linear_attention", "linear_attention", "linear_attention", "full_attention",
      "linear_attention", "linear_attention", "linear_attention", "full_attention",
      "linear_attention", "linear_attention", "linear_attention", "full_attention",
      "linear_attention", "linear_attention", "linear_attention", "full_attention"
    ],
    "linear_conv_kernel_dim": 4,
    "linear_key_head_dim": 128,
    "linear_num_key_heads": 16,
    "linear_num_value_heads": 32,
    "linear_value_head_dim": 128,
    "max_position_embeddings": 262144,
    "mlp_only_layers": [],
    "model_type": "qwen3_5_text",
    "mtp_num_hidden_layers": 1,
    "mtp_use_dedicated_embeddings": false,
    "num_attention_heads": 16,
    "num_hidden_layers": 32,
    "num_key_value_heads": 4,
    "rms_norm_eps": 1e-06,
    "use_cache": true,
    "vocab_size": 248320,
    "mamba_ssm_dtype": "float32",
    "rope_parameters": {
      "mrope_interleaved": true,
      "mrope_section": [11, 11, 10],
      "rope_type": "default",
      "rope_theta": 10000000,
      "partial_rotary_factor": 0.25
    }
  },
  "tie_word_embeddings": false,
  "transformers_version": "4.57.0.dev0",
  "video_token_id": 248057,
  "vision_config": {
    "deepstack_visual_indexes": [],
    "depth": 27,
    "hidden_act": "gelu_pytorch_tanh",
    "hidden_size": 1152,
    "in_channels": 3,
    "initializer_range": 0.02,
    "intermediate_size": 4304,
    "model_type": "qwen3_5",
    "num_heads": 16,
    "num_position_embeddings": 2304,
    "out_hidden_size": 4096,
    "patch_size": 16,
    "spatial_merge_size": 2,
    "temporal_patch_size": 2
  },
  "vision_end_token_id": 248054,
  "vision_start_token_id": 248053
}
```

## 1. Architecture overview

- **Model type**: `qwen3_5`, using the `Qwen3_5ForConditionalGeneration` architecture.
- **Modalities**: multimodal, with text, image, and video processing.
- **Quantization status**: already compressed (`quantization_status: "compressed"`), using the `compressed-tensors` method, version `0.14.1.dev20g5d2f568`.

## 2. Quantization config

- **Method**: `pack-quantized` format, `int` type, 4-bit precision (`num_bits: 4`).
- **Strategy**: group-wise (`strategy: "group"`) with a group size of 32.
- **Ignore list**: a large number of vision modules (e.g. `model.visual.blocks.*`) and some language-model layers (e.g. the `linear_attn` in-projections) are excluded from quantization and kept at original precision.

## 3. Text config

- **Core parameters**: hidden size 4096, intermediate size 12288, 32 layers, 16 attention heads.
- **Hybrid attention**: linear-attention and full-attention layers interleaved, with one full-attention layer after every three linear-attention layers (`full_attention_interval: 4`).
- **Positional encoding**: RoPE (rotary position embeddings) with theta 10,000,000, supporting a maximum context length of 262,144 tokens.
- **Dtype**: bfloat16.

## 4. Vision config

- **Input processing**: patch size 16, temporal patch size 2, 3 input channels.
- **Structure**: depth 27, hidden size 1152, 16 attention heads.
- **Output projection**: output hidden size 4096, matching the language-model dimension.

## 5. Token ID map

- Image token ID: 248056; video token ID: 248057.
- Vision start/end tokens: ID 248053 / 248054.
- EOS token ID: 248044; vocabulary size: 248320.

In short, this config describes a heavily customized Qwen3.5 variant: a hybrid attention mechanism optimized for long-context processing, combined with 4-bit quantization plus selected unquantized layers to balance inference efficiency against visual-understanding accuracy.

# model.safetensors.index.json

This document is a weight-map index file (JSON) whose main purpose is to record the correspondence between model parameters and the physical `.safetensors` shard files that store them. The key contents:

```json
{
  "metadata": {},
  "weight_map": {
    "model.language_model.layers.0.linear_attn.in_proj_qkv.weight_shape": "model-00001-of-00003.safetensors",
    "model.language_model.layers.0.linear_attn.in_proj_z.weight_shape": "model-00001-of-00003.safetensors",
    "model.language_model.layers.0.linear_attn.out_proj.weight_shape": "model-00001-of-00003.safetensors",
    "model.language_model.layers.0.mlp.down_proj.weight_shape": "model-00001-of-00003.safetensors",
    "model.language_model.layers.0.mlp.gate_proj.weight_shape": "model-00001-of-00003.safetensors",
    "model.language_model.layers.0.mlp.up_proj.weight_shape": "model-00001-of-00003.safetensors",
    "model.language_model.layers.1.linear_attn.in_proj_qkv.weight_shape": "model-00001-of-00003.safetensors",
    "model.language_model.layers.1.linear_attn.in_proj_z.weight_shape": "model-00001-of-00003.safetensors",
    "model.language_model.layers.1.linear_attn.out_proj.weight_shape": "model-00001-of-00003.safetensors",
    "model.language_model.layers.1.mlp.down_proj.weight_shape": "model-00001-of-00003.safetensors",
    "model.language_model.layers.1.mlp.gate_proj.weight_shape": "model-00001-of-00003.safetensors",
    "model.language_model.layers.1.mlp.up_proj.weight_shape": "model-00001-of-00003.safetensors",
    ...
    "model.language_model.layers.31.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.language_model.layers.31.mlp.down_proj.weight_scale": "model-00002-of-00003.safetensors",
    "model.language_model.layers.31.mlp.gate_proj.weight_scale": "model-00002-of-00003.safetensors",
    "model.language_model.layers.31.mlp.up_proj.weight_scale": "model-00002-of-00003.safetensors",
    "model.language_model.layers.31.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.language_model.layers.31.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.language_model.layers.31.self_attn.k_proj.weight_scale": "model-00002-of-00003.safetensors",
    "model.language_model.layers.31.self_attn.o_proj.weight_scale": "model-00002-of-00003.safetensors",
    "model.language_model.layers.31.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.language_model.layers.31.self_attn.q_proj.weight_scale": "model-00002-of-00003.safetensors",
    "model.language_model.layers.31.self_attn.v_proj.weight_scale": "model-00002-of-00003.safetensors",
    "model.language_model.layers.4.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.language_model.norm.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.0.attn.proj.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.0.attn.proj.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.0.attn.qkv.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.0.attn.qkv.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.0.mlp.linear_fc1.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.0.mlp.linear_fc1.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.0.mlp.linear_fc2.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.0.mlp.linear_fc2.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.0.norm1.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.0.norm1.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.0.norm2.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.0.norm2.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.1.attn.proj.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.1.attn.proj.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.1.attn.qkv.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.1.attn.qkv.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.1.mlp.linear_fc1.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.1.mlp.linear_fc1.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.1.mlp.linear_fc2.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.1.mlp.linear_fc2.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.1.norm1.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.1.norm1.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.1.norm2.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.1.norm2.weight": "model-00002-of-00003.safetensors",
    ...
    "model.visual.blocks.26.attn.proj.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.26.attn.proj.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.26.attn.qkv.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.26.attn.qkv.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.26.mlp.linear_fc1.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.26.mlp.linear_fc1.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.26.mlp.linear_fc2.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.26.mlp.linear_fc2.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.26.norm1.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.26.norm1.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.26.norm2.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.26.norm2.weight": "model-00002-of-00003.safetensors",
    "mtp.layers.0.mlp.down_proj.weight_shape": "model-00002-of-00003.safetensors",
    "mtp.layers.0.mlp.gate_proj.weight_shape": "model-00002-of-00003.safetensors",
    "mtp.layers.0.mlp.up_proj.weight_shape": "model-00002-of-00003.safetensors",
    "mtp.layers.0.self_attn.k_proj.weight_shape": "model-00002-of-00003.safetensors",
    "mtp.layers.0.self_attn.o_proj.weight_shape": "model-00002-of-00003.safetensors",
    "mtp.layers.0.self_attn.q_proj.weight_shape": "model-00002-of-00003.safetensors",
    "mtp.layers.0.self_attn.v_proj.weight_shape": "model-00002-of-00003.safetensors",
    "mtp.layers.0.mlp.down_proj.weight_packed": "model-00002-of-00003.safetensors",
    "mtp.layers.0.mlp.gate_proj.weight_packed": "model-00002-of-00003.safetensors",
    "mtp.layers.0.mlp.up_proj.weight_packed": "model-00002-of-00003.safetensors",
    "mtp.layers.0.self_attn.k_proj.weight_packed": "model-00002-of-00003.safetensors",
    "mtp.layers.0.self_attn.o_proj.weight_packed": "model-00002-of-00003.safetensors",
    "mtp.layers.0.self_attn.q_proj.weight_packed": "model-00002-of-00003.safetensors",
    "mtp.layers.0.self_attn.v_proj.weight_packed": "model-00002-of-00003.safetensors",
    "model.visual.blocks.4.mlp.linear_fc1.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.4.mlp.linear_fc1.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.4.mlp.linear_fc2.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.4.mlp.linear_fc2.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.4.norm1.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.4.norm1.weight": "model-00002-of-00003.safetensors",
    "model.visual.blocks.4.norm2.bias": "model-00002-of-00003.safetensors",
    "model.visual.blocks.4.norm2.weight": "model-00002-of-00003.safetensors",
    ...
    "model.visual.blocks.9.attn.proj.bias": "model-00003-of-00003.safetensors",
    "model.visual.blocks.9.attn.proj.weight": "model-00003-of-00003.safetensors",
    "model.visual.blocks.9.attn.qkv.bias": "model-00003-of-00003.safetensors",
    "model.visual.blocks.9.attn.qkv.weight": "model-00003-of-00003.safetensors",
    "model.visual.blocks.9.mlp.linear_fc1.bias": "model-00003-of-00003.safetensors",
    "model.visual.blocks.9.mlp.linear_fc1.weight": "model-00003-of-00003.safetensors",
    "model.visual.blocks.9.mlp.linear_fc2.bias": "model-00003-of-00003.safetensors",
    "model.visual.blocks.9.mlp.linear_fc2.weight": "model-00003-of-00003.safetensors",
    "model.visual.blocks.9.norm1.bias": "model-00003-of-00003.safetensors",
    "model.visual.blocks.9.norm1.weight": "model-00003-of-00003.safetensors",
    "model.visual.blocks.9.norm2.bias": "model-00003-of-00003.safetensors",
    "model.visual.blocks.9.norm2.weight": "model-00003-of-00003.safetensors",
    "model.visual.merger.linear_fc1.bias": "model-00003-of-00003.safetensors",
    "model.visual.merger.linear_fc1.weight": "model-00003-of-00003.safetensors",
    "model.visual.merger.linear_fc2.bias": "model-00003-of-00003.safetensors",
    "model.visual.merger.linear_fc2.weight": "model-00003-of-00003.safetensors",
    "model.visual.merger.norm.bias": "model-00003-of-00003.safetensors",
    "model.visual.merger.norm.weight": "model-00003-of-00003.safetensors",
    "model.visual.patch_embed.proj.bias": "model-00003-of-00003.safetensors",
    "model.visual.patch_embed.proj.weight": "model-00003-of-00003.safetensors",
    "model.visual.pos_embed.weight": "model-00003-of-00003.safetensors",
    "mtp.fc.weight": "model-00003-of-00003.safetensors",
    "mtp.layers.0.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "mtp.layers.0.mlp.down_proj.weight_scale": "model-00003-of-00003.safetensors",
    "mtp.layers.0.mlp.gate_proj.weight_scale": "model-00003-of-00003.safetensors",
    "mtp.layers.0.mlp.up_proj.weight_scale": "model-00003-of-00003.safetensors",
    "mtp.layers.0.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "mtp.layers.0.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
    "mtp.layers.0.self_attn.k_proj.weight_scale": "model-00003-of-00003.safetensors",
    "mtp.layers.0.self_attn.o_proj.weight_scale": "model-00003-of-00003.safetensors",
    "mtp.layers.0.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
    "mtp.layers.0.self_attn.q_proj.weight_scale": "model-00003-of-00003.safetensors",
    "mtp.layers.0.self_attn.v_proj.weight_scale": "model-00003-of-00003.safetensors",
    "mtp.norm.weight": "model-00003-of-00003.safetensors",
    "mtp.pre_fc_norm_embedding.weight": "model-00003-of-00003.safetensors",
    "mtp.pre_fc_norm_hidden.weight": "model-00003-of-00003.safetensors"
  }
}
```

## 1. File structure

- **File type**: a weight map that tells the loader which shard file each parameter should be read from.
- **Core content**: a single `weight_map` dictionary that records the names of hundreds of tensors and the physical file each one lives in.

## 2. Architecture features

Judging from the weight names, this is a hybrid-architecture multimodal model with three core parts:

- **Language model**: 32 transformer layers (layers 0-31), with a hybrid attention mechanism: some layers use `linear_attn` (linear attention), others use `self_attn` (standard self-attention). It also contains standard MLP (feed-forward) and LayerNorm weights.
- **Visual encoder**: 27 vision-transformer `blocks` for image/video input, plus vision-specific components such as `patch_embed` (patch embedding) and `pos_embed` (position embedding), and a `merger` module that fuses visual features before handing them to the language model.
- **MTP module**: a separate `mtp` module with its own linear and attention layers. Given `mtp_num_hidden_layers` in `config.json` and the `mtp.fc` head, this is most likely a multi-token-prediction head rather than a multimodal adapter.

## 3. Storage and sharding

The parameters are split across 3 `.safetensors` files, and the map shows how they are distributed:

- `model-00001-of-00003.safetensors`: mainly the language model's quantization-related tensors: per-layer weight shapes (`weight_shape`) and packed data (`weight_packed`).
- `model-00002-of-00003.safetensors`: the largest shard. It holds most of the language model's concrete weights (e.g. `A_log`, `conv1d`, biases, scales), the bulk of the visual encoder's blocks, and part of the MTP parameters.
- `model-00003-of-00003.safetensors`: the remainder. Note that the keys sort lexicographically (so `blocks.9` comes after `blocks.26`); this shard holds the last visual blocks in that order, the `merger`, `patch_embed`, and `pos_embed` components, and the MTP head's scales and norm weights.

## 4. Key technical details

- **Quantization/compression evidence**: the prevalence of `weight_packed`, `weight_shape`, and `weight_scale` fields indicates a packed quantization format (in the same family as GPTQ/AWQ-style packing), used to shrink the model and speed up inference.
- **Scale**: with 32 language layers plus 27 vision blocks and dense parameter naming, this is a large multimodal model in the Qwen-VL class.

This file is the "table of contents" for loading and running the model: it reveals the hybrid-attention, quantized-storage design and shows how the parameters are spread across three physical files for efficient storage and loading.

# Why are there no Chinese characters in tokenizer.json?

In the `tokenizer.json` of Qwen3.5:9b (and of most modern large language models), Chinese characters indeed never appear as readable text, which often puzzles anyone opening the file for the first time. The reason lies in the data format modern tokenizers use and in the algorithm they implement. In detail:

## 1. Byte-level storage of BPE tokens

Modern tokenizers (Qwen's is a byte-level BPE variant) do not store a plain "character to ID" table; they store byte sequences and merge rules.

- **An algorithm, not a dictionary**: `tokenizer.json` actually stores the BPE merge rules and the vocabulary.
- **Chinese characters are stored as bytes**: each character has a fixed Unicode encoding (UTF-8 on disk). BPE does not split a character into strokes; it treats it as its byte sequence. For JSON compatibility and compactness, those bytes are escaped or remapped to printable characters rather than shown as human-readable text.
- **What you actually see**: opening `tokenizer.json` in a text editor shows escape sequences such as `\u4e2d`, or byte-level mojibake, rather than the character 中 itself.

## 2. How the vocabulary is stored

Even when the vocabulary is present in the JSON, it usually takes one of two forms, both of which hide the characters:

- **ID-token mapping**: the file may store entries like `{token_string: 12345}`, where `token_string` is a byte-sequence representation, not the raw character.
- **External reference**: in some implementations the base byte alphabet is built into the code; `tokenizer.json` then only stores the model-specific merge rules for multi-character subwords.
## 3. Qwen specifics

Qwen models (Qwen3.5 included) use a tiktoken-style, efficiency-oriented BPE implementation.

- **Compact storage**: for loading speed and file size, the core vocabulary is stored in tightly packed structures (or compiled into the library) rather than as a readable word list.
- **One vocabulary for many languages**: Qwen's vocabulary is very large (well over 100k tokens) and covers many languages. To handle everything uniformly, every character, whether a Chinese character, a Latin letter, or punctuation, is mapped to an integer ID; what you see in the files are these ID relationships, not the original characters.

## How to inspect the actual Chinese vocabulary

If you want to see which Chinese characters and words Qwen3.5's vocabulary contains, reading `tokenizer.json` directly is impractical. Use the official Python library to decode it instead:

```python
from transformers import AutoTokenizer

# Load the Qwen3.5 tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B")

# The full vocabulary: a huge dict mapping token string -> ID
vocab = tokenizer.get_vocab()

# Reverse lookup: decode an ID back to text, e.g. ID 100
print(tokenizer.decode([100]))

# Scan the vocabulary for CJK characters (roughly U+4E00..U+9FFF)
for token, token_id in vocab.items():
    # Token strings are byte-level representations, so this is only a rough check
    if any("\u4e00" <= ch <= "\u9fff" for ch in token):
        print(f"ID: {token_id} - Token: {token}")
```

## Summary

`tokenizer.json` contains no directly visible Chinese characters because:

1. **Format**: it stores algorithm rules (BPE merges) and byte/Unicode encodings, not a human-readable word list.
2. **Performance**: for loading speed and storage efficiency, modern tokenizers favor compact, escaped representations.
3. **Uniform encoding**: all characters, Chinese included, are abstracted into integer IDs.

Conclusion: this does not mean the model lacks Chinese support. On the contrary, it is the standard and efficient way large language models handle multilingual text, Chinese included. To see concrete Chinese mappings, use `tokenizer.get_vocab()` from the transformers library.
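To make the byte-level point concrete: GPT-2-style byte-level BPE tokenizers (the family Qwen's tokenizer appears to belong to) map each of the 256 possible byte values to a printable Unicode character, so the vocabulary can be stored as plain JSON strings. The sketch below reimplements that standard byte-to-character mapping and its inverse; the helper names `to_token_string` / `from_token_string` are my own, not from any library.

```python
def bytes_to_unicode():
    """GPT-2-style map from byte values (0-255) to printable characters.

    Printable ASCII and two Latin-1 ranges map to themselves; the
    remaining bytes (controls, space, 0x7F-0xA0, 0xAD) are shifted to
    code points >= 256 so that every byte gets a visible character.
    """
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("\u00a1"), ord("\u00ac") + 1))
          + list(range(ord("\u00ae"), ord("\u00ff") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}

def to_token_string(text):
    """How a string looks as a vocab entry in a byte-level tokenizer.json."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))

def from_token_string(token):
    """Recover the readable text from a byte-level vocab entry."""
    return bytes(CHAR_TO_BYTE[c] for c in token).decode("utf-8")

mojibake = to_token_string("中")   # 'ä¸Ń': three odd-looking characters, not 中
assert from_token_string(mojibake) == "中"
```

The character 中 is UTF-8 bytes `E4 B8 AD`, which this mapping renders as the mojibake string you actually see in byte-level vocab files; `from_token_string` inverts it, which is essentially what `tokenizer.get_vocab()` plus `decode` does for you.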