# CANN/cannbot-skills MoE-Only Scope Template
> [Free download] cannbot-skills — CANNBot is a family of agents for CANN development that improve development efficiency; this repository provides its reusable Skills modules. Project: https://gitcode.com/cann/cannbot-skills

This template provides an implementation that places only the MoE module inside the SuperKernel scope.

## Applicable scenarios

- MoE-architecture models such as DeepSeek-V3 and Qwen-MoE
- MoE is the performance bottleneck: expert computation takes a large share of runtime
- Targeted optimization of the MoE part is needed

## Advantages

- Targeted: optimizes the MoE bottleneck directly, with a clearly visible effect
- MoE is compute-dense, so kernel fusion pays off well
- Controlled risk: Attention and the other modules are untouched

## Implementation

```python
# cann-recipes-infer/models/{model_name}/models/modeling_*.py
from executor.utils import superkernel_scope


class MoELayer(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.enable_superkernel = config.enable_superkernel
        self.num_experts = config.num_experts
        self.gate = nn.Linear(config.hidden_size, config.num_experts)
        self.experts = nn.ModuleList([
            Expert(config) for _ in range(config.num_experts)
        ])

    def forward(self, hidden_states, is_prefill=False):
        batch_size, seq_len, hidden_dim = hidden_states.shape

        # Routing is computed outside the scope
        router_logits = self.gate(hidden_states)
        routing_weights = F.softmax(router_logits, dim=-1)

        # The SuperKernel scope covers only the expert computation
        with superkernel_scope(
            self.enable_superkernel and not is_prefill,
            label="moe_experts",
            options="stream-fusion=1"
        ):
            final_hidden_states = torch.zeros_like(hidden_states)
            for expert_idx in range(self.num_experts):
                # Select the tokens routed to this expert
                # (threshold is set by the routing policy)
                expert_mask = routing_weights[:, :, expert_idx] > threshold
                expert_input = hidden_states[expert_mask]
                # Expert computation
                expert_output = self.experts[expert_idx](expert_input)
                # Weighted accumulation
                final_hidden_states[expert_mask] += (
                    expert_output
                    * routing_weights[expert_mask, expert_idx].unsqueeze(-1)
                )
        return final_hidden_states


class DecoderLayer(nn.Module):
    def forward(self, hidden_states, is_prefill=False, **kwargs):
        # Attention stays outside the scope
        residual = hidden_states
        hidden_states = self.input_layernorm(hidden_states)
        attn_output, _ = self.self_attn(hidden_states, is_prefill=is_prefill)
        hidden_states = residual + attn_output

        # MoE runs inside the SuperKernel scope
        residual = hidden_states
        hidden_states = self.post_attention_layernorm(hidden_states)
        moe_output = self.moe(hidden_states, is_prefill=is_prefill)
        hidden_states = residual + moe_output
        return hidden_states
```

## Configuration file

```yaml
exe_mode: ge_graph
model_config:
  enable_superkernel: True
  # MoE-related settings
  num_experts: 64
  num_experts_per_tok: 8
  moe_chunk_max_len: 1024  # optional: MoE chunk optimization
```

## Expected performance gains

| Model type | Expected gain | Notes |
| --- | --- | --- |
| DeepSeek-V3 | 20-30% | large MoE share |
| Qwen-MoE | 15-25% | dense expert computation |
| Other MoE | 10-20% | depends on MoE share |

## Resources

- Attention-Only template: attention-only.md
- Full-Model template: full-model.md

Creation note: parts of this article were generated with AI assistance (AIGC) and are for reference only.