# CANN/cannbot-skills MoE-Only Scope Template
> [Free download] cannbot-skills — CANNBot is a family of agents for CANN development that improve development efficiency; this repository provides its reusable Skills modules. Project: https://gitcode.com/cann/cannbot-skills

This template provides an implementation that places only the MoE module inside the SuperKernel scope.

## Applicable scenarios

- MoE-architecture models such as DeepSeek-V3 and Qwen-MoE
- MoE is the performance bottleneck: expert computation takes a large share of runtime
- Targeted optimization of the MoE part is needed

## Advantages

- Targeted: optimizes the MoE bottleneck directly, with a clearly visible effect
- MoE is compute-dense, so kernel fusion pays off well
- Controlled risk: Attention and the other modules are untouched

## Implementation

```python
# cann-recipes-infer/models/{model_name}/models/modeling_*.py
from executor.utils import superkernel_scope


class MoELayer(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.enable_superkernel = config.enable_superkernel
        self.num_experts = config.num_experts
        self.gate = nn.Linear(config.hidden_size, config.num_experts)
        self.experts = nn.ModuleList([
            Expert(config) for _ in range(config.num_experts)
        ])

    def forward(self, hidden_states, is_prefill=False):
        batch_size, seq_len, hidden_dim = hidden_states.shape

        # Routing is computed outside the scope
        router_logits = self.gate(hidden_states)
        routing_weights = F.softmax(router_logits, dim=-1)

        # The SuperKernel scope covers only the expert computation
        with superkernel_scope(
            self.enable_superkernel and not is_prefill,
            label="moe_experts",
            options="stream-fusion=1"
        ):
            final_hidden_states = torch.zeros_like(hidden_states)
            for expert_idx in range(self.num_experts):
                # Select the tokens routed to this expert
                # (threshold is set by the routing policy)
                expert_mask = routing_weights[:, :, expert_idx] > threshold
                expert_input = hidden_states[expert_mask]
                # Expert computation
                expert_output = self.experts[expert_idx](expert_input)
                # Weighted accumulation
                final_hidden_states[expert_mask] += (
                    expert_output
                    * routing_weights[expert_mask, expert_idx].unsqueeze(-1)
                )
        return final_hidden_states


class DecoderLayer(nn.Module):
    def forward(self, hidden_states, is_prefill=False, **kwargs):
        # Attention stays outside the scope
        residual = hidden_states
        hidden_states = self.input_layernorm(hidden_states)
        attn_output, _ = self.self_attn(hidden_states, is_prefill=is_prefill)
        hidden_states = residual + attn_output

        # MoE runs inside the SuperKernel scope
        residual = hidden_states
        hidden_states = self.post_attention_layernorm(hidden_states)
        moe_output = self.moe(hidden_states, is_prefill=is_prefill)
        hidden_states = residual + moe_output
        return hidden_states
```

## Configuration file

```yaml
exe_mode: ge_graph
model_config:
  enable_superkernel: True
  # MoE-related settings
  num_experts: 64
  num_experts_per_tok: 8
  moe_chunk_max_len: 1024  # optional: MoE chunk optimization
```

## Expected performance gains

| Model type | Expected gain | Notes |
| --- | --- | --- |
| DeepSeek-V3 | 20-30% | large MoE share |
| Qwen-MoE | 15-25% | dense expert computation |
| Other MoE | 10-20% | depends on MoE share |

## Resources

- Attention-Only template: attention-only.md
- Full-Model template: full-model.md

Creation note: parts of this article were generated with AI assistance (AIGC) and are for reference only.