华为昇腾310P废物利用——大模型推理服务

张

张建站

2026/7/4 12:26:07

10分钟阅读

华为昇腾310P废物利用注310P不支持bf16、W4A4带宽200G双芯版的300I duo, 有48g和96g两种目前市面上所有昇腾的卡均不支持FP8最终性能优化结果Qwen3-8B-W8A8TPS 15Tokens/s昇腾的PyTorch图模式使用和vllm-ascend的源码里面有reduce-overhead和max-autotune两种模式reduce-overhead只支持910B和910C而且vllm-ascend里面写死了reduce-overhead模式MindIE Qwen 3-8B-W8A81. Launch the container on thehostdockerrun-it-d--nethost --shm-size16g\--namemindie-qwen3-8b-310p\-w/workspace/MindIE-LLM/examples/atb_models\--device/dev/davinci0:rwm\--device/dev/davinci1:rwm\--device/dev/davinci2:rwm\--device/dev/davinci3:rwm\--device/dev/davinci_manager:rwm\--device/dev/hisi_hdc:rwm\--device/dev/devmm_svm:rwm\-v/usr/local/Ascend/driver:/usr/local/Ascend/driver:ro\-v/usr/local/dcmi:/usr/local/dcmi:ro\-v/usr/local/bin/npu-smi:/usr/local/bin/npu-smi:ro\-v/usr/local/sbin:/usr/local/sbin:ro\-v/Users/zhaojiacheng/repos/MindIE-LLM:/workspace/MindIE-LLM\-v/home/s_zhaojiacheng:/home/s_zhaojiacheng\swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:3.0.0b2-300I-Duo-py311-openeuler24.03-lts\bashEnter the container:dockerexec-itmindie-qwen3-8b-310pbash2. Prepare the environment inside the containercd/workspace/MindIE-LLM scripts/qwen3_8b_310p_w8a8sc.sh prepare-env3. Download the model from ModelScope Recommended: download directly into a normal directory, not only into the default cache.mkdir-p/home/s_zhaojiacheng/models/Qwen3-8B-w8a8s modelscope download\--modelEco-Tech/Qwen3-8B-w8a8s-310\--local_dir/home/s_zhaojiacheng/models/Qwen3-8B-w8a8s If you already downloaded it earlier into the default cache with: modelscope download--modelEco-Tech/Qwen3-8B-w8a8s-310thenflatten it into a real directory first:mkdir-p/home/s_zhaojiacheng/models/Qwen3-8B-w8a8scp-aL\/home/s_zhaojiacheng/.cache/modelscope/hub/models/Eco-Tech/Qwen3-8B-w8a8s-310/.\/home/s_zhaojiacheng/models/Qwen3-8B-w8a8s/ Check the files exist:ls/home/s_zhaojiacheng/models/Qwen3-8B-w8a8s4. Compress W8A8S into W8A8SCcd/workspace/MindIE-LLM scripts/qwen3_8b_310p_w8a8sc.sh compress\--w8a8s-weight /home/s_zhaojiacheng/models/Qwen3-8B-w8a8s\--w8a8sc-weight /home/s_zhaojiacheng/models/Qwen3-8B-w8a8sc After it finishes, check the output directory exists:ls/home/s_zhaojiacheng/models/Qwen3-8B-w8a8sc5. Start the OpenAI-compatible servercd/workspace/MindIE-LLM scripts/qwen3_8b_310p_w8a8sc.sh serve\--w8a8sc-weight /home/s_zhaojiacheng/models/Qwen3-8B-w8a8sc\--model-name qwen3-8b-w8a8sc\--port1025This should start mindie_llm_server and expose the OpenAI-compatible endpoint on127.0.0.1:1025.6. Verify theserviceList models: curlhttp://127.0.0.1:1025/v1/models Expected model id: qwen3-8b-w8a8sc Test one inference request: curlhttp://127.0.0.1:1025/v1/chat/completions\-HContent-Type: application/json\-d{ model: qwen3-8b-w8a8sc, messages: [ {role: user, content: What is deep learning?} ], max_tokens: 128, stream: false }Short version If you want the shortest working sequence inside the container:cd/workspace/MindIE-LLM scripts/qwen3_8b_310p_w8a8sc.sh prepare-env modelscope download\--modelEco-Tech/Qwen3-8B-w8a8s-310\--local_dir/home/s_zhaojiacheng/models/Qwen3-8B-w8a8s scripts/qwen3_8b_310p_w8a8sc.sh compress\--w8a8s-weight /home/s_zhaojiacheng/models/Qwen3-8B-w8a8s\--w8a8sc-weight /home/s_zhaojiacheng/models/Qwen3-8B-w8a8sc scripts/qwen3_8b_310p_w8a8sc.sh serve\--w8a8sc-weight /home/s_zhaojiacheng/models/Qwen3-8B-w8a8sc\--model-name qwen3-8b-w8a8sc\--port1025Then test: curlhttp://127.0.0.1:1025/v1/models One important detail:forthis single-310P flow,donot try to serve Qwen3-8B-w8a8s-310 directly. The supported path is download W8A8S -compress to W8A8SC -serve W8A8SC. If you want, I can also rewrite this into one clean host-sidebashscript that doesdockerrun,dockerexec, download, compress, and serve end to end.

Windows多显示器DPI缩放不一致？SetDPI命令行工具帮你精准控制显示比例

Windows多显示器DPI缩放不一致？SetDPI命令行工具帮你精准控制显示比例【免费下载链接】SetDPI 项目地址: https://gitcode.com/gh_mirrors/se/SetDPI 你是否曾在连接多个显示器时，被Windows系统不一致的DPI缩放困扰？主显示器上文字清…...

2026/6/28 14:47:30 阅读更多 →

别再只会用两步法了！VCS混合语言仿真（VHDL/Verilog）三步法保姆级配置指南

VCS混合语言仿真实战：三步法解决VHDL/Verilog协同验证难题在芯片验证领域，混合语言仿真是许多工程师不得不面对的挑战。当项目同时包含VHDL编写的传统IP核和Verilog/SV开发的新模块时，如何高效完成仿真验证？本文将彻底解析VCS工具…...

2026/6/28 15:23:58 阅读更多 →

B站成分检测器：智能识别评论区用户身份的终极指南

B站成分检测器：智能识别评论区用户身份的终极指南【免费下载链接】bilibili-comment-checker B站评论区自动标注成分，支持动态和关注识别以及手动输入 UID 识别项目地址: https://gitcode.com/gh_mirrors/bil/bilibili-comment-checker 在B站海…...

2026/6/29 6:07:19 阅读更多 →