CANN/pto-isa: PTO Demo Examples
# PTO Demos

pto-isa: Parallel Tile Operation (PTO) is a virtual instruction set architecture designed by Ascend CANN, focusing on tile-level operations. This repository offers high-performance, cross-platform tile operations across Ascend platforms.

Project URL: https://gitcode.com/cann/pto-isa

This directory contains demonstration examples showing how to use the PTO Tile Library in different scenarios.

## Directory Structure

    demos/
    ├── baseline/                  # Production PyTorch operator examples (NPU)
    │   ├── add/                   # Basic element-wise addition
    │   ├── gemm_basic/            # GEMM with pipeline optimization
    │   └── flash_atten/           # Flash Attention with dynamic tiling
    ├── cpu/                       # CPU simulation demos (cross-platform)
    │   ├── gemm_demo/
    │   ├── flash_attention_demo/
    │   └── mla_attention_demo/
    └── torch_jit/                 # PyTorch JIT compilation examples
        ├── add/
        ├── gemm/
        └── flash_atten/

## Demo Categories

### 1. Baseline (`baseline/`)

Production-ready examples showing how to implement custom PTO kernels and expose them as PyTorch operators via `torch_npu`. They cover the complete workflow from kernel implementation to Python integration, including the CMake build system and wheel packaging.

Supported Platforms: A2/A3/A5

Examples: Element-wise addition, GEMM with double-buffering pipeline, Flash Attention with automatic tile size selection.

### 2. CPU Simulation (`cpu/`)

Cross-platform examples that run on CPU (x86_64/AArch64) without requiring Ascend hardware. Ideal for algorithm prototyping, learning the PTO programming model, and CI/CD testing.

Examples: Basic GEMM, Flash Attention, Multi-Latent Attention.

### 3. PyTorch JIT (`torch_jit/`)

Examples showing on-the-fly C compilation and direct integration with PyTorch tensors.
Useful for rapid prototyping without pre-building wheels.

Examples: JIT addition, JIT GEMM, JIT Flash Attention with benchmark suite.

## Quick Start

### CPU Simulation (Recommended First Step)

    python3 tests/run_cpu.py --demo gemm --verbose
    python3 tests/run_cpu.py --demo flash_attn --verbose

### NPU Baseline Example

    cd demos/baseline/add
    python -m venv virEnv
    source virEnv/bin/activate
    pip install -r requirements.txt
    export PTO_LIB_PATH=[YOUR_PATH]/pto-isa
    python3 setup.py bdist_wheel
    pip install dist/*.whl
    cd test
    python3 test.py

### JIT Example

    export PTO_LIB_PATH=[YOUR_PATH]/pto-isa
    cd demos/torch_jit/add
    python add_compile_and_run.py

## Prerequisites

For Baseline and JIT (NPU):

- Ascend AI Processor A2/A3/A5 (910B/910C/950)
- CANN Toolkit 8.5.0
- PyTorch with `torch_npu`
- Python 3.8, CMake 3.16

For CPU Demos:

- C compiler with C23 support
- CMake 3.16
- Python 3.8 (optional)

## Documentation

- Getting Started: docs/getting-started.md
- Programming Tutorial: docs/coding/tutorial.md
- ISA Reference: docs/isa/README.md

## Related

- Manual Kernels: kernels/manual/README.md
- Custom Operators: kernels/custom/README.md
- Test Cases: tests/README.md
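
Before running the Quick Start commands, it can help to confirm that the basics from the Prerequisites list are in place. The following is a minimal sketch of such a preflight check, not a script shipped with pto-isa: the function name `find_problems` and its parameters are hypothetical, it only checks the Python version, whether `cmake` is on `PATH`, and whether `PTO_LIB_PATH` is set, and it assumes `PTO_LIB_PATH` should point to an existing directory (the repository checkout). It does not verify the CMake version, the CANN Toolkit, or `torch_npu`.

```python
import os
import shutil
import sys

# Python version listed under Prerequisites.
MIN_PYTHON = (3, 8)


def find_problems(environ, python_version, which=shutil.which, need_npu_env=True):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration; not part of the pto-isa repository.
    """
    problems = []

    major_minor = tuple(python_version[:2])
    if major_minor < MIN_PYTHON:
        problems.append("Python %d.%d found, but 3.8 is listed" % major_minor)

    # CMake is listed for both NPU and CPU demos; its version is not checked here.
    if which("cmake") is None:
        problems.append("cmake not found on PATH")

    # PTO_LIB_PATH is exported in the NPU baseline and JIT quick-start steps.
    if need_npu_env:
        pto_path = environ.get("PTO_LIB_PATH", "")
        if not pto_path:
            problems.append("PTO_LIB_PATH is not set")
        elif not os.path.isdir(pto_path):
            # Assumption: the variable should name an existing directory.
            problems.append("PTO_LIB_PATH is not a directory: " + pto_path)

    return problems


if __name__ == "__main__":
    issues = find_problems(os.environ, sys.version_info)
    for issue in issues:
        print("warning:", issue)
    print("ok" if not issues else "%d issue(s) found" % len(issues))
```

For the CPU demos, which do not export `PTO_LIB_PATH` in the Quick Start steps, the same helper could be called with `need_npu_env=False` so that only the Python and CMake checks apply.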