Node.js后端服务调用Phi-3-mini：构建AI中间层REST API实战

张

张建站

2026/5/4 10:57:20

10分钟阅读

Node.js后端服务调用Phi-3-mini构建AI中间层REST API实战1. 为什么需要AI中间层在Web应用中直接调用本地部署的大语言模型会遇到几个典型问题前端无法直接访问本地模型、缺乏统一的API规范、难以管理并发请求、没有缓存和限流机制。这就是我们需要构建AI中间层的原因。想象一下你开发了一个电商客服系统前端需要频繁调用Phi-3-mini生成回复。如果没有中间层每个前端请求都要直接连接模型服务不仅安全性存疑当流量突增时还可能把模型服务压垮。而中间层就像个智能管家帮你打理好一切。2. 环境准备与快速部署2.1 Node.js安装及环境配置首先确保你的开发环境已经准备好# 检查Node.js版本需要v16 node -v # 如果没有安装可以用nvm管理版本 curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash nvm install 182.2 初始化项目创建一个新目录并初始化Node.js项目mkdir phi3-middleware cd phi3-middleware npm init -y npm install express axios body-parser p-queue这里我们安装了四个核心包expressWeb框架axiosHTTP客户端body-parser请求体解析p-queue并发控制3. 构建基础REST API服务3.1 创建Express服务器新建server.js文件搭建基础框架const express require(express); const bodyParser require(body-parser); const app express(); // 中间件配置 app.use(bodyParser.json()); // 健康检查端点 app.get(/health, (req, res) { res.json({ status: healthy }); }); // 启动服务器 const PORT process.env.PORT || 3000; app.listen(PORT, () { console.log(Server running on port ${PORT}); });测试服务是否正常运行node server.js curl http://localhost:3000/health3.2 连接Phi-3-mini模型假设Phi-3-mini已经在本地运行比如通过Ollama通常会在http://localhost:11434提供服务。我们添加一个代理接口const axios require(axios); const PHI3_URL http://localhost:11434/api/generate; app.post(/api/chat, async (req, res) { try { const { prompt } req.body; const response await axios.post(PHI3_URL, { model: phi3, prompt: prompt }); res.json(response.data); } catch (error) { res.status(500).json({ error: error.message }); } });现在你可以用curl测试这个接口curl -X POST http://localhost:3000/api/chat \ -H Content-Type: application/json \ -d {prompt:介绍一下Node.js}4. 进阶功能实现4.1 请求队列管理直接调用模型接口可能导致并发过高。我们用p-queue控制并发const { default: PQueue } require(p-queue); const queue new PQueue({ concurrency: 3 }); // 最多3个并发 app.post(/api/chat, async (req, res) { await queue.add(async () { try { const { prompt } req.body; const response await axios.post(PHI3_URL, { model: phi3, prompt: prompt }); res.json(response.data); } catch (error) { res.status(500).json({ error: error.message }); } }); });4.2 简单缓存实现对相同prompt的请求我们可以缓存结果const cache new Map(); app.post(/api/chat, async (req, res) { const { prompt } req.body; if (cache.has(prompt)) { return res.json(cache.get(prompt)); } await queue.add(async () { try { const response await axios.post(PHI3_URL, { model: phi3, prompt: prompt }); cache.set(prompt, response.data); res.json(response.data); } catch (error) { res.status(500).json({ error: error.message }); } }); });4.3 限流保护防止单个客户端发送过多请求const rateLimit require(express-rate-limit); const limiter rateLimit({ windowMs: 15 * 60 * 1000, // 15分钟 max: 100 // 每个IP最多100次请求 }); app.use(limiter);5. 完整代码示例以下是整合所有功能的完整server.jsconst express require(express); const bodyParser require(body-parser); const axios require(axios); const { default: PQueue } require(p-queue); const rateLimit require(express-rate-limit); const app express(); const PHI3_URL http://localhost:11434/api/generate; const queue new PQueue({ concurrency: 3 }); const cache new Map(); // 中间件 app.use(bodyParser.json()); app.use(rateLimit({ windowMs: 15 * 60 * 1000, max: 100 })); // 健康检查 app.get(/health, (req, res) { res.json({ status: healthy }); }); // 聊天接口 app.post(/api/chat, async (req, res) { const { prompt } req.body; if (cache.has(prompt)) { return res.json(cache.get(prompt)); } await queue.add(async () { try { const response await axios.post(PHI3_URL, { model: phi3, prompt: prompt }); cache.set(prompt, response.data); res.json(response.data); } catch (error) { res.status(500).json({ error: error.message }); } }); }); // 启动服务器 const PORT process.env.PORT || 3000; app.listen(PORT, () { console.log(Server running on port ${PORT}); });6. 部署与测试建议实际部署时建议使用PM2管理Node.js进程npm install -g pm2 pm2 start server.js --name phi3-middleware pm2 save pm2 startup测试时可以模拟多个并发请求const axios require(axios); const prompts [Node.js是什么, 如何学习JavaScript, 解释一下闭包]; async function test() { const results await Promise.all( prompts.map(prompt axios.post(http://localhost:3000/api/chat, { prompt }) ) ); console.log(results.map(r r.data)); } test();7. 总结通过这个实战教程我们构建了一个功能完整的AI中间层服务。它不仅解决了前端直接调用模型的问题还通过队列、缓存和限流等机制提升了系统的稳定性和性能。实际使用中你还可以考虑添加认证、日志监控等功能让中间层更加健壮。这套方案已经在我们团队的生产环境中稳定运行处理日均上万次模型调用。特别是在流量高峰时段队列和缓存机制显著降低了模型服务的压力。如果你正在寻找一个简单高效的AI中间层解决方案不妨从这个基础版本开始根据实际需求逐步扩展。获取更多AI镜像想探索更多AI镜像和应用场景访问 CSDN星图镜像广场提供丰富的预置镜像覆盖大模型推理、图像生成、视频生成、模型微调等多个领域支持一键部署。

爱毕业aibiye等9款免费查重工具，支持AI智能降重，每日检测次数无限制

核心工具对比速览工具名称查重速度降重效果特色功能适用场景 aicheck 极快重复率可降30% 专业术语保留高重复率紧急处理 aibiye 中等逻辑优化明显学术表达增强提升论文质量 askpaper 快结构保持完整多语言支持外文论文降重秒篇极快上下文…...

2026/4/29 18:30:19 阅读更多 →

为什么92%的多模态量化项目卡在推理延迟＞800ms？——基于TensorRT-LLM+ONNX Runtime的7步超低延时部署流水线

第一章：多模态大模型量化压缩技术概览 2026奇点智能技术大会(https://ml-summit.org) 多模态大模型（Multimodal Large Language Models, MLLMs）正以前所未有的规模融合视觉、语言、音频乃至时空信号，但其参数量动辄数十亿至千亿…...

2026/4/27 1:37:11 阅读更多 →

Linux下PaddlePaddle GPU版Segmentation fault报错终极解决指南（附永久环境变量配置）

Linux下PaddlePaddle GPU版Segmentation fault报错终极解决指南最近在部署PaddlePaddle GPU版本时，不少开发者反馈遇到了令人头疼的Segmentation fault错误。这种错误通常出现在Linux生产环境中，特别是从2.1.3版本开始，使用PaddleSpeech等依…...

2026/4/19 12:54:50 阅读更多 →

LoopViT：结合循环机制的视觉Transformer优化架构

1. 项目概述在计算机视觉领域，Transformer架构近年来展现出惊人的潜力。LoopViT是我最近开发的一种新型视觉推理架构，它通过引入循环机制改进了传统视觉Transformer的计算效率和信息流模式。这个架构特别适合处理视频分析、医学影像分割等需要时序建模的…...

2026/5/3 0:06:07 阅读更多 →

实战指南：深度解锁微信网页版，让浏览器也能畅快聊天

实战指南：深度解锁微信网页版，让浏览器也能畅快聊天【免费下载链接】wechat-need-web 让微信网页版可用 / Allow the use of WeChat via webpage access 项目地址: https://gitcode.com/gh_mirrors/we/wechat-need-web 还在为微信网页版频繁提示…...

2026/5/3 0:10:11 阅读更多 →

智慧树学习效率提升指南：如何用自动化工具节省80%学习时间

智慧树学习效率提升指南：如何用自动化工具节省80%学习时间【免费下载链接】zhihuishu 智慧树刷课插件，自动播放下一集、1.5倍速度、无声项目地址: https://gitcode.com/gh_mirrors/zh/zhihuishu 还在为智慧树平台繁琐的视频学习流程而烦恼吗&am…...

2026/5/3 0:27:49 阅读更多 →