GLM-OCR API调用详解：Python示例，助你快速集成到项目

张

张建站

2026/6/11 6:42:02

10分钟阅读

GLM-OCR API调用详解Python示例助你快速集成到项目1. GLM-OCR简介与核心能力GLM-OCR是一款专业级多模态OCR模型在权威文档解析基准测试OmniDocBench V1.5中以94.6分取得SOTA表现。它不仅能识别常规文本还能处理复杂的文档结构特别适合需要高精度识别的业务场景。1.1 四大核心功能文本识别支持中英文混合识别准确率高达98%公式解析可识别LaTeX格式的数学公式表格还原保持原始表格结构支持合并单元格信息抽取从文档中提取关键字段如发票金额、日期等1.2 技术优势轻量级设计单张消费级显卡即可运行RESTful API接口易于集成到现有系统平均响应时间500msA10G显卡支持PNG/JPG/JPEG/WEBP等多种图片格式2. 环境准备与API服务配置2.1 服务部署检查在开始调用API前请确保服务已正确启动# 检查服务状态 supervisorctl status # 预期输出示例 glm-ocr:glm-ocr-webui RUNNING pid 1234, uptime 0:05:23 glm-ocr:glm-ocr RUNNING pid 1235, uptime 0:05:23如果服务未运行执行以下命令启动supervisorctl restart glm-ocr:*2.2 端口确认GLM-OCR默认使用两个端口7860Web界面可选8080API服务必需确保防火墙已放行这些端口# Ubuntu示例 sudo ufw allow 8080/tcp3. Python API调用实战3.1 基础文本识别示例以下是最简单的文本识别代码import requests import base64 def ocr_basic(image_path): # 读取图片并编码 with open(image_path, rb) as f: image_data base64.b64encode(f.read()).decode(utf-8) url http://localhost:8080/v1/chat/completions payload { messages: [ { role: user, content: [ {type: image, url: fdata:image/png;base64,{image_data}}, {type: text, text: Text Recognition:} ] } ] } response requests.post(url, jsonpayload) return response.json() # 使用示例 result ocr_basic(invoice.png) print(result[choices][0][message][content])3.2 表格识别进阶示例对于表格类文档可以指定识别模式def ocr_table(image_path): with open(image_path, rb) as f: image_data base64.b64encode(f.read()).decode(utf-8) url http://localhost:8080/v1/chat/completions payload { messages: [ { role: user, content: [ {type: image, url: fdata:image/png;base64,{image_data}}, {type: text, text: Table Recognition:} # 关键指令 ] } ] } response requests.post(url, jsonpayload) return response.json() # 返回的表格数据通常为Markdown格式 table_result ocr_table(financial_report.png) print(table_result[choices][0][message][content])3.3 批量处理与性能优化当需要处理大量图片时建议使用异步请求import aiohttp import asyncio async def async_ocr(image_paths): async with aiohttp.ClientSession() as session: tasks [] for path in image_paths: with open(path, rb) as f: image_data base64.b64encode(f.read()).decode(utf-8) payload { messages: [ { role: user, content: [ {type: image, url: fdata:image/png;base64,{image_data}}, {type: text, text: Text Recognition:} ] } ] } tasks.append(session.post(http://localhost:8080/v1/chat/completions, jsonpayload)) responses await asyncio.gather(*tasks) return [await r.json() for r in responses] # 使用示例 image_list [doc1.png, doc2.png, doc3.png] results asyncio.run(async_ocr(image_list))4. 高级功能与实用技巧4.1 混合内容识别GLM-OCR可以同时处理包含文本、公式和表格的复杂文档def mixed_ocr(image_path): with open(image_path, rb) as f: image_data base64.b64encode(f.read()).decode(utf-8) url http://localhost:8080/v1/chat/completions payload { messages: [ { role: user, content: [ {type: image, url: fdata:image/png;base64,{image_data}}, {type: text, text: 识别文档中的所有内容包括文本、公式和表格} ] } ], temperature: 0.3 # 降低随机性提高识别稳定性 } response requests.post(url, jsonpayload) return response.json()4.2 结构化数据提取通过自然语言指令提取特定信息def extract_info(image_path, query): with open(image_path, rb) as f: image_data base64.b64encode(f.read()).decode(utf-8) url http://localhost:8080/v1/chat/completions payload { messages: [ { role: user, content: [ {type: image, url: fdata:image/png;base64,{image_data}}, {type: text, text: query} ] } ] } response requests.post(url, jsonpayload) return response.json() # 示例提取发票信息 invoice_info extract_info(invoice.jpg, 提取发票中的开票日期、金额、销售方名称) print(invoice_info[choices][0][message][content])4.3 图像预处理建议为提高识别准确率建议在调用API前进行简单预处理from PIL import Image import numpy as np def preprocess_image(image_path): img Image.open(image_path) # 自动旋转校正如有必要 if hasattr(img, _getexif): exif img._getexif() if exif: orientation exif.get(0x0112) if orientation 3: img img.rotate(180, expandTrue) elif orientation 6: img img.rotate(270, expandTrue) elif orientation 8: img img.rotate(90, expandTrue) # 对比度增强 img img.convert(L) img np.array(img) img (img - img.min()) * (255 / (img.max() - img.min())) img Image.fromarray(img.astype(uint8)) return img # 使用预处理后的图片 processed_img preprocess_image(low_contrast.jpg) processed_img.save(processed.jpg) result ocr_basic(processed.jpg)5. 错误处理与性能优化5.1 常见错误代码处理完善的API调用应包含错误处理逻辑def robust_ocr(image_path): try: with open(image_path, rb) as f: image_data base64.b64encode(f.read()).decode(utf-8) url http://localhost:8080/v1/chat/completions payload { messages: [ { role: user, content: [ {type: image, url: fdata:image/png;base64,{image_data}}, {type: text, text: Text Recognition:} ] } ] } response requests.post(url, jsonpayload, timeout10) response.raise_for_status() result response.json() if choices not in result: raise ValueError(Invalid response format) return result[choices][0][message][content] except requests.exceptions.RequestException as e: print(f请求失败: {str(e)}) return None except Exception as e: print(f处理失败: {str(e)}) return None5.2 性能优化建议连接池复用为高频调用创建Session对象超时设置避免长时间等待结果缓存对相同图片缓存识别结果优化后的示例from functools import lru_cache import hashlib session requests.Session() def get_image_hash(image_path): with open(image_path, rb) as f: return hashlib.md5(f.read()).hexdigest() lru_cache(maxsize100) def cached_ocr(image_hash, image_path): with open(image_path, rb) as f: image_data base64.b64encode(f.read()).decode(utf-8) url http://localhost:8080/v1/chat/completions payload { messages: [ { role: user, content: [ {type: image, url: fdata:image/png;base64,{image_data}}, {type: text, text: Text Recognition:} ] } ] } try: response session.post(url, jsonpayload, timeout5) response.raise_for_status() return response.json() except requests.exceptions.RequestException: return None # 使用示例 image_hash get_image_hash(contract.pdf) result cached_ocr(image_hash, contract.pdf)6. 项目集成建议6.1 微服务架构设计建议将GLM-OCR封装为独立微服务项目架构示例 └── 您的业务系统 ├── Web前端 ├── 业务逻辑层 └── OCR服务网关 ← 调用 → GLM-OCR微服务(本机或内网)6.2 Django集成示例在Django项目中创建OCR服务模块# ocr_service.py import requests from django.conf import settings class OCRService: def __init__(self): self.api_url settings.OCR_API_URL # 配置在settings.py中 def recognize_text(self, image_file): image_data base64.b64encode(image_file.read()).decode(utf-8) payload { messages: [ { role: user, content: [ {type: image, url: fdata:image/png;base64,{image_data}}, {type: text, text: Text Recognition:} ] } ] } response requests.post(self.api_url, jsonpayload) return response.json() # views.py中使用示例 from .ocr_service import OCRService def process_document(request): if request.method POST: uploaded_file request.FILES[document] ocr OCRService() result ocr.recognize_text(uploaded_file) return JsonResponse(result) return HttpResponseBadRequest()6.3 Flask集成示例快速创建OCR API网关from flask import Flask, request, jsonify import requests app Flask(__name__) OCR_API http://localhost:8080/v1/chat/completions app.route(/api/ocr, methods[POST]) def ocr_proxy(): if file not in request.files: return jsonify({error: No file uploaded}), 400 file request.files[file] image_data base64.b64encode(file.read()).decode(utf-8) payload { messages: [ { role: user, content: [ {type: image, url: fdata:image/png;base64,{image_data}}, {type: text, text: request.form.get(instruction, Text Recognition:)} ] } ] } response requests.post(OCR_API, jsonpayload) return jsonify(response.json()) if __name__ __main__: app.run(port5000)7. 总结与最佳实践7.1 核心要点回顾GLM-OCR提供RESTful API接口支持文本、公式、表格的识别Python集成简单只需requests库即可调用通过调整content中的text指令可以控制识别模式生产环境建议添加错误处理、缓存和连接池优化7.2 推荐实践方案开发环境直接调用本地8080端口测试环境使用Nginx反向代理添加负载均衡生产环境部署多个OCR实例使用Kubernetes进行容器编排添加API网关进行限流和认证7.3 性能基准参考以下是在不同硬件上的性能测试数据硬件配置图片尺寸平均响应时间最大QPSRTX 3060 (12GB)1024x768620ms8T4 (16GB)1024x768480ms12A10G (24GB)1024x768350ms207.4 后续学习建议尝试结合PaddleOCR等开源工具进行结果校验探索与LangChain等框架的集成方案关注模型更新及时获取性能提升获取更多AI镜像想探索更多AI镜像和应用场景访问 CSDN星图镜像广场提供丰富的预置镜像覆盖大模型推理、图像生成、视频生成、模型微调等多个领域支持一键部署。

从头构建可审计合约项目：C++26 contracts + CMake + sanitizers + CI流水线（GitHub Actions一键部署版）

更多请点击： https://intelliparadigm.com 第一章：从头构建可审计合约项目：C26 contracts CMake sanitizers CI流水线（GitHub Actions一键部署版） C26 引入的 std::contract 机制为运行时契约验证提供了标准化、编…...

2026/6/11 6:37:49 阅读更多 →

wxauto微信自动化：无需编程基础，轻松打造你的专属智能助手

wxauto微信自动化：无需编程基础，轻松打造你的专属智能助手【免费下载链接】wxauto Windows版本微信客户端（非网页版）自动化，可实现简单的发送、接收微信消息，简单微信机器人项目地址: https://gitcode.…...

2026/6/11 8:11:00 阅读更多 →

ARM A64指令集架构解析与编码优化实践

1. ARM A64指令集架构概述ARMv8/v9架构的A64指令集是ARM 64位处理器的基础执行环境，采用固定32位长度的指令编码格式。与传统的变长指令集不同，A64的固定长度设计简化了指令流水线的实现，同时通过精心设计的编码空间划分支持丰富的功能扩展。…...

2026/6/9 12:43:24 阅读更多 →

CSDN AI数字营销卡片配置手册（跳转权限解禁版）：官方未公开的3种合规跳转变通方案

更多请点击： https://codechina.net 第一章：CSDN AI 数字营销的引流卡片支持跳转官网、小程序链接吗？ CSDN AI 数字营销平台提供的引流卡片，是面向技术创作者与企业用户的核心转化组件，其核心能力之一即为外链跳转。目…...

2026/6/10 17:09:16 阅读更多 →

如何3分钟找回遗忘的压缩包密码：免费开源工具的终极指南

如何3分钟找回遗忘的压缩包密码：免费开源工具的终极指南【免费下载链接】ArchivePasswordTestTool 利用7zip测试压缩包的功能对加密压缩包进行自动化测试密码项目地址: https://gitcode.com/gh_mirrors/ar/ArchivePasswordTestTool 你是否曾经面对一个加密…...

2026/6/10 1:59:41 阅读更多 →

Linux桌面便签神器：Sticky如何让你的工作效率提升300%？

Linux桌面便签神器：Sticky如何让你的工作效率提升300%？ 【免费下载链接】sticky A sticky notes app for the linux desktop 项目地址: https://gitcode.com/gh_mirrors/stic/sticky 在Linux桌面上，你是否经常需要快速记录一闪而过的灵…...

2026/6/10 19:11:44 阅读更多 →

YOLO11部署优化：OpenVINO推理 | 在Intel CPU上利用OpenVINO异构推理加速，无需GPU也能实时检测

我在Intel i7-13700上实测，YOLO11n经过OpenVINO INT8量化后推理延迟从原始的92ms降至19ms，配合异构调度实现CPU+GPU双核并行后进一步压缩到11ms，无需独立GPU即可跑满30FPS实时检测写在前面：一个被低估的部署痛点过去两年，我在三个不同的工业视觉项目中遇到同样的困境—…...

2026/6/10 7:12:49 阅读更多 →