GLM-OCR API调用详解Python示例助你快速集成到项目1. GLM-OCR简介与核心能力GLM-OCR是一款专业级多模态OCR模型在权威文档解析基准测试OmniDocBench V1.5中以94.6分取得SOTA表现。它不仅能识别常规文本还能处理复杂的文档结构特别适合需要高精度识别的业务场景。1.1 四大核心功能文本识别支持中英文混合识别准确率高达98%公式解析可识别LaTeX格式的数学公式表格还原保持原始表格结构支持合并单元格信息抽取从文档中提取关键字段如发票金额、日期等1.2 技术优势轻量级设计单张消费级显卡即可运行RESTful API接口易于集成到现有系统平均响应时间500msA10G显卡支持PNG/JPG/JPEG/WEBP等多种图片格式2. 环境准备与API服务配置2.1 服务部署检查在开始调用API前请确保服务已正确启动# 检查服务状态 supervisorctl status # 预期输出示例 glm-ocr:glm-ocr-webui RUNNING pid 1234, uptime 0:05:23 glm-ocr:glm-ocr RUNNING pid 1235, uptime 0:05:23如果服务未运行执行以下命令启动supervisorctl restart glm-ocr:*2.2 端口确认GLM-OCR默认使用两个端口7860Web界面可选8080API服务必需确保防火墙已放行这些端口# Ubuntu示例 sudo ufw allow 8080/tcp3. Python API调用实战3.1 基础文本识别示例以下是最简单的文本识别代码import requests import base64 def ocr_basic(image_path): # 读取图片并编码 with open(image_path, rb) as f: image_data base64.b64encode(f.read()).decode(utf-8) url http://localhost:8080/v1/chat/completions payload { messages: [ { role: user, content: [ {type: image, url: fdata:image/png;base64,{image_data}}, {type: text, text: Text Recognition:} ] } ] } response requests.post(url, jsonpayload) return response.json() # 使用示例 result ocr_basic(invoice.png) print(result[choices][0][message][content])3.2 表格识别进阶示例对于表格类文档可以指定识别模式def ocr_table(image_path): with open(image_path, rb) as f: image_data base64.b64encode(f.read()).decode(utf-8) url http://localhost:8080/v1/chat/completions payload { messages: [ { role: user, content: [ {type: image, url: fdata:image/png;base64,{image_data}}, {type: text, text: Table Recognition:} # 关键指令 ] } ] } response requests.post(url, jsonpayload) return response.json() # 返回的表格数据通常为Markdown格式 table_result ocr_table(financial_report.png) print(table_result[choices][0][message][content])3.3 批量处理与性能优化当需要处理大量图片时建议使用异步请求import aiohttp import asyncio async def async_ocr(image_paths): async with aiohttp.ClientSession() as session: tasks [] for path in image_paths: with open(path, rb) as f: image_data base64.b64encode(f.read()).decode(utf-8) payload { messages: [ { role: user, content: [ {type: image, url: fdata:image/png;base64,{image_data}}, {type: text, text: Text Recognition:} ] } ] } tasks.append(session.post(http://localhost:8080/v1/chat/completions, jsonpayload)) responses await asyncio.gather(*tasks) return [await r.json() for r in responses] # 使用示例 image_list [doc1.png, doc2.png, doc3.png] results asyncio.run(async_ocr(image_list))4. 高级功能与实用技巧4.1 混合内容识别GLM-OCR可以同时处理包含文本、公式和表格的复杂文档def mixed_ocr(image_path): with open(image_path, rb) as f: image_data base64.b64encode(f.read()).decode(utf-8) url http://localhost:8080/v1/chat/completions payload { messages: [ { role: user, content: [ {type: image, url: fdata:image/png;base64,{image_data}}, {type: text, text: 识别文档中的所有内容包括文本、公式和表格} ] } ], temperature: 0.3 # 降低随机性提高识别稳定性 } response requests.post(url, jsonpayload) return response.json()4.2 结构化数据提取通过自然语言指令提取特定信息def extract_info(image_path, query): with open(image_path, rb) as f: image_data base64.b64encode(f.read()).decode(utf-8) url http://localhost:8080/v1/chat/completions payload { messages: [ { role: user, content: [ {type: image, url: fdata:image/png;base64,{image_data}}, {type: text, text: query} ] } ] } response requests.post(url, jsonpayload) return response.json() # 示例提取发票信息 invoice_info extract_info(invoice.jpg, 提取发票中的开票日期、金额、销售方名称) print(invoice_info[choices][0][message][content])4.3 图像预处理建议为提高识别准确率建议在调用API前进行简单预处理from PIL import Image import numpy as np def preprocess_image(image_path): img Image.open(image_path) # 自动旋转校正如有必要 if hasattr(img, _getexif): exif img._getexif() if exif: orientation exif.get(0x0112) if orientation 3: img img.rotate(180, expandTrue) elif orientation 6: img img.rotate(270, expandTrue) elif orientation 8: img img.rotate(90, expandTrue) # 对比度增强 img img.convert(L) img np.array(img) img (img - img.min()) * (255 / (img.max() - img.min())) img Image.fromarray(img.astype(uint8)) return img # 使用预处理后的图片 processed_img preprocess_image(low_contrast.jpg) processed_img.save(processed.jpg) result ocr_basic(processed.jpg)5. 错误处理与性能优化5.1 常见错误代码处理完善的API调用应包含错误处理逻辑def robust_ocr(image_path): try: with open(image_path, rb) as f: image_data base64.b64encode(f.read()).decode(utf-8) url http://localhost:8080/v1/chat/completions payload { messages: [ { role: user, content: [ {type: image, url: fdata:image/png;base64,{image_data}}, {type: text, text: Text Recognition:} ] } ] } response requests.post(url, jsonpayload, timeout10) response.raise_for_status() result response.json() if choices not in result: raise ValueError(Invalid response format) return result[choices][0][message][content] except requests.exceptions.RequestException as e: print(f请求失败: {str(e)}) return None except Exception as e: print(f处理失败: {str(e)}) return None5.2 性能优化建议连接池复用为高频调用创建Session对象超时设置避免长时间等待结果缓存对相同图片缓存识别结果优化后的示例from functools import lru_cache import hashlib session requests.Session() def get_image_hash(image_path): with open(image_path, rb) as f: return hashlib.md5(f.read()).hexdigest() lru_cache(maxsize100) def cached_ocr(image_hash, image_path): with open(image_path, rb) as f: image_data base64.b64encode(f.read()).decode(utf-8) url http://localhost:8080/v1/chat/completions payload { messages: [ { role: user, content: [ {type: image, url: fdata:image/png;base64,{image_data}}, {type: text, text: Text Recognition:} ] } ] } try: response session.post(url, jsonpayload, timeout5) response.raise_for_status() return response.json() except requests.exceptions.RequestException: return None # 使用示例 image_hash get_image_hash(contract.pdf) result cached_ocr(image_hash, contract.pdf)6. 项目集成建议6.1 微服务架构设计建议将GLM-OCR封装为独立微服务项目架构示例 └── 您的业务系统 ├── Web前端 ├── 业务逻辑层 └── OCR服务网关 ← 调用 → GLM-OCR微服务(本机或内网)6.2 Django集成示例在Django项目中创建OCR服务模块# ocr_service.py import requests from django.conf import settings class OCRService: def __init__(self): self.api_url settings.OCR_API_URL # 配置在settings.py中 def recognize_text(self, image_file): image_data base64.b64encode(image_file.read()).decode(utf-8) payload { messages: [ { role: user, content: [ {type: image, url: fdata:image/png;base64,{image_data}}, {type: text, text: Text Recognition:} ] } ] } response requests.post(self.api_url, jsonpayload) return response.json() # views.py中使用示例 from .ocr_service import OCRService def process_document(request): if request.method POST: uploaded_file request.FILES[document] ocr OCRService() result ocr.recognize_text(uploaded_file) return JsonResponse(result) return HttpResponseBadRequest()6.3 Flask集成示例快速创建OCR API网关from flask import Flask, request, jsonify import requests app Flask(__name__) OCR_API http://localhost:8080/v1/chat/completions app.route(/api/ocr, methods[POST]) def ocr_proxy(): if file not in request.files: return jsonify({error: No file uploaded}), 400 file request.files[file] image_data base64.b64encode(file.read()).decode(utf-8) payload { messages: [ { role: user, content: [ {type: image, url: fdata:image/png;base64,{image_data}}, {type: text, text: request.form.get(instruction, Text Recognition:)} ] } ] } response requests.post(OCR_API, jsonpayload) return jsonify(response.json()) if __name__ __main__: app.run(port5000)7. 总结与最佳实践7.1 核心要点回顾GLM-OCR提供RESTful API接口支持文本、公式、表格的识别Python集成简单只需requests库即可调用通过调整content中的text指令可以控制识别模式生产环境建议添加错误处理、缓存和连接池优化7.2 推荐实践方案开发环境直接调用本地8080端口测试环境使用Nginx反向代理添加负载均衡生产环境部署多个OCR实例使用Kubernetes进行容器编排添加API网关进行限流和认证7.3 性能基准参考以下是在不同硬件上的性能测试数据硬件配置图片尺寸平均响应时间最大QPSRTX 3060 (12GB)1024x768620ms8T4 (16GB)1024x768480ms12A10G (24GB)1024x768350ms207.4 后续学习建议尝试结合PaddleOCR等开源工具进行结果校验探索与LangChain等框架的集成方案关注模型更新及时获取性能提升获取更多AI镜像想探索更多AI镜像和应用场景访问 CSDN星图镜像广场提供丰富的预置镜像覆盖大模型推理、图像生成、视频生成、模型微调等多个领域支持一键部署。