Shopee API Reverse Engineering: How to Quickly Fetch the Full-Site Product Category Tree with Python (Complete Code Included)

张建站

2026/4/26 11:41:38

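Shopee API Reverse Engineering with Python in Practice: Fetching the Full-Site Category Tree

When you need to analyze how an e-commerce platform organizes its products, the complete category hierarchy is the starting point. This article shows how to reverse-engineer Shopee's API with Python to fetch a product category tree containing first- and second-level categories. It includes code you can run directly.

1. Technical Preparation and Environment Setup

Before you start, install the following Python libraries:

```
pip install requests pandas pyquery openpyxl
```

Python 3.7 or later is recommended. The main dependencies are:

- requests: sending HTTP requests
- pandas: data processing and analysis
- pyquery: HTML parsing (the code below does not actually use it)
- openpyxl: the engine pandas uses to write .xlsx files

Note: before doing this in practice, check the target site's robots.txt. This article is for technical research purposes only.

2. API Analysis

Inspecting the Shopee web page's network requests in the browser's developer tools reveals two key API endpoints:

- Basic categories (first level only): /api/v4/pages/get_homepage_category_list
- Full category tree (including the second level): /api/v4/pages/get_category_tree

The key parts of the response look like this:

```json
{
  "data": {
    "category_list": [
      {
        "catid": 11040766,
        "parent_catid": 0,
        "name": "Womens Apparel",
        "display_name": "女生衣著",
        "level": 1,
        "children": [
          {
            "catid": 11042304,
            "parent_catid": 11040766,
            "name": "T-Shirts",
            "display_name": "T恤",
            "level": 2
          }
        ]
      }
    ]
  }
}
```

3. Implementing Category Data Retrieval

Below is the complete Python implementation:

```python
import requests
import pandas as pd


class ShopeeCategoryCrawler:
    def __init__(self):
        self.base_url = "https://shopee.com.my"
        self.session = requests.Session()
        # Browser-like headers; Shopee performs basic checks on these
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
            "Referer": f"{self.base_url}/",
        })

    def get_category_tree(self):
        """Fetch the full category tree; return the parsed JSON, or None on failure."""
        api_url = f"{self.base_url}/api/v4/pages/get_category_tree"
        try:
            response = self.session.get(api_url, timeout=10)
            if response.status_code != 200:
                raise requests.HTTPError(f"API request failed: {response.status_code}")
            return response.json()
        except (requests.RequestException, ValueError) as e:
            # ValueError: the response body is not valid JSON
            print(f"Failed to fetch the category tree: {e}")
            return None

    def parse_categories(self, data):
        """Flatten the two-level tree into one row per category."""
        categories = []
        for cat in data["data"]["category_list"]:
            # First-level category (parent_id 0 marks the root)
            categories.append({
                "level": 1,
                "catid": cat["catid"],
                "parent_id": 0,
                "name": cat["name"],
                "display_name": cat["display_name"],
            })
            # Second-level categories nested under "children" (may be missing or null)
            for child in cat.get("children") or []:
                categories.append({
                    "level": 2,
                    "catid": child["catid"],
                    "parent_id": cat["catid"],
                    "name": child["name"],
                    "display_name": child["display_name"],
                })
        return categories

    def export_to_excel(self, data, filename):
        """Export the rows to Excel (writing .xlsx requires openpyxl)."""
        df = pd.DataFrame(data)
        df.to_excel(filename, index=False)
        print(f"Data exported to {filename}")


if __name__ == "__main__":
    crawler = ShopeeCategoryCrawler()
    tree_data = crawler.get_category_tree()
    if tree_data:
        categories = crawler.parse_categories(tree_data)
        crawler.export_to_excel(categories, "shopee_categories.xlsx")
```

4. Key Technical Points

4.1 Request Header Emulation

Shopee's API performs basic checks on request headers, so set a plausible User-Agent and Referer:

```python
self.session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Referer": f"{self.base_url}/",
})
```

4.2 Exception Handling

Catching network and decoding errors keeps a single failed request from crashing the program. The method then returns None, which the caller can check:

```python
try:
    response = self.session.get(api_url, timeout=10)
    if response.status_code != 200:
        raise requests.HTTPError(f"API request failed: {response.status_code}")
    return response.json()
except (requests.RequestException, ValueError) as e:
    print(f"Failed to fetch the category tree: {e}")
    return None
```

4.3 Data Parsing

Multi-level categories are handled with nested loops. The outer loop emits a row for each first-level category, and the inner loop emits a row for each of its children. This covers the two levels that get_category_tree returns. A tree with more levels would need a recursive walk.

```python
def parse_categories(self, data):
    categories = []
    for cat in data["data"]["category_list"]:
        # First level: one row per top-level category
        categories.append({...})  # fields as in the full implementation above
        # Second level: one row per entry in "children"
        for child in cat.get("children") or []:
            categories.append({...})
    return categories
```

5. Putting the Data to Use

Once you have the category data, you can go further.

Fetching the product list for a category (a method on ShopeeCategoryCrawler):

```python
def get_items_by_category(self, cat_id, page=0, limit=60):
    api_url = f"{self.base_url}/api/v4/search/search_items"
    params = {
        "by": "relevancy",
        "fe_categoryids": cat_id,
        "limit": limit,
        "newest": page * limit,  # offset of the first item on this page
        "order": "desc",
        "page_type": "search",
        "scenario": "PAGE_OTHERS",
        "version": 2,
    }
    return self.session.get(api_url, params=params, timeout=10).json()
```

Visualizing the category hierarchy (requires networkx and matplotlib, which are not in the install command above):

```python
import networkx as nx
import matplotlib.pyplot as plt


def visualize_category_tree(categories):
    G = nx.DiGraph()  # directed edges: parent -> child
    for cat in categories:
        # Store the display name on every node, not only first-level ones
        G.add_node(cat["catid"], label=cat["display_name"])
        if cat["level"] > 1:
            G.add_edge(cat["parent_id"], cat["catid"])
    # with_labels=True alone would draw catids; pass the stored names instead
    nx.draw(G, labels=nx.get_node_attributes(G, "label"), node_size=50, font_size=6)
    plt.show()
```

6. Dealing with Anti-Scraping Measures

In practice you may run into anti-scraping measures. Recommendations:

- Space out requests at reasonable intervals
- Use a proxy IP pool
- Mimic realistic user behavior patterns
- Handle cookies and sessions

```python
# Example: routing requests through a proxy.
# The dict key is the scheme of the target URL; most proxies are reached
# over plain http:// even when they relay HTTPS traffic.
proxies = {
    "http": "http://your_proxy:port",
    "https": "http://your_proxy:port",
}
response = self.session.get(api_url, proxies=proxies, timeout=10)
```

7. Recommended Project Structure

For production use, the following project structure is recommended:

```
shopee_crawler/
├── core/
│   ├── crawler.py   # main crawler logic
│   ├── parser.py    # data parsing
│   └── storage.py   # data storage
├── utils/
│   ├── proxy.py     # proxy management
│   └── logger.py    # logging
├── config.py        # configuration
└── main.py          # entry point
```

This modular layout makes the project easier to extend and maintain.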
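The parser in section 4.3 stops at the second level. If a response ever nests categories more deeply, a recursive walk can handle any depth. The sketch below assumes the response layout from section 2, with an optional `children` list on each node. It computes `level` from the nesting depth instead of reading the API's own `level` field. `iter_categories` and `flatten_category_tree` are hypothetical helper names, not part of any Shopee API.

```python
from typing import Any, Dict, Iterator, List


def iter_categories(nodes: List[Dict[str, Any]], parent_id: int = 0,
                    level: int = 1) -> Iterator[Dict[str, Any]]:
    """Depth-first walk over a nested category list of any depth.

    Assumes each node has "catid", "name" and "display_name", and that
    sub-categories (if any) live under "children", as in section 2.
    """
    for node in nodes:
        yield {
            "level": level,  # derived from nesting depth, not from the API field
            "catid": node["catid"],
            "parent_id": parent_id,
            "name": node["name"],
            "display_name": node["display_name"],
        }
        # Recurse into children; a missing or null "children" ends this branch
        yield from iter_categories(node.get("children") or [],
                                   parent_id=node["catid"], level=level + 1)


def flatten_category_tree(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Same row layout as parse_categories, without the two-level limit."""
    category_list = (data.get("data") or {}).get("category_list") or []
    return list(iter_categories(category_list))


# Offline demo with a three-level sample
sample = {
    "data": {
        "category_list": [
            {"catid": 1, "name": "A", "display_name": "A", "children": [
                {"catid": 2, "name": "A-1", "display_name": "A-1", "children": [
                    {"catid": 3, "name": "A-1-x", "display_name": "A-1-x"},
                ]},
            ]},
        ]
    }
}
for row in flatten_category_tree(sample):
    print(row["level"], row["parent_id"], row["catid"], row["name"])
# 1 0 1 A
# 2 1 2 A-1
# 3 2 3 A-1-x
```

The rows have the same keys as those from `parse_categories`, so they can go straight into `export_to_excel`.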
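Section 6 recommends spacing out requests. A minimal standard-library way to do that is a throttle that enforces a minimum gap plus random jitter between calls, paired with an exponential backoff schedule for retries. The interval, jitter and cap values below are illustrative assumptions, not limits published by Shopee. `Throttle` and `backoff_delays` are hypothetical names.

```python
import random
import time
from typing import Callable, List, Optional


class Throttle:
    """Enforce a minimum, randomly jittered gap between consecutive requests.

    min_interval and jitter are in seconds; the defaults are illustrative only.
    """

    def __init__(self, min_interval: float = 1.0, jitter: float = 0.5,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.min_interval = min_interval
        self.jitter = jitter
        self._clock = clock  # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep just long enough before the next request; return the time slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            target = self.min_interval + random.uniform(0, self.jitter)
            delay = max(0.0, target - (now - self._last))
            if delay:
                self._sleep(delay)
        self._last = self._clock()
        return delay


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


print(backoff_delays(6))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

In use, you would call `throttle.wait()` immediately before each `self.session.get(...)`. When a request fails, sleep for the next value from `backoff_delays(...)` before retrying. The random jitter avoids a perfectly regular request rhythm, which is one part of the "realistic user behavior" point in section 6.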
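The networkx plot in section 5 becomes hard to read once there are hundreds of categories. A plain-text outline rebuilt from the flat rows is a quick alternative and doubles as a consistency check. This sketch assumes rows shaped like the output of `parse_categories`, with `catid`, `parent_id` and `name`, and `parent_id` 0 for top-level categories. It also assumes the rows form a tree with no cycles. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def build_children_index(rows: List[Dict[str, Any]]) -> Dict[int, List[int]]:
    """Map each parent_id to the catids of its direct children (0 is the root)."""
    children: Dict[int, List[int]] = defaultdict(list)
    for row in rows:
        children[row["parent_id"]].append(row["catid"])
    return dict(children)


def find_orphans(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Rows whose parent_id is neither 0 nor the catid of another row."""
    known = {row["catid"] for row in rows}
    return [r for r in rows if r["parent_id"] != 0 and r["parent_id"] not in known]


def render_outline(rows: List[Dict[str, Any]]) -> str:
    """Indented text outline, two spaces per level, children in input order."""
    index = build_children_index(rows)
    names = {row["catid"]: row["name"] for row in rows}
    lines: List[str] = []

    def walk(parent_id: int, depth: int) -> None:
        for catid in index.get(parent_id, []):
            lines.append("  " * depth + f"{names[catid]} ({catid})")
            walk(catid, depth + 1)  # assumes no cycles in parent_id links

    walk(0, 0)
    return "\n".join(lines)


# Rows matching the sample response from section 2
rows = [
    {"level": 1, "catid": 11040766, "parent_id": 0,
     "name": "Womens Apparel", "display_name": "女生衣著"},
    {"level": 2, "catid": 11042304, "parent_id": 11040766,
     "name": "T-Shirts", "display_name": "T恤"},
]
print(render_outline(rows))
# Womens Apparel (11040766)
#   T-Shirts (11042304)
print(find_orphans(rows))  # []
```

Orphaned rows are unreachable from the root, so they never appear in the outline. That is why it is worth running `find_orphans` alongside it.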
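`get_items_by_category` in section 5 fetches one page at a time, using `newest = page * limit` as the offset. A small generator can walk the pages until a short page comes back. The article does not show the `search_items` response layout. This sketch therefore takes a hypothetical `fetch_page(page, limit)` adapter that returns only the list of items, and leaves extracting that list from the JSON to you. The `max_pages` guard is an added assumption that keeps a runaway loop bounded.

```python
from typing import Any, Callable, Iterator, List


def iter_pages(fetch_page: Callable[[int, int], List[Any]],
               limit: int = 60, max_pages: int = 10) -> Iterator[Any]:
    """Yield items page by page until a short/empty page or max_pages is reached.

    fetch_page(page, limit) is a hypothetical adapter that performs one request
    (e.g. via get_items_by_category) and returns only the list of items.
    """
    for page in range(max_pages):
        items = fetch_page(page, limit)
        yield from items
        if len(items) < limit:  # fewer than `limit` items: no further pages
            break


# Offline demo: a local list stands in for the search API
catalog = [f"item-{i}" for i in range(130)]


def fake_fetch(page: int, limit: int) -> List[str]:
    # Mirrors newest = page * limit: page 0 -> items 0..59, page 1 -> 60..119, ...
    return catalog[page * limit:(page + 1) * limit]


print(len(list(iter_pages(fake_fetch))))  # 130
```

In real use, `fetch_page` might be something like `lambda page, limit: extract_items(crawler.get_items_by_category(cat_id, page, limit))`. Here `extract_items` is a hypothetical function you would write once you know the response structure. Spacing the calls out, as section 6 recommends, keeps the page walk polite.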
