AI Programming Case Study: Drawing a Stacked Area Chart of Movie-Watching Genres

张建站

2026/6/9 3:51:16

10 min read

In a textbook I came across a chart in which researchers summarized the genres watched by viewers in the MovieLens dataset. The chart was drawn from the standard MovieLens 100K dataset. Below I use AI to help me reproduce it.

The simplest and most direct approach is to attach the image to the question and ask:

User interest in the MovieLens dataset shows an evolving pattern: over the 8 months from September 1997 to April 1998, the share of each movie genre that users watched kept changing. Please use Python's matplotlib to produce a similar visualization.

The generated code is fairly long and uses numpy, but it runs:

```python
import matplotlib.pyplot as plt
import numpy as np

# Define the months and movie genres
months = ['Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar', 'Apr', 'May']
genres = ['Thriller', 'Sci-Fi', 'Romance', 'Horror', 'Drama', 'Crime', 'Comedy', 'Adventure', 'Action']

# Data (percentages) transcribed from the image; the missing Action values are filled with 0 or another reasonable value
# Note: some cells in the original read "-10%" or "10%"; these are interpreted as changes relative to some baseline
# For demonstration, the proportions actually shown (the percentage figures in the image) are used
data = {
    'Oct': [45, 10, 10, 10, 10, 10, 10, 10, 0],  # Action has no explicit value, so 0 for now
    'Nov': [30, 10, 10, 10, 15, 10, 10, 10, 0],
    'Dec': [15, 10, 10, 10, 10, 10, 10, 10, 0],
    'Jan': [20, 10, 10, 10, 15, 15, 10, 10, 0],
    'Feb': [20, 10, 10, 10, 10, 10, 10, 10, 0],
    'Mar': [20, 10, 10, 10, 10, 10, 10, 10, 0],  # the Mar Drama cell shows "-10%", hard to infer, treated as 10
    'Apr': [20, 10, 10, 10, 10, 10, 10, 10, 0],  # the Apr Crime cell shows "10%", treated as 10
    'May': [10, 10, 10, 10, 10, 10, 10, 10, 0],  # the May Drama cell shows "±10%", treated as 10
}

# Convert to a numpy array for computation
values = np.array([data[m] for m in months])

# The Action column is all zeros and could be dropped; it is kept here for completeness
# Make sure every row sums to 100 (check and adjust)
for i, row in enumerate(values):
    total = np.sum(row)
    if total != 100:
        # Adjust the largest entry so the row sums to 100 (simplified, for demonstration only)
        diff = 100 - total
        max_idx = np.argmax(row)
        row[max_idx] += diff

# Plot
fig, ax = plt.subplots(figsize=(12, 6))

# Draw the stacked area chart with stackplot
ax.stackplot(months, values.T, labels=genres, alpha=0.8,
             colors=plt.cm.tab20.colors)  # use the tab20 color set

# Set the chart properties
ax.set_xlabel('Month (1997-1998)', fontsize=12)
ax.set_ylabel('Percentage of Movies Watched (%)', fontsize=12)
ax.set_title('Evolution of User Genre Preferences Over Months', fontsize=14, fontweight='bold')
ax.legend(loc='upper left', bbox_to_anchor=(1, 1), fontsize=10)
ax.grid(axis='y', linestyle='--', alpha=0.7)

# Set the x-axis ticks
ax.set_xticks(range(len(months)))
ax.set_xticklabels(months)

plt.tight_layout()
plt.show()
```

Running it produces a stacked area chart, but the code is not correct, because the data is made up rather than taken from the dataset. The next step is therefore to bring in the real MovieLens data.

This time I start by preparing the MovieLens data myself. To count how often each genre was watched (which here really means how often it was rated), the ratings data u.data has to be joined with the movie data u.item:

```python
import pandas as pd
import matplotlib.pyplot as plt

ratings = pd.read_csv('MovieLens/u.data', sep='\t', header=None,
                      names=['uid', 'mid', 'rating', 'timestamp'])
ratings['time'] = pd.to_datetime(ratings['timestamp'], unit='s')

mnames = ['mid', 'title', 'date1', 'date2', 'url', 'unknown', 'Action', 'Adventure',
          'Animation', 'Children', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
          'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller',
          'War', 'Western']
movies = pd.read_csv('MovieLens/u.item', sep='|', encoding='ISO-8859-1', names=mnames)

result = pd.merge(ratings, movies)
print(result)
```

This reflects a common piece of programming experience: take it one step at a time.

The data is still not sufficient, though. The final summary should be by year and month, while each rating currently carries a full timestamp. You could hand this to the AI, but if you know pandas yourself it is easy to steer the AI toward the to_period function, which maps the timestamps to monthly granularity. The added line is:

```python
ratings['time'] = ratings['time'].dt.to_period(freq='M')
```

The complete code is:

```python
import pandas as pd
import matplotlib.pyplot as plt

ratings = pd.read_csv('MovieLens/u.data', sep='\t', header=None,
                      names=['uid', 'mid', 'rating', 'timestamp'])
ratings['time'] = pd.to_datetime(ratings['timestamp'], unit='s')
ratings['time'] = ratings['time'].dt.to_period(freq='M')

mnames = ['mid', 'title', 'date1', 'date2', 'url', 'unknown', 'Action', 'Adventure',
          'Animation', 'Children', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
          'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller',
          'War', 'Western']
movies = pd.read_csv('MovieLens/u.item', sep='|', encoding='ISO-8859-1', names=mnames)

result = pd.merge(ratings, movies)
print(result)
```

Next, count how often each genre occurs in each year-month period. In fact I have tried many AI tools, and most of them produce a very complicated implementation for this kind of task. If you know pandas yourself, it is easy to steer the AI toward the agg function, which does it in one statement. Because each genre column is a 0/1 indicator, summing the column gives the number of ratings for that genre. The added code is:

```python
result = result[['time', 'Action', 'Adventure', 'Animation', 'Children', 'Comedy', 'Crime']].groupby('time').agg(
    {'Action': 'sum', 'Adventure': 'sum', 'Animation': 'sum',
     'Children': 'sum', 'Comedy': 'sum', 'Crime': 'sum'})
```

The complete code is:

```python
import pandas as pd
import matplotlib.pyplot as plt

ratings = pd.read_csv('MovieLens/u.data', sep='\t', header=None,
                      names=['uid', 'mid', 'rating', 'timestamp'])
ratings['time'] = pd.to_datetime(ratings['timestamp'], unit='s')
ratings['time'] = ratings['time'].dt.to_period(freq='M')

mnames = ['mid', 'title', 'date1', 'date2', 'url', 'unknown', 'Action', 'Adventure',
          'Animation', 'Children', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
          'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller',
          'War', 'Western']
movies = pd.read_csv('MovieLens/u.item', sep='|', encoding='ISO-8859-1', names=mnames)

result = pd.merge(ratings, movies)
result = result[['time', 'Action', 'Adventure', 'Animation', 'Children', 'Comedy', 'Crime']].groupby('time').agg(
    {'Action': 'sum', 'Adventure': 'sum', 'Animation': 'sum',
     'Children': 'sum', 'Comedy': 'sum', 'Crime': 'sum'})
print(result)
```

The output is:

```
            Action  Adventure  Animation  Children  Comedy  Crime
time
1997-09-01    1892       1031        297       530    2091    590
1997-10-01    2560       1461        431       825    3276    818
1997-11-01    6053       3378        839      1611    7188   1912
1997-12-01    3174       1712        425       855    3471    956
1998-01-01    3740       1981        527      1049    4228   1100
1998-02-01    2723       1367        377       789    3238    920
1998-03-01    3088       1607        397       856    3577    986
1998-04-01    2359       1216        312       667    2763    773
```

The processed result is now clearly visible. Following the numpy version the AI produced earlier, you can call stackplot directly, or steer the AI to do so with a prompt such as:

Use stackplot to draw a stacked area chart of result.

The generated code is:

```python
plt.stackplot(result.index, result.values.T)
plt.show()
```

The complete code is:

```python
import pandas as pd
import matplotlib.pyplot as plt

ratings = pd.read_csv('MovieLens/u.data', sep='\t', header=None,
                      names=['uid', 'mid', 'rating', 'timestamp'])
ratings['time'] = pd.to_datetime(ratings['timestamp'], unit='s')
ratings['time'] = ratings['time'].dt.to_period(freq='M')

mnames = ['mid', 'title', 'date1', 'date2', 'url', 'unknown', 'Action', 'Adventure',
          'Animation', 'Children', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
          'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller',
          'War', 'Western']
movies = pd.read_csv('MovieLens/u.item', sep='|', encoding='ISO-8859-1', names=mnames)

result = pd.merge(ratings, movies)

# Count how often each genre occurs in each year-month period
result = result[['time', 'Action', 'Adventure', 'Animation', 'Children', 'Comedy', 'Crime']].groupby('time').agg(
    {'Action': 'sum', 'Adventure': 'sum', 'Animation': 'sum',
     'Children': 'sum', 'Comedy': 'sum', 'Crime': 'sum'})

# Draw a stacked area chart of result with stackplot
plt.stackplot(result.index, result.values.T)
plt.show()
```

Running this raises an error:

```
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
```

Passing the error message straight to the AI explains the cause: result.index has the Period type, because the code calls to_period(freq='M'), and matplotlib's stackplot cannot do numeric computation on Period values directly, hence the "ufunc 'isfinite' not supported for the input types" error. The AI can also suggest a concrete fix. The added line is:

```python
result.index = result.index.to_timestamp()
```

The complete code is:

```python
import pandas as pd
import matplotlib.pyplot as plt

ratings = pd.read_csv('MovieLens/u.data', sep='\t', header=None,
                      names=['uid', 'mid', 'rating', 'timestamp'])
ratings['time'] = pd.to_datetime(ratings['timestamp'], unit='s')
ratings['time'] = ratings['time'].dt.to_period(freq='M')

mnames = ['mid', 'title', 'date1', 'date2', 'url', 'unknown', 'Action', 'Adventure',
          'Animation', 'Children', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
          'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller',
          'War', 'Western']
movies = pd.read_csv('MovieLens/u.item', sep='|', encoding='ISO-8859-1', names=mnames)

result = pd.merge(ratings, movies)

# Count how often each genre occurs in each year-month period
result = result[['time', 'Action', 'Adventure', 'Animation', 'Children', 'Comedy', 'Crime']].groupby('time').agg(
    {'Action': 'sum', 'Adventure': 'sum', 'Animation': 'sum',
     'Children': 'sum', 'Comedy': 'sum', 'Crime': 'sum'})
result.index = result.index.to_timestamp()

# Draw a stacked area chart of result with stackplot
plt.stackplot(result.index, result.values.T)
plt.show()
```

This time the chart is drawn. At first glance it looks right, but it is not what was asked for: the vertical axis shows absolute counts rather than relative percentages. The AI can be steered further with the prompt:

Divide each cell in a row by the sum of that row to get its percentage.

The resulting code is:

```python
result = result.apply(lambda x: x / x.sum(), axis=1)
```

The complete code is:

```python
import pandas as pd
import matplotlib.pyplot as plt

ratings = pd.read_csv('MovieLens/u.data', sep='\t', header=None,
                      names=['uid', 'mid', 'rating', 'timestamp'])
ratings['time'] = pd.to_datetime(ratings['timestamp'], unit='s')
ratings['time'] = ratings['time'].dt.to_period(freq='M')

mnames = ['mid', 'title', 'date1', 'date2', 'url', 'unknown', 'Action', 'Adventure',
          'Animation', 'Children', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
          'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller',
          'War', 'Western']
movies = pd.read_csv('MovieLens/u.item', sep='|', encoding='ISO-8859-1', names=mnames)

result = pd.merge(ratings, movies)

# Count how often each genre occurs in each year-month period
result = result[['time', 'Action', 'Adventure', 'Animation', 'Children', 'Comedy', 'Crime']].groupby('time').agg(
    {'Action': 'sum', 'Adventure': 'sum', 'Animation': 'sum',
     'Children': 'sum', 'Comedy': 'sum', 'Crime': 'sum'})
result.index = result.index.to_timestamp()

# Divide each cell in a row by the sum of that row to get its percentage
result = result.apply(lambda x: x / x.sum(), axis=1)

# Draw a stacked area chart of result with stackplot
plt.stackplot(result.index, result.values.T)
plt.show()
```

The output is a stacked area chart in which each month's genre shares sum to 100%.
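The aggregation, normalization, and plotting steps can also be checked on a tiny hand-made table, without downloading the dataset. The following is a minimal sketch under stated assumptions: the toy table, its dates, and the three-genre subset are invented for illustration and are not MovieLens data. It also swaps in two variations that the walkthrough does not use: a vectorized div call in place of the row-wise apply, and genre labels plus a percentage axis on the chart.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import PercentFormatter

# Hypothetical miniature of the merged ratings/movies table: one row per
# rating, with 0/1 genre indicator columns like those in u.item.
toy = pd.DataFrame({
    "date": ["1997-09-23", "1997-09-24", "1997-10-28", "1997-10-29", "1997-10-30"],
    "Action": [1, 0, 1, 1, 0],
    "Comedy": [0, 1, 1, 0, 1],
    "Crime": [0, 0, 0, 1, 0],
})
genres = ["Action", "Comedy", "Crime"]

# Map each rating to its month, then sum the indicator columns per month.
toy["time"] = pd.to_datetime(toy["date"]).dt.to_period("M")
counts = toy.groupby("time")[genres].sum()

# Divide every row by its own total so each month sums to 1.
shares = counts.div(counts.sum(axis=1), axis=0)

# stackplot cannot handle Period values, so convert the index to timestamps.
shares.index = shares.index.to_timestamp()

# Stacked area chart with one labeled band per genre and a percent y-axis.
fig, ax = plt.subplots()
polys = ax.stackplot(shares.index, shares.to_numpy().T, labels=genres)
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0))
ax.legend(loc="upper left")
```

In an interactive session, calling plt.show() after the last line displays the chart. Passing the full list of genre column names instead of the three used here would extend the same steps to every genre in u.item.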
