告别玄学：手把手教你用Python复现HOLMES论文中的可疑信息流关联算法

张

张建站

2026/6/7 10:47:05

10分钟阅读

用Python实战复现HOLMES论文中的可疑信息流关联算法在安全分析领域APT攻击检测一直是个令人头疼的难题。传统的检测方法往往难以捕捉那些潜伏期长、隐蔽性高的高级威胁。2019年SP顶会论文《HOLMES: Real-time APT Detection through Correlation of Suspicious Information Flows》提出了一种创新性的解决方案——通过关联可疑信息流来实时检测APT攻击。本文将带你用Python一步步复现这个算法的核心部分让你不仅能理解论文的精髓还能亲手实现它。1. 环境准备与数据模拟1.1 安装必要依赖首先确保你的Python环境已经安装了以下关键库pip install networkx pandas numpy matplotlib pyyaml这些库将分别用于networkx构建和操作溯源图pandas数据处理和分析numpy数值计算matplotlib结果可视化pyyaml读取配置文件1.2 模拟审计日志数据由于真实APT攻击数据难以获取我们需要模拟一个接近真实场景的数据集。HOLMES论文中使用了DARPA TC数据集我们可以创建一个简化版的模拟数据import pandas as pd import numpy as np def generate_synthetic_logs(num_entries1000): processes [explorer.exe, chrome.exe, svchost.exe, powershell.exe, cmd.exe] objects [C:\\temp\\1.tmp, C:\\Windows\\system32\\kernel32.dll, registry\\HKLM\\SOFTWARE, 192.168.1.1:443, /tmp/.hidden] operations [read, write, execute, connect, create] data [] for _ in range(num_entries): timestamp pd.Timestamp.now() - pd.Timedelta(minutesnp.random.randint(0, 1440)) src np.random.choice(processes) dst np.random.choice(objects) op np.random.choice(operations) data.append([timestamp, src, dst, op]) return pd.DataFrame(data, columns[timestamp, source, destination, operation]) logs generate_synthetic_logs()这个模拟数据包含了时间戳、源进程、目标对象和操作类型基本覆盖了真实审计日志的关键字段。2. 构建溯源图2.1 从日志到基础溯源图溯源图(Provenance Graph)是HOLMES算法的核心数据结构它记录了系统中各实体间的信息流动关系。让我们用networkx构建基础溯源图import networkx as nx from collections import defaultdict def build_provenance_graph(logs): G nx.DiGraph() entity_counters defaultdict(int) for _, row in logs.iterrows(): src fprocess:{row[source]} dst fobject:{row[destination]} # 添加节点和边 G.add_node(src, typeprocess) G.add_node(dst, typeobject) G.add_edge(src, dst, operationrow[operation], timestamprow[timestamp]) return G provenance_graph build_provenance_graph(logs)2.2 增强溯源图语义原始论文中溯源图节点被分为多种类型并附加了丰富的语义信息。我们来增强图的表达能力def enhance_provenance_graph(G): # 添加节点属性 for node in G.nodes(): if process in node: G.nodes[node][trust_level] np.random.choice([high, medium, low]) G.nodes[node][privilege] np.random.choice([user, admin, system]) else: G.nodes[node][sensitivity] np.random.choice([public, private, confidential]) # 添加边权重 for src, dst in G.edges(): op G.edges[src, dst][operation] weight 1.0 if op execute: weight 1.5 elif op write: weight 1.2 G.edges[src, dst][weight] weight return G enhanced_graph enhance_provenance_graph(provenance_graph)现在我们的溯源图已经包含了进程节点的信任级别和权限等级对象节点的敏感程度边上的操作类型和权重3. 实现可疑信息流关联3.1 定义可疑模式HOLMES论文中定义了多种可疑模式我们实现其中三种典型模式def detect_suspicious_patterns(G): suspicious_subgraphs [] # 模式1低信任进程写入高敏感对象 for src, dst in G.edges(): if (process in src and object in dst and G.nodes[src][trust_level] low and G.nodes[dst][sensitivity] confidential and G.edges[src, dst][operation] write): suspicious_subgraphs.append((src, dst)) # 模式2进程链式执行 process_exec_chains [] for node in G.nodes(): if process in node: predecessors list(G.predecessors(node)) if any(process in pred for pred in predecessors): process_exec_chains.append((predecessors[0], node)) # 模式3网络连接后文件操作 network_file_sequences [] for node in G.nodes(): if object in node and :443 in node: # 假设443端口连接是可疑的 successors list(G.successors(node)) if any(object in succ and C:\\ in succ for succ in successors): network_file_sequences.append((node, successors[0])) return { low_trust_write: suspicious_subgraphs, process_chains: process_exec_chains, network_file_seq: network_file_sequences } suspicious_patterns detect_suspicious_patterns(enhanced_graph)3.2 构建高级场景图(HSG)HOLMES的创新点之一是引入了高级场景图(High-level Scenario Graph)来弥合底层日志和高层攻击意图之间的语义鸿沟。我们来实现一个简化版的HSG构建def build_hsg(suspicious_patterns, G): hsg nx.DiGraph() kill_chain_phases [Reconnaissance, Weaponization, Delivery, Exploitation, Installation, C2, Actions] # 添加杀伤链节点 for phase in kill_chain_phases: hsg.add_node(phase, typephase) # 映射可疑模式到杀伤链阶段 for pattern_type, patterns in suspicious_patterns.items(): for src, dst in patterns: if pattern_type low_trust_write: hsg.add_edge(Exploitation, Installation, examplef{src} - {dst}, confidence0.7) elif pattern_type process_chains: hsg.add_edge(Installation, C2, examplef{src} - {dst}, confidence0.8) elif pattern_type network_file_seq: hsg.add_edge(C2, Actions, examplef{src} - {dst}, confidence0.9) return hsg hsg build_hsg(suspicious_patterns, enhanced_graph)4. 结果可视化与分析4.1 可视化溯源图让我们用matplotlib可视化部分溯源图import matplotlib.pyplot as plt def visualize_graph(G, title): plt.figure(figsize(12, 8)) pos nx.spring_layout(G) node_colors [] for node in G.nodes(): if process in node: node_colors.append(lightblue) else: node_colors.append(lightgreen) edge_colors [] edge_widths [] for src, dst in G.edges(): edge_colors.append(gray) edge_widths.append(G.edges[src, dst][weight]) nx.draw(G, pos, with_labelsTrue, node_colornode_colors, edge_coloredge_colors, widthedge_widths, font_size8) plt.title(title) plt.show() # 只可视化前20个节点的小图 small_graph enhanced_graph.subgraph(list(enhanced_graph.nodes())[:20]) visualize_graph(small_graph, Provenance Graph Subset)4.2 可视化HSG高级场景图的可视化能更清晰地展示攻击模式def visualize_hsg(hsg): plt.figure(figsize(10, 6)) pos nx.spectral_layout(hsg) nx.draw(hsg, pos, with_labelsTrue, node_colorlightcoral, edge_colorgray, width1.5, font_size10) edge_labels {(u, v): fconf: {d[confidence]} for u, v, d in hsg.edges(dataTrue)} nx.draw_networkx_edge_labels(hsg, pos, edge_labelsedge_labels, font_size8) plt.title(High-level Scenario Graph (HSG)) plt.show() visualize_hsg(hsg)4.3 分析检测结果为了评估我们的实现效果可以计算一些基本指标def analyze_results(suspicious_patterns): total_patterns sum(len(v) for v in suspicious_patterns.values()) print(f检测到可疑模式总数: {total_patterns}) print(详细分类:) for pattern_type, patterns in suspicious_patterns.items(): print(f- {pattern_type}: {len(patterns)} 个实例) if total_patterns 0: print(\n示例可疑模式:) for pattern_type in suspicious_patterns: if suspicious_patterns[pattern_type]: print(f{pattern_type}: {suspicious_patterns[pattern_type][0]}) analyze_results(suspicious_patterns)5. 优化与扩展5.1 添加时间窗口分析原始论文强调了实时检测我们可以添加时间窗口分析def analyze_time_windows(G, window_minutes30): edges list(G.edges(dataTrue)) edges_sorted sorted(edges, keylambda x: x[2][timestamp]) suspicious_in_window [] for i in range(len(edges_sorted)-1): time_diff (edges_sorted[i1][2][timestamp] - edges_sorted[i][2][timestamp]).total_seconds()/60 if time_diff window_minutes: # 检查是否构成可疑模式 src1, dst1, _ edges_sorted[i] src2, dst2, _ edges_sorted[i1] if process in src1 and process in dst2: suspicious_in_window.append((src1, dst1, src2, dst2)) return suspicious_in_window time_based_patterns analyze_time_windows(enhanced_graph)5.2 引入ATTCK框架映射HOLMES论文提到了与ATTCK框架的集成我们可以实现一个简化的映射def map_to_mitre_attack(hsg): mitre_techniques { Reconnaissance: [T1595, T1592], Weaponization: [T1587, T1588], Delivery: [T1566, T1195], Exploitation: [T1210, T1211], Installation: [T1059, T1105], C2: [T1071, T1095], Actions: [T1005, T1025] } for node in hsg.nodes(): if node in mitre_techniques: hsg.nodes[node][mitre] mitre_techniques[node] return hsg hsg_with_mitre map_to_mitre_attack(hsg)5.3 性能优化建议当处理大规模日志时性能成为关键因素。以下是几种优化策略增量图构建不是每次重新构建整个图而是增量添加新日志条目图分区将大图分成若干子图并行处理近似算法对某些分析任务使用近似算法提高速度采样检测对极高流量系统可以先采样再分析def incremental_update(G, new_logs): 增量更新溯源图 for _, row in new_logs.iterrows(): src fprocess:{row[source]} dst fobject:{row[destination]} if not G.has_node(src): G.add_node(src, typeprocess, trust_levelnp.random.choice([high, medium, low]), privilegenp.random.choice([user, admin, system])) if not G.has_node(dst): G.add_node(dst, typeobject, sensitivitynp.random.choice([public, private, confidential])) G.add_edge(src, dst, operationrow[operation], timestamprow[timestamp]) return G

如何轻松解锁网易云音乐NCM格式：ncmdumpGUI完整使用教程

如何轻松解锁网易云音乐NCM格式：ncmdumpGUI完整使用教程【免费下载链接】ncmdumpGUI C#版本网易云音乐ncm文件格式转换，Windows图形界面版本项目地址: https://gitcode.com/gh_mirrors/nc/ncmdumpGUI 你是否曾经遇到过这样的情况：在…...

2026/6/7 10:45:28 阅读更多 →

如何快速配置大麦网智能抢票脚本：面向新手的完整指南

如何快速配置大麦网智能抢票脚本：面向新手的完整指南【免费下载链接】DamaiHelper 大麦网演唱会演出抢票脚本。项目地址: https://gitcode.com/gh_mirrors/dama/DamaiHelper 还在为抢不到心仪的演唱会门票而烦恼吗？每次热门演出开票时&#xff…...

2026/6/7 10:45:24 阅读更多 →

企业级物流管理系统管理系统源码｜SpringBoot+Vue+MyBatis架构+MySQL数据库【完整版】

💡实话实说： CSDN上做毕设辅导的都是专业技术服务，大家都要生活，这个很正常。我和其他人不同的是，我有自己的项目库存，不需要找别人拿货再加价，所以能给到超低价格。摘要随着全球电子商务的迅…...

2026/6/7 10:45:11 阅读更多 →

CSDN AI数字营销卡片配置手册（跳转权限解禁版）：官方未公开的3种合规跳转变通方案

更多请点击： https://codechina.net 第一章：CSDN AI 数字营销的引流卡片支持跳转官网、小程序链接吗？ CSDN AI 数字营销平台提供的引流卡片，是面向技术创作者与企业用户的核心转化组件，其核心能力之一即为外链跳转。目…...

2026/6/7 0:01:50 阅读更多 →

如何3分钟找回遗忘的压缩包密码：免费开源工具的终极指南

如何3分钟找回遗忘的压缩包密码：免费开源工具的终极指南【免费下载链接】ArchivePasswordTestTool 利用7zip测试压缩包的功能对加密压缩包进行自动化测试密码项目地址: https://gitcode.com/gh_mirrors/ar/ArchivePasswordTestTool 你是否曾经面对一个加密…...

2026/6/7 0:03:20 阅读更多 →

Linux桌面便签神器：Sticky如何让你的工作效率提升300%？

Linux桌面便签神器：Sticky如何让你的工作效率提升300%？ 【免费下载链接】sticky A sticky notes app for the linux desktop 项目地址: https://gitcode.com/gh_mirrors/stic/sticky 在Linux桌面上，你是否经常需要快速记录一闪而过的灵…...

2026/6/7 0:05:34 阅读更多 →

YOLO11部署优化：OpenVINO推理 | 在Intel CPU上利用OpenVINO异构推理加速，无需GPU也能实时检测

我在Intel i7-13700上实测，YOLO11n经过OpenVINO INT8量化后推理延迟从原始的92ms降至19ms，配合异构调度实现CPU+GPU双核并行后进一步压缩到11ms，无需独立GPU即可跑满30FPS实时检测写在前面：一个被低估的部署痛点过去两年，我在三个不同的工业视觉项目中遇到同样的困境—…...

2026/6/7 0:10:18 阅读更多 →