Question

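I'm currently trying to write a quick Python program that reads in a .pcap file and writes out data about the various sessions stored in it.

The information I write out includes srcip, dstip, srcport and dstport, among others.

However, even for a fairly small pcap this takes a lot of memory and ends up running for a very long time. We're talking 8 GB+ of memory for a pcap that is 212 MB in size.

As usual, I guess there might be a more efficient way of doing this that I'm just not aware of.

Here is a quick skeleton of my code; no vital parts are missing.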

import socket
from scapy.all import *


edges_file = "edges.csv"
pcap_file = "tcpdump.pcap"

try:
    print '[+] Reading and parsing pcap file: %s' % pcap_file
    a = rdpcap(pcap_file)

except Exception as e:
    print 'Something went wrong while opening/reading the pcap file.' \
          '\n\nThe error message is: %s' % e
    exit(0)

sessions = a.sessions()

print '[+] Writing to edges.csv'
f1 = open(edges_file, 'w')
f1.write('source,target,protocol,sourceport,destinationport,'
         'num_of_packets\n')
for k, v in sessions.iteritems():

    tot_packets = len(v)

    if "UDP" in k:
        proto, source, flurp, target = k.split()
        srcip, srcport = source.split(":")
        dstip, dstport = target.split(":")
        f1.write('%s,%s,%s,%s,%s,%s\n' % (srcip, dstip, proto, srcport,
                                          dstport, tot_packets))
        continue

    elif "TCP" in k:
        proto, source, flurp, target = k.split()
        srcip, srcport = source.split(":")
        dstip, dstport = target.split(":")
        f1.write('%s,%s,%s,%s,%s,%s\n' % (srcip, dstip, proto, srcport,
                                          dstport, tot_packets))
        continue

    elif "ICMP" in k:
        continue  # Not bothered about ICMP right now

    else:
        continue  # Or any other 'weird' packets for that matter ;)

print '[+] Closing the edges file'
f1.close()

As always, any help is appreciated.

Answer 1

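I know I'm late to the party, but hopefully this will be useful to future visitors.

rdpcap() dissects the entire pcap file and retains an in-memory representation of each and every packet, which explains why it eats up so much memory.

As far as I am aware (I am a novice Scapy user myself), the only two ways to invoke Scapy's session reassembly are:

1. By calling scapy.plist.PacketList.sessions(). This is what you are currently doing (rdpcap(pcap_file) returns a scapy.plist.PacketList).
2. By reading the pcap with sniff() in offline mode while also providing the function with a session decoder implementation. For example, for TCP reassembly you would do sniff(offline='stackoverflow.pcap', session=TCPSession). (This was added in Scapy 2.4.3.)

Option 1 is obviously a dead end (it requires keeping all packets of all sessions in memory at the same time), so let's explore option 2...

Let's launch Scapy in interactive mode to access the documentation for sniff():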

$ scapy
>>> help(sniff)

Help on function sniff in module scapy.sendrecv:

sniff(*args, **kwargs)
    Sniff packets and return a list of packets.
    
    Args:
        count: number of packets to capture. 0 means infinity.
        store: whether to store sniffed packets or discard them
        prn: function to apply to each packet. If something is returned, it
             is displayed.
             --Ex: prn = lambda x: x.summary()
        session: a session = a flow decoder used to handle stream of packets.
                 e.g: IPSession (to defragment on-the-flow) or NetflowSession
        filter: BPF filter to apply.
        lfilter: Python function applied to each packet to determine if
                 further action may be done.
                 --Ex: lfilter = lambda x: x.haslayer(Padding)
        offline: PCAP file (or list of PCAP files) to read packets from,
                 instead of sniffing them
        timeout: stop sniffing after a given time (default: None).
        L2socket: use the provided L2socket (default: use conf.L2listen).
        opened_socket: provide an object (or a list of objects) ready to use
                      .recv() on.
        stop_filter: Python function applied to each packet to determine if
                     we have to stop the capture after this packet.
                     --Ex: stop_filter = lambda x: x.haslayer(TCP)
        iface: interface or list of interfaces (default: None for sniffing
               on all interfaces).
        monitor: use monitor mode. May not be available on all OS
        started_callback: called as soon as the sniffer starts sniffing
                          (default: None).
    
    The iface, offline and opened_socket parameters can be either an
    element, a list of elements, or a dict object mapping an element to a
    label (see examples below).

Notice the store parameter. We can set it to False to make sniff() operate in a streaming fashion (read a single packet, process it, then release it from memory):

sniff(offline='stackoverflow.pcap', session=TCPSession, store=False)

I just tested this with a 193 MB pcap. With store=True (the default), this eats up about 1.7 GB of memory on my system (macOS), but only about 47 MB with store=False.
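
Building on store=False, the per-flow packet counts that the question writes to edges.csv can be accumulated in a prn callback, so that only one small counter per flow stays in memory rather than every packet. The following is a minimal Python 3 sketch under some assumptions, not a drop-in replacement: the helper names (record_packet, count_flows, write_edges) are made up for illustration, only IPv4 TCP and UDP packets are counted, and flows are keyed per direction, mirroring the "source > target" keys that the question's code splits apart.

```python
import csv
from collections import Counter

HEADER = ["source", "target", "protocol", "sourceport",
          "destinationport", "num_of_packets"]


def record_packet(counts, proto, src, sport, dst, dport):
    """Increment the packet count of one unidirectional flow."""
    counts[(src, dst, proto, sport, dport)] += 1


def count_flows(pcap_path):
    """Stream a pcap through sniff() and count packets per TCP/UDP flow."""
    # Imported lazily so the pure-Python helpers work without Scapy installed.
    from scapy.all import IP, TCP, UDP, sniff

    counts = Counter()

    def on_packet(pkt):
        # Assumption: only IPv4 is of interest; everything else is skipped.
        if IP not in pkt:
            return
        for layer in (TCP, UDP):
            if layer in pkt:
                record_packet(counts, layer.__name__, pkt[IP].src,
                              pkt[layer].sport, pkt[IP].dst, pkt[layer].dport)
                return

    # store=False: each packet is discarded once on_packet() has returned.
    sniff(offline=pcap_path, prn=on_packet, store=False)
    return counts


def write_edges(counts, out_path):
    """Write the flow counters using the column layout from the question."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        for (src, dst, proto, sport, dport), n in counts.items():
            writer.writerow([src, dst, proto, sport, dport, n])
```

Calling counts = count_flows('tcpdump.pcap') followed by write_edges(counts, 'edges.csv') would then produce the same columns as the original script. Memory use in this sketch grows with the number of distinct flows, not with the number of packets.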

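Processing the reassembled TCP sessions (open question)

So we managed to reduce the memory footprint, great! But how do we process the (supposedly) reassembled TCP sessions? The usage instructions indicate that we should use the prn parameter of sniff() to specify a callback function, which will then be called with the reassembled TCP session (emphasis mine):

sniff() also provides Sessions, that allows to dissect a flow of packets seamlessly. For instance, you may want your sniff(prn=...) function to automatically defragment IP packets, before executing the prn.

The example is in the context of IP fragmentation, but I would expect the TCP analog to group all packets of a session and then invoke prn once per session. Unfortunately, that is not how it works: I tried this on my example pcap, and the callback is invoked once for every packet, exactly as indicated in the documentation of prn shown above.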
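
To reproduce this observation on another capture, one can count how often the callback fires. A minimal sketch, assuming Scapy 2.4.3 or later; CallCounter and count_prn_calls are hypothetical names for illustration, not part of Scapy:

```python
class CallCounter:
    """Callable that only counts how many times it has been invoked."""

    def __init__(self):
        self.calls = 0

    def __call__(self, pkt):
        self.calls += 1  # returns None, so sniff() prints nothing


def count_prn_calls(pcap_path, use_tcp_session=True):
    """Return how many times sniff() invoked prn while reading pcap_path."""
    # Imported lazily so CallCounter can be tried without Scapy installed.
    from scapy.all import TCPSession, sniff

    counter = CallCounter()
    kwargs = {"offline": pcap_path, "prn": counter, "store": False}
    if use_tcp_session:
        kwargs["session"] = TCPSession
    sniff(**kwargs)
    return counter.calls
```

If prn were invoked once per session, count_prn_calls(path) would be close to the number of sessions in the file; a value close to the number of packets matches the behaviour described above.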

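The usage instructions linked above also state the following about using session=TCPSession in sniff():

TCPSession -> defragment certain TCP protocols*. Only HTTP 1.0 currently uses this functionality.

Given the outcome of the experiment above, I now interpret this as follows: whenever Scapy finds an HTTP (1.0) request/response that spans multiple TCP segments, it creates a single packet whose payload is the merged payload of those TCP segments (which together form the full HTTP request/response). I would appreciate it if anyone could help clarify the above quote on TCPSession, or even better, clarify whether TCP reassembly is indeed possible this way and I am simply misunderstanding the API.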

Reading sessions from a pcap file with Scapy more memory-efficiently
