How can I extract specific fields from the log file described below?

Date: 2018-08-03 13:35:30

Tags: regex python-3.x list

I have a log file generated by nmap that looks like this:

Nmap scan report for gateway (10.0.0.1)
Host is up (0.0060s latency).
MAC Address: 10:BE:F5:FC:9C:65 (D-Link International)
Nmap scan report for 10.0.0.2
Host is up (0.055s latency).
MAC Address: 7C:78:7E:E8:1C:2A (Samsung Electronics)
Nmap scan report for 10.0.0.3
Host is up (0.059s latency).
MAC Address: 54:60:09:83:6E:B6 (Google)
Nmap scan report for 10.0.0.200
Host is up (-0.093s latency).
MAC Address: 5C:B9:01:02:5F:D8 (Hewlett Packard)
Nmap scan report for manoj-notebook (10.0.0.4)
Host is up.
Nmap done: 256 IP addresses (5 hosts up) scanned in 16.84 seconds

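The file keeps changing as new devices join the network or existing ones disconnect. I want to collect the IP addresses (e.g. 10.0.0.1), the MAC addresses (e.g. 10:BE:F5:FC:9C:65) and the device names (e.g. D-Link International) in a single list, like this: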

result = [['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.200', '10.0.0.4'], ['10:BE:F5:FC:9C:65', '7C:78:7E:E8:1C:2A', '54:60:09:83:6E:B6', '5C:B9:01:02:5F:D8'], ['D-Link International', 'Samsung Electronics', 'Google', 'Hewlett Packard']] 

I tried the following regular expressions to match the IP address, MAC address and device name:

ipPattern = re.findall(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', temp)
macPattern = re.findall(r'(?:.*?s: ){2}(.*)(?= \))', temp)
devicePattern = re.findall(r'(?:.*?\(){2}(.*)(?=\))', temp)

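I can match the IP addresses, but not the MAC addresses or the device names. How can I match them and store everything in a single list? Thanks.

Also, it would be a great bonus if I could get a pattern that extracts the latency from the log file, e.g. 0.0060s. Thanks.

1 Answer:

Answer 0 (score: 1)

You can use the following expressions: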

  • ipPattern \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
  • macPattern (?:[0-9A-F]{2}:){2,}[0-9A-F]{2}\b

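    • (?:[0-9A-F]{2}:){2,} Non-capturing group repeated two or more times: a pair of hexadecimal characters followed by :.
    • [0-9A-F]{2}\b The final pair of hexadecimal characters, followed by a word boundary.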
  • devicePattern (?<=\()[^)0-9.]*(?=\))

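    • (?<=\() Positive lookbehind for an opening parenthesis (.
    • [^)0-9.]* Negated character set: matches any run of characters other than ), . or a digit.
    • (?=\)) Positive lookahead for a closing parenthesis ).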
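  • latency -?\d+\.\d+s(?=\slatency)

    • -?\d+\.\d+s Matches an optional -, then digits, a period, more digits, and s.
    • (?=\slatency) Positive lookahead asserting that a whitespace character followed by latency comes next.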
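The breakdown above can also be written directly into the patterns with Python's re.VERBOSE flag, which ignores whitespace outside character classes and allows # comments, so each piece carries its own explanation. This is a sketch: the names IP_RE, MAC_RE, DEVICE_RE and LATENCY_RE are my own, and the input is the log from the question (which, unlike the sample used further down, includes a negative latency and a host with no MAC line).

```python
import re

# The nmap log from the question.
log = """Nmap scan report for gateway (10.0.0.1)
Host is up (0.0060s latency).
MAC Address: 10:BE:F5:FC:9C:65 (D-Link International)
Nmap scan report for 10.0.0.2
Host is up (0.055s latency).
MAC Address: 7C:78:7E:E8:1C:2A (Samsung Electronics)
Nmap scan report for 10.0.0.3
Host is up (0.059s latency).
MAC Address: 54:60:09:83:6E:B6 (Google)
Nmap scan report for 10.0.0.200
Host is up (-0.093s latency).
MAC Address: 5C:B9:01:02:5F:D8 (Hewlett Packard)
Nmap scan report for manoj-notebook (10.0.0.4)
Host is up.
Nmap done: 256 IP addresses (5 hosts up) scanned in 16.84 seconds
"""

IP_RE = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")

# re.VERBOSE ignores whitespace outside character classes and allows
# comments, so each part of the pattern can be annotated in place.
MAC_RE = re.compile(r"""
    (?:[0-9A-F]{2}:){2,}   # two or more hex pairs, each followed by ':'
    [0-9A-F]{2}\b          # the final hex pair, then a word boundary
""", re.VERBOSE)

DEVICE_RE = re.compile(r"""
    (?<=\()                # positive lookbehind: preceded by '('
    [^)0-9.]*              # any characters except ')', '.' and digits
    (?=\))                 # positive lookahead: followed by ')'
""", re.VERBOSE)

LATENCY_RE = re.compile(r"""
    -?\d+\.\d+s            # optional '-', digits, '.', digits, 's'
    (?=\slatency)          # positive lookahead: whitespace, then 'latency'
""", re.VERBOSE)

ips = IP_RE.findall(log)
macs = MAC_RE.findall(log)
devices = DEVICE_RE.findall(log)
latencies = LATENCY_RE.findall(log)
print([ips, macs, devices, latencies])
```

On this input the IP list has five entries while the other three have four, because manoj-notebook (10.0.0.4) reports neither a MAC address nor a latency. Like devicePattern, DEVICE_RE would also skip a vendor name that contains a digit or a period.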

Python snippet:

import re
import itertools

# Sample nmap output used for this example
# (the last host's MAC line has no vendor name in parentheses).
temp = """
Starting Nmap 7.60 ( https://nmap.org ) at 2018-08-03 19:44 IST
Nmap scan report for gateway (10.0.0.1)
Host is up (0.0070s latency).
MAC Address: 10:BE:F5:FC:9C:65 (D-Link International)
Nmap scan report for 10.0.0.3
Host is up (0.11s latency).
MAC Address: 54:60:09:83:6E:B6 (Google)
Nmap scan report for 10.0.0.5
Host is up (0.11s latency).
MAC Address: 7C:78:7E:A4:73:8C (Samsung Electronics)
Nmap scan report for 10.0.0.200
Host is up (0.027s latency).
MAC Address: 5C:B9:01:02:5F:D8
"""

# Despite the *Pattern names, each variable holds the list of matches.
ipPattern = re.findall(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', temp)
macPattern = re.findall(r'(?:[0-9A-F]{2}:){2,}[0-9A-F]{2}\b', temp)
devicePattern = re.findall(r'(?<=\()[^)0-9.]*(?=\))', temp)
latency = re.findall(r'-?\d+\.\d+s(?=\slatency)', temp)

print(ipPattern)
print(macPattern)
print(devicePattern)
print(latency)

Prints:

['10.0.0.1', '10.0.0.3', '10.0.0.5', '10.0.0.200']
['10:BE:F5:FC:9C:65', '54:60:09:83:6E:B6', '7C:78:7E:A4:73:8C', '5C:B9:01:02:5F:D8']
['D-Link International', 'Google', 'Samsung Electronics']
['0.0070s', '0.11s', '0.11s', '0.027s']

To combine them into a single list, use:

mylist = itertools.chain([ipPattern], [macPattern], [devicePattern], [latency])
print(list(mylist))

Prints:

[['10.0.0.1', '10.0.0.3', '10.0.0.5', '10.0.0.200'], ['10:BE:F5:FC:9C:65', '54:60:09:83:6E:B6', '7C:78:7E:A4:73:8C', '5C:B9:01:02:5F:D8'], ['D-Link International', 'Google', 'Samsung Electronics'], ['0.0070s', '0.11s', '0.11s', '0.027s']]
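The lists above only line up position by position when every host reports every field: in this sample 10.0.0.200 has no vendor name, so the device list has three entries against four IP addresses. A minimal sketch of a host-by-host alternative follows, assuming each host's lines always follow its "Nmap scan report for" line as in the samples in this thread; parse_hosts and the *_RE names are hypothetical, and missing fields become None so the columns stay aligned.

```python
import re

# The sample from the answer's snippet (the last host has no vendor name).
temp = """Starting Nmap 7.60 ( https://nmap.org ) at 2018-08-03 19:44 IST
Nmap scan report for gateway (10.0.0.1)
Host is up (0.0070s latency).
MAC Address: 10:BE:F5:FC:9C:65 (D-Link International)
Nmap scan report for 10.0.0.3
Host is up (0.11s latency).
MAC Address: 54:60:09:83:6E:B6 (Google)
Nmap scan report for 10.0.0.5
Host is up (0.11s latency).
MAC Address: 7C:78:7E:A4:73:8C (Samsung Electronics)
Nmap scan report for 10.0.0.200
Host is up (0.027s latency).
MAC Address: 5C:B9:01:02:5F:D8
"""

IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
LATENCY_RE = re.compile(r"-?\d+\.\d+s(?=\slatency)")
# A MAC address, optionally followed by " (Vendor Name)".
MAC_LINE_RE = re.compile(r"((?:[0-9A-F]{2}:){5}[0-9A-F]{2})(?: \((.*)\))?")


def parse_hosts(text):
    """Hypothetical helper: one dict per host, None for any missing field."""
    hosts = []
    for line in text.splitlines():
        if line.startswith("Nmap scan report for "):
            m = IP_RE.search(line)
            hosts.append({"ip": m.group() if m else None,
                          "mac": None, "device": None, "latency": None})
        elif not hosts:
            continue  # lines before the first host, e.g. the "Starting Nmap" banner
        elif line.startswith("Host is up"):
            m = LATENCY_RE.search(line)
            if m:
                hosts[-1]["latency"] = m.group()
        elif line.startswith("MAC Address:"):
            m = MAC_LINE_RE.search(line)
            if m:
                hosts[-1]["mac"] = m.group(1)
                hosts[-1]["device"] = m.group(2)  # None when no vendor is shown
    return hosts


hosts = parse_hosts(temp)
# Same shape as the question's `result`, but each column lines up by host.
result = [[h[key] for h in hosts] for key in ("ip", "mac", "device", "latency")]
print(result)
```

Here the device column ends with None for 10.0.0.200 instead of being one entry short, and a host such as manoj-notebook in the question's log would get None for MAC, device and latency.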