Question

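I have this string in a log file: "IP 1.2.3.4 is currently trusted in the white list, but it is now using a new trusted certificate." What I need to do is find this message and extract the IP address (1.2.3.4) from the log file.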

import os
import shutil
import optparse
import sys

def main():
    # Read the whole log file into memory.
    with open("messages", "r") as file:
        log_data = file.read()

    search_str = "is currently trusted in the white list, but it is now using a new trusted certificate."

    # find() returns the offset of the first occurrence, or -1 if absent.
    index = log_data.find(search_str)
    print(index)

if __name__ == '__main__':
    main()

How can I extract the IP address? Thanks for your replies.

Answer 1

The answer is very simple:

msg = "IP 1.2.3.4 is currently trusted in the white list, but it is now using a new trusted certificate."

parts = msg.split(' ', 2)

print(parts[1])

Result:

1.2.3.4

You could also use a regular expression if you prefer, but for something this simple...
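
To apply the same idea to a whole log file, one option is to scan it line by line and take the token just before the known message. This is only a minimal sketch, assuming each message sits on one line and the IP is the last whitespace-separated token before the fixed text; `extract_ips` and the sample lines are made up for illustration:

```python
SEARCH_STR = ("is currently trusted in the white list, "
              "but it is now using a new trusted certificate.")


def extract_ips(lines):
    """Collect the token that precedes SEARCH_STR on each matching line."""
    ips = []
    for line in lines:
        pos = line.find(SEARCH_STR)
        if pos == -1:
            continue  # not the message we are looking for
        tokens = line[:pos].split()
        if tokens:
            ips.append(tokens[-1])
    return ips


# Hypothetical log lines; a real file may carry different prefixes.
sample = [
    "Jan  1 00:00:01 host daemon: IP 1.2.3.4 " + SEARCH_STR,
    "Jan  1 00:00:02 host daemon: unrelated entry",
    "IP 10.11.6.224 " + SEARCH_STR,
]
print(extract_ips(sample))
```

Since a file object iterates over its lines, passing an open file instead of `sample` should work the same way.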

Answer 2

There are many possible approaches, and their pros and cons depend on the details of your log file. One example, using the re module:

import re
x = "IP 1.2.3.4 is currently trusted in the white list, but it is now using a new trusted certificate."
# Raw string so the regex is passed to re unchanged; the group captures the IP.
pattern = r"IP ([0-9.]+) is currently trusted in the white list"
m = re.match(pattern, x)
for ip in m.groups():
    print(ip)

If you want to print every instance of that string in the log file, you could do something like this:

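import re
# log_data holds the full text of the log file, as read in the question.
pattern = r"(IP [0-9.]+ is currently trusted in the white list, but it is now using a new trusted certificate\.)"
# findall scans the whole text; re.match would only try the very start of it.
for match in re.findall(pattern, log_data):
    print(match)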
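
One thing to keep in mind is that `re.match` only tries the pattern at the very start of the string, so on a whole log file `re.findall` (or `re.finditer`) is usually a better fit. A rough sketch of the difference; the log text below is invented sample data, not taken from a real file:

```python
import re

# Hypothetical log contents with timestamp prefixes (an assumption).
log_data = (
    "Jan  1 00:00:01 host daemon: IP 1.2.3.4 is currently trusted in the white list, "
    "but it is now using a new trusted certificate.\n"
    "Jan  1 00:00:02 host daemon: unrelated entry\n"
    "Jan  1 00:00:03 host daemon: IP 10.11.6.224 is currently trusted in the white list, "
    "but it is now using a new trusted certificate.\n"
)

pattern = re.compile(r"IP ([0-9.]+) is currently trusted in the white list")

# match() anchors at position 0, where the timestamp is, so it finds nothing.
first_try = pattern.match(log_data)

# findall() scans the whole text and returns the captured group of each hit.
ips = pattern.findall(log_data)
print(ips)
```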

Answer 3

Use a regular expression.

Code like this:

import re

compiled = re.compile(r"""
    .*?                                # Leading junk
    (?P<ipaddress>\d+\.\d+\.\d+\.\d+)  # IP address
    .*?                                # Trailing junk
    """, re.VERBOSE)
str = "IP 1.2.3.4 is currently trusted in the white list, but it is now using a new trusted certificate."
m = compiled.match(str)
print(m.group("ipaddress"))

You get:

>>> import re
>>>
>>> compiled = re.compile(r"""
...     .*?                                # Leading junk
...     (?P<ipaddress>\d+\.\d+\.\d+\.\d+)  # IP address
...     .*?                                # Trailing junk
...     """, re.VERBOSE)
>>> str = "IP 1.2.3.4 is currently trusted in the white list, but it is now using a new trusted certificate."
>>> m = compiled.match(str)
>>> print(m.group("ipaddress"))
1.2.3.4

Also, I learned that there is a dictionary of the matches, groupdict():

>>> str = "Peer 10.11.6.224 is currently trusted in the white list, but it is now using a new trusted certificate. Consider removing its likely outdated white list entry."
>>> m = compiled.match(str)
>>> print(m.groupdict())
{'ipaddress': '10.11.6.224'}

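Later: fixed. The leading '.*' was eating the first character of your match. I changed it to be non-greedy. For consistency (though not out of necessity), I also changed the trailing match.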
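
The effect of the greedy versus non-greedy leading wildcard can be seen in a small comparison. This is just an illustrative sketch; the names `greedy` and `lazy` are made up here:

```python
import re

line = "Peer 10.11.6.224 is currently trusted in the white list"

# Greedy: '.*' grabs as much as it can and gives back only what is needed,
# so the IP group starts as far to the right as possible.
greedy = re.match(r".*(?P<ipaddress>\d+\.\d+\.\d+\.\d+)", line)

# Non-greedy: '.*?' grabs as little as it can, so the IP group starts
# at the first position where a full address can match.
lazy = re.match(r".*?(?P<ipaddress>\d+\.\d+\.\d+\.\d+)", line)

print(greedy.group("ipaddress"))  # 0.11.6.224
print(lazy.group("ipaddress"))    # 10.11.6.224
```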

Answer 4

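Regular expressions are the way to go. But if you are not comfortable writing them, you can try a small parser I wrote (https://github.com/hgrecco/stringparser). It converts a string format into a regular expression. In your case, you would do the following: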

from stringparser import Parser

parser = Parser("IP {} is currently trusted in the white list, but it is now using a new trusted certificate.")

ip = parser(text)

If you have a file with multiple lines, you can replace the last line with:

with open("log.txt", "r") as fp:
    ips = [parser(line) for line in fp]
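
For readers curious about the general idea of turning a format string into a regular expression, here is a rough standard-library sketch. It is not how stringparser is implemented; `format_to_regex` is a hypothetical helper that only handles bare `{}` placeholders:

```python
import re


def format_to_regex(fmt):
    """Build a regex from a format string, one capture group per '{}'."""
    literal_parts = fmt.split("{}")
    # Escape the literal text and join the pieces with a lazy capture group.
    return re.compile("(.+?)".join(re.escape(p) for p in literal_parts))


fmt = ("IP {} is currently trusted in the white list, "
       "but it is now using a new trusted certificate.")
regex = format_to_regex(fmt)

line = ("IP 1.2.3.4 is currently trusted in the white list, "
        "but it is now using a new trusted certificate.")
m = regex.search(line)
ip = m.group(1) if m else None
print(ip)
```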

Good luck.
