Question

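I am trying to open a text file, parse it for a specific regular expression pattern, and, whenever I find that pattern, write the text the regex matched to another text file.

Specifically, I want to parse a list of IP addresses for particular ones.

So the file might contain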

10.10.10.10
9.9.9.9
5.5.5.5
6.10.10.10

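And say I only want the IPs that end in 10 (the regex itself I think I am fine with). My actual case looks for the 10.180.42 or 41.XX IP hosts, but I will adjust as needed.

I have tried several approaches and failed miserably. It is on days like this that I know why I never mastered any language. But I have committed to Python, so here goes.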

import re

textfile = open("SymantecServers.txt", 'r')

matches = re.findall('^10.180\.4[3,1].\d\d',str(textfile))
print(matches)

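This gives me empty brackets. I had to wrap the text file in the str function, otherwise it just puked. I don't know whether that is right.

This one fails no matter how I tweak it.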

f = open("SymantecServers.txt","r")
o = open("JustIP.txt",'w', newline="\r\n")
for line in f:
    pattern = re.compile("^10.180\.4[3,1].\d\d")
    print(pattern)
    #o.write(pattern)
    #o.close()
   f.close()

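I did get one version working, but it returns the whole line (including the netmask and other text, such as the hostname, which all sit on the same line in the text file; I only want the IP).

Any help is appreciated on how to read the text file and, if a line contains the IP pattern, grab the full IP and write it to another text file, so that I end up with a text file containing only the list of IPs I want. I have spent 3 hours on this and am behind on work, so I am going to build the first file by hand...

I don't know what I am missing. Sorry for being a newbie.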

Answer 1

This works:

>>> import re
>>> s = """10.10.10.10
... 9.9.9.9
... 5.5.5.5
... 10.180.43.99
... 6.10.10.10"""
>>> re.findall(r'10\.180\.4[31]\.\d\d', s)
['10.180.43.99']

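You don't really need to add the line boundary, because you are matching a very specific IP address. As long as your file contains no oddities such as '123.23.234.10.180.43.99.21354' that you don't want to match, it should be OK!
The syntax [3,1] matches 3, 1, or a comma, and you don't want to match a comma ;-)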
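
To make the character-class point concrete, here is a small check. The sample strings are made up for illustration; the "loose" pattern is the one from the question with the `^` anchor dropped, so that both patterns can be compared with `fullmatch`. It suggests that `[3,1]` accepts a comma and that an unescaped `.` accepts any character:

```python
import re

loose = re.compile(r'10.180\.4[3,1].\d\d')    # question's pattern: unescaped dots, comma in the class
strict = re.compile(r'10\.180\.4[31]\.\d\d')  # dots escaped, comma removed

# Hypothetical inputs: one real IP and two strings that should NOT count as IPs.
samples = ["10.180.43.99", "10.180.4,.99", "10x180.41y99"]

loose_hits = [s for s in samples if loose.fullmatch(s)]
strict_hits = [s for s in samples if strict.fullmatch(s)]

print(loose_hits)   # all three strings get through
print(strict_hits)  # only the real IP remains
```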

As for your function:

import re

r = re.compile(r'10\.180\.4[31]\.\d\d')
with open("SymantecServers.txt", "r") as f:
    with open("JustIP.txt", 'w', newline="\r\n") as o:
        for line in f:
            matches = r.findall(line)
            for match in matches:
                # One IP per line; "\n" is written as "\r\n" because of newline="\r\n"
                o.write(match + "\n")

Though if I were you, I would extract the IPs like this:

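import re

# Match any dotted quad first, then filter on the numeric value of each octet
r = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
with open("SymantecServers.txt", "r") as f:
    with open("JustIP.txt", 'w', newline="\r\n") as o:
        for line in f:
            matches = r.findall(line)
            for match in matches:
                a, b, c, d = match.split('.')
                # An octet can be at most 255; the last octet is limited to two digits
                if int(a) <= 255 and int(b) <= 255 and int(c) in (43, 41) and int(d) < 100:
                    o.write(match + "\n")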

Or another way:

import re

r = re.compile(r'(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})')
with open("SymantecServers.txt", "r") as f:
    with open("JustIP.txt", 'w', newline="\r\n") as o:
        for line in f:
            # match() only succeeds if the IP is at the very start of the line
            m = r.match(line)
            if m:
                a, b, c, d = m.groups()
                if int(a) <= 255 and int(b) <= 255 and int(c) in (43, 41) and int(d) < 100:
                    # group(0) is the whole matched IP
                    o.write(m.group(0) + "\n")

This uses the regular expression to split the IP address into groups.
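
The same idea could also be wrapped in a small generator so the filtering can be tried without touching the file system. This is only a sketch under stated assumptions: the name `extract_ips`, the `third_octets` parameter, and the sample lines are hypothetical, and it keeps only the "third octet is 41 or 43" rule, not the two-digit limit on the last octet.

```python
import re

# \b keeps the match from starting or ending in the middle of a longer number
IP_RE = re.compile(r'\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b')

def extract_ips(lines, third_octets=(41, 43)):
    """Yield each dotted quad whose octets are 0-255 and whose third octet is wanted."""
    for line in lines:
        for m in IP_RE.finditer(line):
            octets = [int(g) for g in m.groups()]
            if all(o <= 255 for o in octets) and octets[2] in third_octets:
                yield m.group(0)

# Hypothetical sample lines (IP, netmask, hostname on one line)
sample = [
    "10.180.43.99 255.255.255.0 host-a\n",
    "10.180.42.7 255.255.255.0 host-b\n",
    "srv 10.180.41.12 999.1.41.1\n",
]
found = list(extract_ips(sample))
print(found)
```

Because an open file is itself an iterable of lines, the generator could be fed the file object directly, for example `o.writelines(ip + "\n" for ip in extract_ips(f))`.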

Answer 2

What you are missing is that re.compile() only creates a regular expression object in Python. You forgot to do the actual matching.

You could try:

import re

# This isn't the best way to match IPs, but if it fits your use case keep it for now.
pattern = re.compile(r"^10\.180\.4[13]\.\d\d")

f = open("SymantecServers.txt", 'r')
o = open("JustIP.txt", 'w')

for line in f:
    m = pattern.match(line)

    if m is not None:
        print("Match: %s" % m.group(0))
        o.write(m.group(0) + "\n")

f.close()
o.close()

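Compile the pattern object, try to match each line against the compiled object, then print the current match. I can avoid splitting up my match, but I do have to pay attention to the matched groups, hence group(0).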

You could also look at what re.search() can do for you, but if you run the same regular expression enough times, using compile becomes more worthwhile.
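
As a rough illustration of how match and search differ, assuming a hypothetical line in which the hostname comes before the IP (the real file layout is not shown in the question):

```python
import re

pattern = re.compile(r'10\.180\.4[13]\.\d\d')
line = "host-a 10.180.43.99 255.255.255.0"  # made-up sample line

at_start = pattern.match(line)   # None: match() only tries at the start of the string
anywhere = pattern.search(line)  # search() scans the whole line for the first hit

print(at_start)
print(anywhere.group(0))
```

If the IP is always the first thing on each line, match() is enough; otherwise search() or findall() is the safer choice.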

Also note that I moved f.close() outside the for loop.

Parse a text file for a pattern and write the found pattern back to another file (Python 3.4)

2 answers: