Extracting IP addresses from a file

Time: 2016-09-02 20:20:11

Tags: python python-2.7 extract ipv4

I'm trying to extract IP addresses from an asp file in Python. The file looks like this:

onInternalNet = (
        isInNet(hostDNS, "147.163.1.0", "255.255.0.0") ||
        isInNet(hostDNS, "123.264.0.0", "255.255.0.0") ||
        isInNet(hostDNS, "137.5.0.0", "255.0.0.0") ||
        isInNet(hostDNS, "100.01.02.0", "255.0.0.0") ||
        isInNet(hostDNS, "172.146.30.0", "255.240.0.0") ||
        isInNet(hostDNS, "112.268.0.0", "255.255.0.0") ||

I'm trying to extract them with a regular expression:

if re.compile(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$", line):

But I get this error:

Traceback (most recent call last):
  File "pull_proxy.py", line 27, in <module>
    write_to_file(extract_proxies(in_file), out_file)
  File "pull_proxy.py", line 8, in extract_proxies
    if re.compile(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$", line):
  File "C:\Python27\lib\re.py", line 194, in compile
    return _compile(pattern, flags)
  File "C:\Python27\lib\re.py", line 233, in _compile
    bypass_cache = flags & DEBUG
TypeError: unsupported operand type(s) for &: 'str' and 'int'

I don't understand why I'm getting this error. What can I do to make the script extract the information I want?

import re

def extract_proxies(in_file):
    buffer = []

    for line in in_file:
        if re.compile(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$", line):
            print "{} appened to buffer.".format(line)
            buffer.append(line)
        else:
            pass

    return buffer

def write_to_file(buffer, out_file):
    for proxy in buffer:
        with open(out_file, "a+") as res:
            res.write(proxy)

if __name__ == '__main__':
    print "Running...."
    in_file = "C:/Users/thomas_j_perkins/Downloads/test.asp"
    out_file = "c:/users/thomas_j_perkins/Downloads/results.txt"
    write_to_file(extract_proxies(in_file), out_file)

Edit

I realized I wasn't opening the file:

import re

def extract_proxies(in_file):
    buffer = []

    for line in in_file:
        if re.compile(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$", line):
            print "{} appened to buffer.".format(line)
            buffer.append(line)
        else:
            pass

    in_file.close()
    return buffer

def write_to_file(buffer, out_file):
    for proxy in buffer:
        with open(out_file, "a+") as res:
            res.write(proxy)

if __name__ == '__main__':
    print "Running...."
    in_file = "C:/Users/thomas_j_perkins/Downloads/PAC-Global-Vista.asp"
    out_file = "c:/users/thomas_j_perkins/Downloads/results.txt"
    write_to_file(extract_proxies(open(in_file, "r+")), out_file)

I still get the same error:

Running....
Traceback (most recent call last):
  File "pull_proxy.py", line 28, in <module>
    write_to_file(extract_proxies(open(in_file)), out_file)
  File "pull_proxy.py", line 8, in extract_proxies
    if re.compile(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$", line):
  File "C:\Python27\lib\re.py", line 194, in compile
    return _compile(pattern, flags)
  File "C:\Python27\lib\re.py", line 233, in _compile
    bypass_cache = flags & DEBUG
TypeError: unsupported operand type(s) for &: 'str' and 'int'

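3 Answers:

Answer 0: (score: 2)

re.compile expects a proper flags argument (an integer) as its second parameter, which line (a string) is not.

You should use re.match instead of re.compile.

re.compile

    Compile a regular expression pattern into a regular expression object, which can be used for matching using its match() and search() methods...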
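
One caveat, sketched below under the assumption that the lines look like the sample in the question: swapping in re.match alone is probably not enough, because the question's pattern is anchored with ^ and $ while the address sits in the middle of the line. The variable names here are illustrative only.

```python
import re

# One line copied from the sample file in the question.
line = '        isInNet(hostDNS, "147.163.1.0", "255.255.0.0") ||\n'

# The question's pattern: anchored, so the whole line must be an address.
anchored = r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$"
# The same pattern without anchors, so it can match anywhere in the line.
unanchored = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

# The second argument of re.match is the string to test (not flags).
anchored_result = re.match(anchored, line)   # no match: line starts with spaces

# re.search scans the line and stops at the first match.
first_hit = re.search(unanchored, line)
first_ip = first_hit.group(0)                # "147.163.1.0"
```

re.search returns only the first address on the line; the netmask that follows it is not returned.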

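Answer 1: (score: 1)

Your initial error

TypeError: unsupported operand type(s) for &: 'str' and 'int'

is caused by exactly what @Moses said in his answer: flags should be an int value, not a string.

You should compile your regular expression once. Also, you need to pass an open file handle when iterating over the lines.

import re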

IP_MATCHER = re.compile(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")

def extract_proxies(fh):
    for line in fh:
        line = line.strip()
        match = IP_MATCHER.findall(line)
        if match:
            print "{} appened to buffer.".format(line)
            print match
        else:
            pass



def write_to_file(buffer, out_file):
    for proxy in buffer:
        with open(out_file, "a+") as res:
            res.write(proxy)


if __name__ == '__main__':
    print "Running...."
    in_file = "in.txt"
    with open(in_file) as fh:
        extract_proxies(fh)

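This finds all matches. If you only want the first one, use IP_MATCHER.search and match.groups() instead. This of course assumes you actually want to extract the IP addresses.

For example: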

def extract_proxies(fh):
    for line in fh:
        line = line.strip()
        match = IP_MATCHER.findall(line)
        if len(match) == 2:
            print "{} appened to buffer.".format(line)
            ip, mask = match
            print "IP: %s => Mask: %s" % (ip, mask)
        else:
            pass
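
The two variants can also return values instead of printing them, so the result can be passed on to something like write_to_file. This is only a sketch: first_address and extract_pairs are hypothetical names, not part of the answer's code, and the sample lines are copied from the question.

```python
import re

IP_MATCHER = re.compile(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")

def first_address(line):
    """Return the first dotted quad in line, or None if there is none."""
    match = IP_MATCHER.search(line)
    if match is None:
        return None
    # groups() returns a tuple with one entry per capture group.
    return match.groups()[0]

def extract_pairs(lines):
    """Collect (ip, mask) tuples from lines containing exactly two matches."""
    pairs = []
    for line in lines:
        found = IP_MATCHER.findall(line)
        if len(found) == 2:
            pairs.append(tuple(found))
    return pairs

sample = [
    'onInternalNet = (\n',
    '        isInNet(hostDNS, "147.163.1.0", "255.255.0.0") ||\n',
    '        isInNet(hostDNS, "137.5.0.0", "255.0.0.0") ||\n',
]
first = first_address(sample[1])
pairs = extract_pairs(sample)
```

Any iterable of lines works as input, including an open file handle.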

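Answer 2: (score: 1)

Please check the code below.

I made a couple of changes:

  1. re.compile - the regex should be compiled first and can then be used with match / search / findall.
  2. The regex was not suitable. When writing the pattern we need to account for the line from its beginning; the original pattern does not match an address in the middle of a line.

        import re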
    
    
        def extract_proxies(in_file):
            buffer1 = []
            #Regex compiled here
            m = re.compile(r'\s*\w+\(\w+,\s+\"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\"')
    
            for line in in_file:
                #Used here to match
                r = m.match(line)
                if r is not None:
                    print "{} appened to buffer.".format(line)
                    buffer1.append(r.group(1))
                else:
                    pass
    
            in_file.close()
            return buffer1
    
    
        def write_to_file(buffer1, out_file):
            for proxy in buffer1:
                with open(out_file, "a+") as res:
                    res.write(proxy+'\n')
    
    
        if __name__ == '__main__':
            print "Running...."
            in_file = "sample.txt"
            out_file = "results.txt"
            write_to_file(extract_proxies(open(in_file)), out_file)
    

    Output:

    C:\Users\dinesh_pundkar\Desktop>python c.py
    Running....
            isInNet(hostDNS, "147.163.1.0", "255.255.0.0") ||
     appened to buffer.
            isInNet(hostDNS, "123.264.0.0", "255.255.0.0") ||
     appened to buffer.
            isInNet(hostDNS, "137.5.0.0", "255.0.0.0") ||
     appened to buffer.
            isInNet(hostDNS, "100.01.02.0", "255.0.0.0") ||
     appened to buffer.
            isInNet(hostDNS, "172.146.30.0", "255.240.0.0") ||
     appened to buffer.
            isInNet(hostDNS, "112.268.0.0", "255.255.0.0") || appened to buffer.
    
    C:\Users\dinesh_pundkar\Desktop>python c.py
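
To see what this answer's pattern captures, it can be tried on a few lines from the question. The sketch below also adds a range check, because \d{1,3} accepts values such as 264 that are not valid IPv4 octets. octets_in_range is a hypothetical helper and not part of the answer.

```python
import re

# Same pattern as in the answer: optional indent, then name(arg, "a.b.c.d"
CALL_WITH_IP = re.compile(
    r'\s*\w+\(\w+,\s+\"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\"'
)

def octets_in_range(address):
    """Hypothetical helper: True if every dotted part is between 0 and 255."""
    return all(0 <= int(part) <= 255 for part in address.split("."))

lines = [
    'onInternalNet = (',
    '        isInNet(hostDNS, "147.163.1.0", "255.255.0.0") ||',
    '        isInNet(hostDNS, "123.264.0.0", "255.255.0.0") ||',
]

captured = []
for line in lines:
    hit = CALL_WITH_IP.match(line)
    if hit is not None:
        # group(1) is the first quoted address; the mask is not captured.
        captured.append(hit.group(1))

# Keep only addresses whose octets are all within 0..255.
in_range = [addr for addr in captured if octets_in_range(addr)]
```

Whether out-of-range entries like 123.264.0.0 should be dropped or kept depends on what the extracted list is for, so treat the filter as optional.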