Parsing a file of IP addresses in Python

Date: 2012-12-24 23:40:59

Tags: python

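I have a file containing multiple IP addresses: about 900 IPs spread across 4 lines of text. I want the output to have one IP per line. How can I do this? Based on other code, I came up with the following, but it fails because there are multiple IPs on a single line: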

import sys
import re

try:
    if sys.argv[1:]:
        print "File: %s" % (sys.argv[1])
        logfile = sys.argv[1]
    else:
        logfile = raw_input("Please enter a log file to parse, e.g /var/log/secure: ")
    try:
        file = open(logfile, "r")
        ips = []
        for text in file.readlines():
           text = text.rstrip()
           regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})$',text)
           if regex is not None and regex not in ips:
               ips.append(regex)

        for ip in ips:
           outfile = open("/tmp/list.txt", "a")
           addy = "".join(ip)
           if addy is not '':
              print "IP: %s" % (addy)
              outfile.write(addy)
              outfile.write("\n")
    finally:
        file.close()
        outfile.close()
except IOError, (errno, strerror):
        print "I/O Error(%s) : %s" % (errno, strerror)

4 answers:

Answer 0 (score: 2)

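The $ anchor in your expression prevents you from finding anything other than the last entry on the line. Remove it, then use the list returned by .findall():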

found = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})', text)
if found:
    ips.extend(found)
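
To see the effect of the anchor in isolation, here is a minimal sketch in Python 3. The sample line is made up for illustration; only the two patterns come from the question and this answer.

```python
import re

# Same pattern as above, with and without the trailing "$" anchor.
ANCHORED = r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})$'
UNANCHORED = r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})'

# Hypothetical sample line holding three addresses.
line = "10.0.0.1 10.0.0.2 192.168.1.30"

anchored_hits = re.findall(ANCHORED, line)
unanchored_hits = re.findall(UNANCHORED, line)

print(anchored_hits)    # only the address at the very end of the line
print(unanchored_hits)  # every address on the line
```

With the anchor, only the last address on the line can match, which is why the original script produced roughly one IP per input line.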

Answer 1 (score: 1)

The findall function returns a list of matches, and you are not iterating over each match.

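# The trailing $ is dropped so that every address on the line can match.
regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})', text)
# findall() always returns a list (empty when nothing matches), never None.
for match in regex:
    if match not in ips:
        ips.append(match)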
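
The difference between appending the whole result and appending each match can be sketched as follows (Python 3). The input lines are invented for illustration, and the pattern is a shortened equivalent of the one in the question.

```python
import re

PATTERN = r'\d{1,3}(?:\.\d{1,3}){3}'

# Hypothetical input lines; the second one has no address at all.
lines = ["1.1.1.1 2.2.2.2", "no address here", "2.2.2.2 3.3.3.3"]

nested = []  # what appending the whole findall() result produces
flat = []    # what appending each individual match produces

for text in lines:
    matches = re.findall(PATTERN, text)  # always a list, possibly empty
    if matches not in nested:
        nested.append(matches)
    for match in matches:
        if match not in flat:
            flat.append(match)

print(nested)  # a list of lists, including an empty one
print(flat)    # a flat list of unique addresses
```

The nested form is also why "".join(ip) in the question glues several addresses into one string: each element of ips is itself a list of matches.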

Answer 2 (score: 1)

Extracting IP addresses from a file

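I answered a similar question in this discussion. In short, it is a solution based on a project I am working on that extracts network and host-based indicators from different types of input data (e.g. strings, files, blog posts, etc.): https://github.com/JohnnyWachter/intel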


I would import from the IPAddresses and Data modules, then use them to accomplish your task as follows:

#!/usr/bin/env python

"""Extract IPv4 Addresses From Input File."""

from Data import CleanData  # Format and Clean the Input Data.
from IPAddresses import ExtractIPs  # Extract IPs From Input Data.


def get_ip_addresses(input_file_path):
    """
    Read contents of input file and extract IPv4 Addresses.
    :param input_file_path: fully qualified path to input file. Expecting str
    :returns: dictionary of IPv4 and IPv4-like Address lists
    :rtype: dict
    """

    input_data = []  # Empty list to house formatted input data.

    input_data.extend(CleanData(input_file_path).to_list())

    results = ExtractIPs(input_data).get_ipv4_results()

    return results
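Now that you have a dictionary of lists, you can easily access the data you want and output it however you like. The following example uses the function above, printing the results to the console and writing them to a specified output file: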

# Extract the desired data using the function above.
ipv4_list = get_ip_addresses('/path/to/input/file')

# Open your output file in 'append' mode.
with open('/path/to/output/file', 'a') as outfile:

    # Ensure that the list of valid IPv4 addresses is not empty.
    if ipv4_list['valid_ips']:

        for ip_address in ipv4_list['valid_ips']:

            # Print to console.
            print(ip_address)

            # Write to output file, one address per line.
            outfile.write(ip_address + '\n')
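
If installing the project is not an option, the split between valid and merely IPv4-like strings can be approximated with the standard library. This is only a sketch assuming Python 3.3 or later (for the ipaddress module); it is not how the project above implements its check, and the function name split_valid_ipv4 and the sample text are made up for illustration.

```python
import ipaddress
import re


def split_valid_ipv4(candidates):
    """Split strings into (valid, invalid) IPv4 lists using the stdlib."""
    valid, invalid = [], []
    for candidate in candidates:
        try:
            # Raises a ValueError subclass for anything that is not IPv4.
            ipaddress.IPv4Address(candidate)
        except ValueError:
            invalid.append(candidate)
        else:
            valid.append(candidate)
    return valid, invalid


# Hypothetical input: two real addresses and two look-alikes.
text = "8.8.8.8 999.1.1.1 192.168.0.256 10.0.0.1"
candidates = re.findall(r'\d{1,3}(?:\.\d{1,3}){3}', text)
valid, invalid = split_valid_ipv4(candidates)

print(valid)
print(invalid)
```

The regular expression alone accepts octets above 255, so a validation pass like this one is what separates real addresses from look-alikes.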

Answer 3 (score: 0)

Without the re.MULTILINE flag, $ matches only at the end of the string (or just before a trailing newline at the very end).
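
A small sketch of that behaviour in Python 3, using a made-up two-line string with one address at the end of each line:

```python
import re

PATTERN = r'\d{1,3}(?:\.\d{1,3}){3}$'

# Hypothetical two-line input, one address at the end of each line.
data = "host-a 10.0.0.1\nhost-b 10.0.0.2\n"

default_hits = re.findall(PATTERN, data)
multiline_hits = re.findall(PATTERN, data, re.MULTILINE)

print(default_hits)    # only the address before the final newline
print(multiline_hits)  # the address at the end of every line
```

Note that the flag only helps when a whole multi-line string is searched at once. The code in the question searches one stripped line at a time, so there the anchor still limits each line to its last address.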

To make debugging easier, you can split the code into several parts that can be tested independently.

import re

def extract_ips(data):
    return re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", data)

If the input file is small and you don't need to preserve the original order of the IPs:

with open(filename) as infile, open(outfilename, "w") as outfile:
    outfile.write("\n".join(set(extract_ips(infile.read()))))

Otherwise:

with open(filename) as infile, open(outfilename, "w") as outfile:
    seen = set()  # addresses already written
    for line in infile:
        for ip in extract_ips(line):
            if ip not in seen:
                seen.add(ip)
                # write() works the same way on Python 2 and Python 3.
                outfile.write(ip + "\n")
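
For small inputs on Python 3.7 or later, where dictionaries are guaranteed to keep insertion order, the order-preserving variant can also be sketched with dict.fromkeys. The function name extract_unique_ips and the sample text are assumptions made for illustration.

```python
import re


def extract_unique_ips(data):
    """Return addresses found in data, first occurrence first, no repeats."""
    found = re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", data)
    # dict keys are unique and keep insertion order on Python 3.7+.
    return list(dict.fromkeys(found))


# Hypothetical input with repeated addresses across two lines.
sample = "3.3.3.3 1.1.1.1 3.3.3.3\n2.2.2.2 1.1.1.1"
unique = extract_unique_ips(sample)

print("\n".join(unique))  # one address per line
```

This reads the whole input into memory, so for large files the line-by-line loop with a seen set is probably the safer choice.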