Question

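This script currently extracts a specific type of IP address from a file and formats the results as CSV.

How can I change it so that it looks at all the files in its directory (the same directory as the script) and creates a new output file? This is my first week with Python, so please keep it as simple as possible.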

    #!/usr/bin/python

    # Extract IP address from file 

    #import modules
    import re

    # Open Source File
    infile = open('stix1.xml', 'r')
    # Open output file
    outfile = open('ExtractedIPs.csv', 'w') 
    # Create a list
    BadIPs = []

    #search each line in doc
    for line in infile:
        # ignore empty lines
        if line.isspace(): continue

        # find IP that are Indicator Titles
        IP = (re.findall(r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})", line))
        # Only take finds
        if not IP: continue
        # Add each found IP to the BadIP list
        BadIPs.append(IP)

    #tidy up for CSV format
    data = str(BadIPs)
    data = data.replace('[', '')
    data = data.replace(']', '')
    data = data.replace("'", "")
    # Write IPs to a file        
    outfile.write(data)

    infile.close()
    outfile.close()

Answer 1

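I think you want to take a look at glob.glob: https://docs.python.org/2/library/glob.html

It returns a list of the files that match a given pattern.

Then you could do something like this:

import re, glob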

def do_something_with(f):
    # Open source file
    infile = open(f, 'r')
    # Open output file in append mode ('a') so every input file adds to it
    outfile = open('ExtractedIPs.csv', 'a')
    # Create a list
    BadIPs = []

    ### rest of your code
    # ...
    # ...
    outfile.write(data)

    # close() must be called, not just referenced
    infile.close()
    outfile.close()

for f in glob.glob("*.xml"):
    do_something_with(f)
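
In case it helps to see what glob.glob actually hands back, here is a minimal sketch. The helper name list_xml_files and its directory argument are made up for illustration and are not part of the answer above. It builds the pattern with os.path.join and sorts the result, since glob does not promise any particular order.

```python
import glob
import os


def list_xml_files(directory="."):
    """Return the .xml files directly inside `directory`, sorted by name."""
    # glob.glob returns a plain list of path strings matching the pattern.
    pattern = os.path.join(directory, "*.xml")
    return sorted(glob.glob(pattern))


# Print the XML files sitting in the current working directory.
for path in list_xml_files():
    print(path)
```

Note that "*.xml" is resolved against the current working directory, which is only the script's directory if you launch the script from there.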

Answer 2

You can get a list of all the XML files.

import os
filenames = [nm for nm in os.listdir('.') if nm.endswith('.xml')]

Then iterate over all the files.

for fn in filenames:
    with open(fn) as infile:
        for ln in infile:
            # do your thing

The with statement ensures that the file is closed once you are done with it.
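
To make the "do your thing" part concrete, here is one way the question's regex could sit inside a with block. This is only a sketch: extract_ips and IP_PATTERN are names I picked, and it collects matches with extend so the result is a flat list of strings.

```python
import re

# Same pattern as in the question: an indicator title followed by an IPv4-looking value.
IP_PATTERN = re.compile(
    r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
)


def extract_ips(path):
    """Return every IP found in the indicator titles of one file."""
    found = []
    with open(path) as infile:  # closed automatically when the block ends
        for line in infile:
            # findall returns a list of the captured groups, possibly empty,
            # so blank and non-matching lines simply add nothing.
            found.extend(IP_PATTERN.findall(line))
    return found
```

Because the file is closed by the with block, there is no close() call to forget.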

Answer 3

Assuming you want all of the output added to the same file, this would be the script:

#!/usr/bin/python
import glob   
import re

for infileName in glob.glob("*.xml"):
    # Open Source File
    infile = open(infileName, 'r')
    # Append to file
    outfile = open('ExtractedIPs.csv', 'a') 
    # Create a list
    BadIPs = []

    #search each line in doc
    for line in infile:
        # ignore empty lines
        if line.isspace(): continue

        # find IP that are Indicator Titles
        IP = (re.findall(r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})", line))
        # Only take finds
        if not IP: continue
        # Add each found IP to the BadIP list
        BadIPs.append(IP)

    #tidy up for CSV format
    data = str(BadIPs)
    data = data.replace('[', '')
    data = data.replace(']', '')
    data = data.replace("'", "")
    # Write IPs to a file        
    outfile.write(data)

    # close() must be called, not just referenced
    infile.close()
    outfile.close()
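
One thing to watch with the append approach: as far as I can tell, nothing separates one file's output from the next, so the last IP of one file can run straight into the first IP of the following file, and rerunning the script keeps adding to the old CSV. A rough sketch of an alternative is to gather the results from every file first and build the output string once. join_ips is a made-up name, and the comma-plus-space separator simply mirrors what the str()/replace() tidy-up produces.

```python
def join_ips(ips_per_file):
    """Flatten a list of per-file IP lists into one comma-separated string."""
    all_ips = []
    for ips in ips_per_file:
        all_ips.extend(ips)  # extend keeps the list flat, unlike append
    return ", ".join(all_ips)


# Example: two files' worth of results become one string ready for outfile.write().
line = join_ips([["10.0.0.1", "10.0.0.2"], ["172.16.0.5"]])
print(line)
```

With this shape the output file could be opened once with 'w' after the loop, instead of once per input file with 'a'.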

Answer 4

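import sys
Turn your current code into a function, for example def extract(filename).
Call the script with all the file names: python myscript.py file1 file2 file3
In your script, loop over the file names: for filename in sys.argv[1:]:
Inside the loop, call the function: extract(filename).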
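
Put together, the steps above might look roughly like this. Treat it as a sketch: the body of extract is a stand-in for your existing code (here it reads the whole file at once and returns the matches), and main is a helper name I added so the loop is easy to call.

```python
import re
import sys

# Pattern taken from the question.
IP_PATTERN = re.compile(
    r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
)


def extract(filename):
    """Return the IPs found in one file (stand-in for the original code)."""
    with open(filename) as infile:
        return IP_PATTERN.findall(infile.read())


def main(filenames):
    """Run extract on each file name and print what was found."""
    for filename in filenames:
        print("%s: %s" % (filename, ", ".join(extract(filename))))


if __name__ == "__main__":
    # sys.argv[0] is the script itself, so the file names start at index 1.
    main(sys.argv[1:])
```

On most shells you can then let the shell expand the list for you, e.g. python myscript.py *.xml.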

Answer 5

I needed to do this too, and to go into subdirectories as well. You need to import os and os.path, and then you can use a function like this:

def recursive_glob(rootdir='.', suffix=()):
    """ recursively traverses full path from root, returns
        paths and file names for files with suffix in tuple """
    pathlist = []
    filelist = []
    for looproot, dirnames, filenames in os.walk(rootdir):
        for filename in filenames:
            if filename.endswith(suffix):
                pathlist.append(os.path.join(looproot, filename))
                filelist.append(filename)
    return pathlist, filelist

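You pass the function the top-level directory to start from and the suffix of the file type you are looking for. This was written and tested on Windows, but I believe it will work on other operating systems too, as long as you have file extensions to work with.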
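
A small caveat on the default: with suffix=() the endswith check matches nothing, so a suffix has to be passed in. Here is a trimmed-down variant to show the call pattern. find_files is a hypothetical name, it returns only the full paths, and the ".xml" default is my assumption based on the question.

```python
import os


def find_files(rootdir=".", suffix=(".xml",)):
    """Walk rootdir recursively and return full paths of files ending with suffix."""
    matches = []
    for looproot, dirnames, filenames in os.walk(rootdir):
        for filename in filenames:
            # str.endswith accepts a single string or a tuple of strings.
            if filename.endswith(suffix):
                matches.append(os.path.join(looproot, filename))
    return sorted(matches)
```

Calling find_files(".", (".xml", ".XML")) would then cover both spellings of the extension in the current directory and everything below it.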

Answer 6

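If all the files in the current folder are relevant, you can use os.listdir. If not, say you only want the .xml files, use glob.glob("*.xml"). But the overall program can be improved, roughly as follows.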

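# import modules
import csv
import os
import re

# reg is your regex (this is the one from the question)
reg = r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
pat = re.compile(reg)
with open("out.csv", "w") as fw:
    writer = csv.writer(fw)
    for f in os.listdir('.'):  # or glob.glob("*.xml")
        with open(f) as fr:
            # skip blank lines
            lines = (line for line in fr if not line.isspace())
            # genex for all ip in that file
            ips = (ip for line in lines for ip in pat.findall(line))
            # one CSV row per input file; list() because Python 2's csv wants a sequence
            writer.writerow(list(ips))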

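You may need to change it to fit your exact needs. The idea is that this version has fewer side effects, uses less memory, and leaves closing the files to the context manager. Leave a comment if it doesn't work.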

How to run this Python 2.7 script on multiple files in the same directory
