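This script currently extracts a specific type of IP address from a file and formats the results as CSV.
How can I change it so that it looks at all the files in its directory (the same directory as the script) and creates a new output file? This is my first week with Python, so please keep it as simple as possible.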
#!usr/bin/python
# Extract IP address from file
#import modules
import re
# Open Source File
infile = open('stix1.xml', 'r')
# Open output file
outfile = open('ExtractedIPs.csv', 'w')
# Create a list
BadIPs = []
#search each line in doc
for line in infile:
    # ignore empty lines
    if line.isspace(): continue
    # find IP that are Indicator Titles
    IP = (re.findall(r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})", line))
    # Only take finds
    if not IP: continue
    # Add each found IP to the BadIP list
    BadIPs.append(IP)
#tidy up for CSV format
data = str(BadIPs)
data = data.replace('[', '')
data = data.replace(']', '')
data = data.replace("'", "")
# Write IPs to a file
outfile.write(data)
infile.close
outfile.close
Answer 0 (score: 2)
I think you want to look at glob.glob: https://docs.python.org/2/library/glob.html
It returns a list of the files that match a given pattern.
Then you could do something like:
import re, glob

def do_something_with(f):
    # Open Source File
    infile = open(f, 'r')
    # Open output file in append mode so every input file adds to it
    outfile = open('ExtractedIPs.csv', 'a')
    # Create a list
    BadIPs = []
    ### rest of your code
    .
    .
    outfile.write(data)
    # close() needs the parentheses, otherwise nothing is called
    infile.close()
    outfile.close()

for f in glob.glob("*.xml"):
    do_something_with(f)
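One caveat: glob.glob("*.xml") matches files in the current working directory, which is not necessarily the directory the script lives in. The following is a minimal sketch of one way to anchor the pattern to the script's own folder. The helper name xml_files_beside and the sample file names are made up for illustration.

```python
import glob
import os
import tempfile


def xml_files_beside(script_path):
    """Return the sorted .xml paths in the directory that holds script_path."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return sorted(glob.glob(os.path.join(script_dir, "*.xml")))


# Small self-check in a throwaway directory (file names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("stix1.xml", "stix2.xml", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    fake_script = os.path.join(tmp, "myscript.py")  # does not need to exist
    found = [os.path.basename(p) for p in xml_files_beside(fake_script)]

print(found)  # ['stix1.xml', 'stix2.xml']
```

Inside a real script you would call xml_files_beside(__file__) and loop over the result.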
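Answer 1 (score: 1)
You can get a list of all the XML files.
import os
filenames = [nm for nm in os.listdir() if nm.endswith('.xml')]
Then iterate over all the files.
for fn in filenames:
    with open(fn) as infile:
        for ln in infile:
            # do your thing
            pass
The with statement ensures that the file is closed once you are done with it.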
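To make the point about with concrete, here is a small sketch that compares it with a bare infile.close (no parentheses), as written in the question's script. The temporary file and its contents are invented for the demonstration.

```python
import os
import tempfile

# Write a two-line sample file to read back (contents are made up).
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "w") as sample:
    sample.write("first line\nsecond line\n")

# Without "with": naming the method does not call it.
infile = open(path)
infile.close                      # no parentheses, so nothing happens
still_open = not infile.closed    # the file is still open here
infile.close()                    # this call actually closes it

# With "with": the file is closed as soon as the block ends.
with open(path) as infile:
    line_count = sum(1 for _ in infile)
closed_after_with = infile.closed

os.remove(path)
print(still_open, line_count, closed_after_with)  # True 2 True
```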
Answer 2 (score: 1)
Assuming you want to add all the output to the same file, this would be the script:
#!/usr/bin/python
import glob
import re
for infileName in glob.glob("*.xml"):
    # Open Source File
    infile = open(infileName, 'r')
    # Append to file
    outfile = open('ExtractedIPs.csv', 'a')
    # Create a list
    BadIPs = []
    #search each line in doc
    for line in infile:
        # ignore empty lines
        if line.isspace(): continue
        # find IP that are Indicator Titles
        IP = (re.findall(r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})", line))
        # Only take finds
        if not IP: continue
        # Add each found IP to the BadIP list
        BadIPs.append(IP)
    #tidy up for CSV format
    data = str(BadIPs)
    data = data.replace('[', '')
    data = data.replace(']', '')
    data = data.replace("'", "")
    # Write IPs to a file
    outfile.write(data)
    # close() needs the parentheses to actually close the files
    infile.close()
    outfile.close()
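The "tidy up for CSV format" step works by turning the list into a string and stripping the brackets and quotes. As a possible alternative, and only as a sketch, the same text can be produced by flattening the findall() results and joining them. The function names and sample addresses below are illustrative and not part of the original script.

```python
def tidy_with_replace(bad_ips):
    """The str()/replace() clean-up used in the script."""
    data = str(bad_ips)
    for ch in "[]'":
        data = data.replace(ch, "")
    return data


def tidy_with_join(bad_ips):
    """Flatten the list of findall() results and join them with ', '."""
    return ", ".join(ip for found in bad_ips for ip in found)


# re.findall returns a list per line, so BadIPs is a list of lists.
sample = [["10.0.0.1"], ["192.168.1.5", "172.16.0.9"]]

print(tidy_with_join(sample))  # 10.0.0.1, 192.168.1.5, 172.16.0.9
print(tidy_with_join(sample) == tidy_with_replace(sample))  # True
```

Note that neither version adds a separator between the output of one input file and the next, so when appending you may want to write a trailing newline after each file.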
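Answer 3 (score: 0)
Import sys and move your code into a function that takes a file name:
import sys

def extract(filename):
    ...  # your existing extraction code goes here

Pass the files on the command line:
python myscript.py file1 file2 file3
Then loop over the arguments and call the function for each one:
for filename in sys.argv[1:]:
    extract(filename)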
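Put together, the command-line approach might look like the sketch below. The regex is copied from the question. The helper extract_from_lines and the choice to print one IP per line are assumptions made for illustration, not something this answer specifies.

```python
import re
import sys

# Regex copied from the question.
IP_TITLE = re.compile(
    r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
)


def extract_from_lines(lines):
    """Return every IP found in indicator titles across the given lines."""
    ips = []
    for line in lines:
        ips.extend(IP_TITLE.findall(line))
    return ips


def extract(filename):
    """Open one file and return the IPs it contains."""
    with open(filename) as infile:
        return extract_from_lines(infile)


def main(filenames):
    """Print the IPs from each file named on the command line."""
    for filename in filenames:
        for ip in extract(filename):
            print(ip)


if __name__ == "__main__":
    main(sys.argv[1:])
```

With a shell that expands wildcards, python myscript.py *.xml would then process every XML file in the current directory.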
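Answer 4 (score: 0)
I needed to do this and also descend into subdirectories. You need to import os and os.path, and then you can use a function like this:
import os
import os.path

def recursive_glob(rootdir='.', suffix=()):
    """ recursively traverses full path from root, returns
    paths and file names for files with suffix in tuple """
    pathlist = []
    filelist = []
    for looproot, dirnames, filenames in os.walk(rootdir):
        for filename in filenames:
            if filename.endswith(suffix):
                pathlist.append(os.path.join(looproot, filename))
                filelist.append(filename)
    return pathlist, filelist
You pass the function the top-level directory you want to start from and the suffix (or tuple of suffixes) of the file type you are looking for. This was written and tested on Windows, but I believe it will work on other operating systems too, as long as you have file extensions to work with.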
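For comparison, on Python 3 a similar recursive search can be sketched with pathlib instead of os.walk. This is a different technique from the function above, not a rewrite of it. The name find_by_suffix and the sample directory layout are made up for the example.

```python
import os
import tempfile
from pathlib import Path


def find_by_suffix(rootdir=".", suffixes=(".xml",)):
    """Return sorted Paths under rootdir whose names end with one of suffixes."""
    root = Path(rootdir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.name.endswith(tuple(suffixes))
    )


# Small self-check in a throwaway directory tree (names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("top.xml", os.path.join("sub", "deep.xml"), "skip.txt"):
        open(os.path.join(tmp, rel), "w").close()
    found = [p.relative_to(tmp).as_posix() for p in find_by_suffix(tmp)]

print(found)
```

As with the os.walk version, str.endswith accepts a tuple, so several extensions can be matched in one call.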
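Answer 5 (score: 0)
If all the files in the current folder are relevant, you can use os.listdir(). If not, say you only want the .xml files, then use glob.glob("*.xml"). But the overall program can be improved, roughly as follows.
#import modules
import csv
import glob
import re

# reg is your regex (here, the one from the question)
reg = r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
pat = re.compile(reg)
with open("out.csv", "w", newline="") as fw:
    writer = csv.writer(fw)
    for f in glob.glob("*.xml"):  # or os.listdir() if every file is relevant
        with open(f) as fr:
            # skip blank lines
            lines = (line for line in fr if not line.isspace())
            # genex for all ip in that file
            ips = (ip for line in lines for ip in pat.findall(line))
            writer.writerow(ips)
You might need to change it to suit your exact needs. The idea is that this version has far fewer side effects, uses less memory, and leaves closing the files to the context managers. Please comment if it doesn't work.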