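I'm trying to find out how to stop os.walk after it has walked through a particular file.
I have a directory of log files organized by date. I'm trying to replace grep searches by letting a user find IP addresses stored within a date range they specify.
The program will take the following arguments:
-i IPv4 or IPv6 address, with subnet
-s start date, e.g. 2013/12/20, matching the file structure
-e end date
I'm assuming that, because of the topdown option, there is some logic that should let me declare an endpoint. What is the best way to do this? I'm thinking of a while loop.
I apologize in advance if something is off with my question. I just checked my blood sugar and it's low, at 56.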
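Additional information
The file structure lives under /flows/index_border:
2013
- 01
- 02
---- 01
---- ...
---- 29
2014
I hope this is clear: year folders contain month folders, which contain day folders, which contain hourly files. Dates increase going down.
The end date needs to be inclusive (I haven't focused on this much, because I can add code to bump it up by one day).
I've been trying to write a date range function. I'm surprised I haven't seen one anywhere in the datetime documentation; it seems like it would be useful.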
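The datetime module has no built-in date range, but one can be sketched as a small generator. The name `date_range` is a hypothetical helper, not a standard library function, and the inclusive end date follows the requirement stated above.

```python
import datetime

def date_range(start, end):
    """Yield every date from start to end, with both endpoints included."""
    # (end - start).days is the gap in whole days; +1 makes the end inclusive.
    for offset in range((end - start).days + 1):
        yield start + datetime.timedelta(days=offset)

# Example: build one directory path per day, following the layout above.
for day in date_range(datetime.date(2013, 12, 30), datetime.date(2014, 1, 2)):
    print('/flows/index_border/{0}/{1:02d}/{2:02d}'.format(day.year, day.month, day.day))
```

If the end date is earlier than the start date, the generator simply yields nothing.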
import os, gzip, netaddr, datetime, argparse

startDir = '.'

def sdate_format(s):
    # argparse type callback: parse the start date or report a usage error
    try:
        return datetime.datetime.strptime(s, '%Y/%m/%d').date()
    except ValueError:
        msg = "Bad start date. Please use yyyy/mm/dd format."
        raise argparse.ArgumentTypeError(msg)

def edate_format(e):
    # argparse type callback: parse the end date or report a usage error
    try:
        return datetime.datetime.strptime(e, '%Y/%m/%d').date()
    except ValueError:
        msg = "Bad end date. Please use yyyy/mm/dd format."
        raise argparse.ArgumentTypeError(msg)

parser = argparse.ArgumentParser(description='Locate IP address in log files for a particular date or date range')
parser.add_argument('-s', '--start_date', action='store', type=sdate_format, dest='start_date', help='The first date in range of interest.')
parser.add_argument('-e', '--end_date', action='store', type=edate_format, dest='end_date', help='The last date in range of interest.')
parser.add_argument('-i', action='store', dest='net', help='IP address or address range, IPv4 or IPv6 with optional subnet accepted.', required=True)
results = parser.parse_args()
start = results.start_date
end = results.end_date
target_IP = results.net
startDir = '/flows/index_border/{0}/{1:02d}/{2:02d}'.format(start.year, start.month, start.day)
print('searching...')
for root, dirs, files in os.walk(startDir):
    for contents in files:
        # open gzip files in text mode so lines are str, like plain files
        if contents.endswith('.gz'):
            f = gzip.open(os.path.join(root, contents), 'rt')
        else:
            f = open(os.path.join(root, contents), 'r')
        text = f.readlines()
        f.close()
        for line in text:
            for address_item in netaddr.IPNetwork(target_IP):
                if str(address_item) in line:
                    print(line, end='')
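Answer 0: (score: 0)
You need to describe what does or does not work. The argparse part of your code looks fine, though I haven't done any testing. Your use of type is refreshingly correct. :) (Posters often misuse that parameter.)
But as for the stopping, I'm guessing you could do: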
endDir = '/flows/index_border/{0}/{1:02d}/{2:02d}'.format(end.year, end.month, end.day)
for root, dirs, files in os.walk(startDir):
    for contents in files:
        ....
    if endDir in <something based on dirs and files>:
        break
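I'm not clear enough on your file structure to be more specific. It has also been a while since I worked with os.walk. In any case, I think a conditional break is the way to stop the walk early.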
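One way to make the conditional break concrete is sketched below, under several assumptions. The walk starts at the base directory rather than at the start day, because os.walk only descends and a walk rooted at one day's directory never reaches later days. Directory names are assumed to be zero-padded numbers in a year/month/day layout, as described in the question. `walk_days` is a hypothetical helper name, and dates are passed as (year, month, day) tuples.

```python
import os

def walk_days(base, start, end):
    """Yield (root, files) for each day directory from start to end, inclusive."""
    for root, dirs, files in os.walk(base, topdown=True):
        # Sorting in place makes os.walk descend in ascending date order.
        dirs.sort()
        rel = os.path.relpath(root, base)
        if rel == os.curdir:
            continue  # the base directory itself holds no logs
        # Assumption: every path component is numeric (year, month, day).
        parts = tuple(int(p) for p in rel.split(os.sep))
        if len(parts) < 3:
            continue  # year or month level, nothing to search yet
        dirs[:] = []  # day level reached: do not descend any further
        if parts > end:
            break  # everything still to come is later than the end date
        if parts >= start:
            yield root, sorted(files)

# Usage: list the day directories for 2013/12/20 through 2013/12/22.
for root, files in walk_days('/flows/index_border', (2013, 12, 20), (2013, 12, 22)):
    print(root, files)
```

The in-place sort matters: os.walk makes no promise about directory order, and the break is only safe once dates are known to arrive in ascending order.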
Answer 1: (score: 0)
#!/usr/bin/env python
import os, gzip, netaddr, datetime, argparse

searchDir = '.'
searchItems = []

def sdate_format(s):
    # argparse type callback: parse the start date or report a usage error
    try:
        return datetime.datetime.strptime(s, '%Y/%m/%d').date()
    except ValueError:
        msg = "Bad start date. Please use yyyy/mm/dd format."
        raise argparse.ArgumentTypeError(msg)

def edate_format(e):
    # argparse type callback: parse the end date or report a usage error
    try:
        return datetime.datetime.strptime(e, '%Y/%m/%d').date()
    except ValueError:
        msg = "Bad end date. Please use yyyy/mm/dd format."
        raise argparse.ArgumentTypeError(msg)

parser = argparse.ArgumentParser(description='Locate IP address in log files for a particular date or date range')
parser.add_argument('-s', '--start_date', action='store', type=sdate_format, dest='start_date',
                    help='The first date in range of interest.', required=True)
parser.add_argument('-e', '--end_date', action='store', type=edate_format, dest='end_date',
                    help='The last date in range of interest.', required=True)
parser.add_argument('-i', action='store', dest='net',
                    help='IP address or address range, IPv4 or IPv6 with optional subnet accepted.', required=True)
results = parser.parse_args()
start = results.start_date
# add one day so that the end date itself is included in the range
end = results.end_date + datetime.timedelta(days=1)
target_IP = results.net
dateRange = end - start

# expand the network once into the list of address strings to look for
for addressOfInterest in netaddr.IPNetwork(target_IP):
    searchItems.append(str(addressOfInterest))

print('searching...')
for eachDay in range(dateRange.days):
    period = start + datetime.timedelta(days=eachDay)
    searchDir = '/flows/index_border/{0}/{1:02d}/{2:02d}'.format(period.year, period.month, period.day)
    for contents in os.listdir(searchDir):
        # read gzip and plain files alike as text
        opener = gzip.open if contents.endswith('.gz') else open
        with opener(os.path.join(searchDir, contents), 'rt') as f:
            text = f.readlines()
        for addressOfInterest in searchItems:
            for line in text:
                if addressOfInterest in line:
                    print(contents)
                    print(line, end='')
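I was banging my head against the wall because I thought I was printing duplicates. It turned out the file I was given for testing contained duplicates. Because the file system layout is predictable, I ended up dropping os.walk, but @hpaulj did provide the correct solution. Thanks a lot!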
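One caveat about the substring test in the script above: it expands the network into one string per address, so 10.0.0.5 would also match a line containing 10.0.0.50, and a large IPv6 subnet is impractical to expand. A possible alternative, sketched here with the standard library ipaddress module instead of netaddr, tests network membership per token. The names `line_matches` and `search_file` are hypothetical, and the sketch assumes addresses appear in the logs as standalone tokens (not followed by a port or trailing punctuation), which the thread does not confirm.

```python
import gzip
import ipaddress
import re

# Candidate tokens made of hex digits, dots and colons; validated below.
TOKEN = re.compile(r'[0-9A-Fa-f:.]+')

def line_matches(line, network):
    """Return True if any address-like token in line belongs to network."""
    for token in TOKEN.findall(line):
        try:
            addr = ipaddress.ip_address(token)
        except ValueError:
            continue  # not a valid IPv4 or IPv6 address
        if addr.version == network.version and addr in network:
            return True
    return False

def search_file(path, network):
    """Yield matching lines from a plain or gzip-compressed text file."""
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rt') as handle:
        for line in handle:  # stream line by line instead of readlines()
            if line_matches(line, network):
                yield line

# strict=False accepts input such as 10.0.0.5/24, where host bits are set.
network = ipaddress.ip_network('10.0.0.0/24', strict=False)
print(line_matches('src 10.0.0.5 dst 192.168.1.1', network))
```

Reading the file line by line also avoids holding a whole hourly log in memory at once.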