Setting a start and end point for os.walk - python

Time: 2016-08-04 21:55:32

Tags: python-2.7 datetime argparse os.walk

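I am trying to find out how to stop os.walk after it has gone through a particular file.

I have a directory of log files organized by date. I am trying to replace a grep search with a program that lets the user look for IP addresses stored within a date range they specify.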

The program will take the following arguments:

-i IPv4 or IPv6 address with subnet

-s start date, e.g. 2013/12/20, matching the file structure

-e end date

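Since os.walk has a topdown option, I assume there is some logic that would let me declare an end point. What is the best way to do this? I am thinking of a loop.

I apologize in advance if something is off in my question. I just checked my blood sugar and it is low, at 56.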

Additional information

The file structure will be under flows/index_border:

2013
- 01
- 02
---- 01
---- ...
---- 29
2014

Hopefully this is clear: year folders contain month folders, which contain day folders, which contain hourly files. Dates increase going down.

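The end date needs to be inclusive (I have not focused on that much, since I can add code to bump it by one day).

I have been trying to write a date range function. I am surprised I do not see one anywhere in the datetime documentation; it seems like it would be useful.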
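The date range function mentioned above can be sketched in a few lines on top of datetime.timedelta. This is a minimal sketch, not something the datetime module provides: the name date_range is hypothetical, and the end date is treated as inclusive as the question requires.

```python
import datetime


def date_range(start, end):
    """Yield each date from start through end, inclusive (hypothetical helper)."""
    # (end - start).days is the gap in whole days; +1 makes the end inclusive.
    for offset in range((end - start).days + 1):
        yield start + datetime.timedelta(days=offset)


if __name__ == "__main__":
    first = datetime.date(2013, 12, 30)
    last = datetime.date(2014, 1, 2)
    for day in date_range(first, last):
        # Same yyyy/mm/dd layout as the log directories.
        print(day.strftime("%Y/%m/%d"))
```

If the end date is earlier than the start date, the generator simply yields nothing.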

import os, gzip, netaddr, datetime, argparse
startDir = '.'
def sdate_format(s):
    try:
        return (datetime.datetime.strptime(s, '%Y/%m/%d').date())
    except ValueError:
        msg = "Bad start date. Please use yyyy/mm/dd format."
        raise argparse.ArgumentTypeError(msg)
def edate_format(e):
    try:
        return (datetime.datetime.strptime(e, '%Y/%m/%d').date())
    except ValueError:
        msg = "Bad end date. Please use yyyy/mm/dd format."
        raise argparse.ArgumentTypeError(msg)

parser = argparse.ArgumentParser(description='Locate IP address in log files for a particular date or date range')
parser.add_argument('-s', '--start_date', action='store', type=sdate_format, dest='start_date', help='The first date in range of interest.')
parser.add_argument('-e', '--end_date', action='store', type=edate_format, dest='end_date', help='The last date in range of interest.')
parser.add_argument('-i', action='store', dest='net', help='IP address or address range, IPv4 or IPv6 with optional subnet accepted.', required=True)

results = parser.parse_args()
start = results.start_date
end = results.end_date
target_IP = results.net  # same spelling as the name used in the search loop below
startDir = '/flows/index_border/{0}/{1:02d}/{2:02d}'.format(start.year, start.month, start.day)

print('searching...')
for root, dirs, files in os.walk(startDir):
    for contents in files:
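        # Open gzip-compressed and plain logs, then scan the lines of either kind.
        if contents.endswith('.gz'):
            f = gzip.open(os.path.join(root, contents), 'r')
        else:
            f = open(os.path.join(root, contents), 'r')
        text = f.readlines()
        f.close()
        for line in text:
            for address_item in netaddr.IPNetwork(target_IP):
                if str(address_item) in line:
                    # Python 2 print statement; the trailing comma avoids a second newline.
                    print line,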

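2 Answers:

Answer 0 (score: 0)

You need to describe what works and what does not. The argparse part of your code looks fine, though I have not done any testing. The use of type is refreshing. :) (Posters often misuse that parameter.)

But as for stopping, I imagine you could do: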
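For readers unfamiliar with the type parameter: argparse calls the given function on the raw string and stores whatever it returns, and an argparse.ArgumentTypeError raised inside it becomes a normal usage error. The sketch below is an illustration only; date_arg and build_parser are hypothetical names, and folding the two nearly identical parsers from the question into one factory is my assumption about what might be convenient, not something the poster did.

```python
import argparse
import datetime


def date_arg(label):
    """Build an argparse type callable for a yyyy/mm/dd date (hypothetical helper)."""
    def parse(text):
        try:
            return datetime.datetime.strptime(text, "%Y/%m/%d").date()
        except ValueError:
            raise argparse.ArgumentTypeError(
                "Bad {0} date. Please use yyyy/mm/dd format.".format(label))
    return parse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Locate IP address in log files for a date range")
    parser.add_argument("-s", "--start_date", type=date_arg("start"), required=True)
    parser.add_argument("-e", "--end_date", type=date_arg("end"), required=True)
    return parser


if __name__ == "__main__":
    # Passing an explicit list keeps the example independent of sys.argv.
    args = build_parser().parse_args(["-s", "2013/12/20", "-e", "2013/12/22"])
    print(args.start_date, args.end_date)
```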

endDir = '/flows/index_border/{0}/{1:02d}/{2:02d}'.format(end.year, end.month, end.day)

for root, dirs, files in os.walk(startDir):
    for contents in files:
        ....
    if endDir in <something based on dirs and files>:
        break

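I do not have a clear enough picture of your file structure to be more specific. It has also been a while since I worked with os.walk. In any case, I think a conditional break is the way to stop the walk early.

Answer 1 (score: 0)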
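As an alternative to a conditional break, os.walk with topdown=True lets the caller edit the dirs list in place, and the walk then only descends into the names that remain. The sketch below uses that pruning to stay inside a date range. It assumes the year/month/day layout described in the question with zero-padded numeric folder names; walk_date_range is a hypothetical name. Because os.walk only visits the subtree under its starting directory, the sketch starts from the base directory rather than from the start day's folder.

```python
import datetime
import os


def walk_date_range(base, start, end):
    """Yield (day_dir, files) for each base/yyyy/mm/dd directory whose date
    lies between start and end, inclusive (hypothetical helper)."""
    lo = (start.year, start.month, start.day)
    hi = (end.year, end.month, end.day)
    for root, dirs, files in os.walk(base, topdown=True):
        rel = os.path.relpath(root, base)
        parts = [] if rel == os.curdir else rel.split(os.sep)
        if len(parts) == 3:
            dirs[:] = []  # day level reached: do not descend any further
            yield root, sorted(files)
            continue
        # Partial date (year, or year and month) of the directory being visited.
        prefix = tuple(int(p) for p in parts)
        keep = []
        for name in sorted(dirs):
            if not name.isdigit():
                continue
            candidate = prefix + (int(name),)
            n = len(candidate)
            # Keep a folder only if its partial date can still fall in range.
            if lo[:n] <= candidate <= hi[:n]:
                keep.append(name)
        dirs[:] = keep  # in-place edit: os.walk skips everything not listed


if __name__ == "__main__":
    first = datetime.date(2013, 12, 20)
    last = datetime.date(2013, 12, 22)
    for day_dir, files in walk_date_range("/flows/index_border", first, last):
        print(day_dir, files)
```

Sorting the kept names makes the traversal chronological, so the walk ends right after the end date's folder without needing an explicit break.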

#!/usr/bin/env python
import os, gzip, netaddr, datetime, argparse, sys
searchDir = '.'
searchItems = []
def sdate_format(s):
    try:
        return (datetime.datetime.strptime(s, '%Y/%m/%d').date())
    except ValueError:
        msg = "Bad start date. Please use yyyy/mm/dd format."
        raise argparse.ArgumentTypeError(msg)
def edate_format(e):
    try:
        return (datetime.datetime.strptime(e, '%Y/%m/%d').date())
    except ValueError:
        msg = "Bad end date. Please use yyyy/mm/dd format."
        raise argparse.ArgumentTypeError(msg)


parser = argparse.ArgumentParser(description='Locate IP address in log files for a particular date or date range')
parser.add_argument('-s', '--start_date', action='store', type=sdate_format, dest='start_date',
                        help='The first date in range of interest.', required=True)
parser.add_argument('-e', '--end_date', action='store', type=edate_format, dest='end_date',
                        help='The last date in range of interest.', required=True)
parser.add_argument('-i', action='store', dest='net',
                        help='IP address or address range, IPv4 or IPv6 with optional subnet accepted.', required=True)

results = parser.parse_args()
start = results.start_date
end = results.end_date + datetime.timedelta(days=1)
target_IP = results.net
dateRange = end - start
for addressOfInterest in(netaddr.IPNetwork(target_IP)):
    searchItems.append(str(addressOfInterest))
print('searching...')

for eachDay in range(dateRange.days):
    period = start+datetime.timedelta(days=eachDay)
    searchDir =  '/flows/index_border/{0}/{1:02d}/{2:02d}'.format(period.year, period.month, period.day)

    # Search every hourly file in this day's directory.
    for contents in os.listdir(searchDir):
        if contents.endswith('.gz'):
            f = gzip.open(os.path.join(searchDir, contents), 'rb')
        else:
            f = open(os.path.join(searchDir, contents), 'r')
        text = f.readlines()
        f.close()

        # Report each line that contains one of the expanded addresses.
        for addressOfInterest in searchItems:
            for line in text:
                if addressOfInterest in line:
                    # Python 2 print statements; the trailing comma avoids a second newline.
                    print contents
                    print line,

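I was banging my head because I thought I was printing duplicates. It turns out the file I was given for testing contained duplicates. Because the file system layout is so predictable, I ended up dropping os.walk, but @hpaulj did provide the correct solution. Thanks a lot!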
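For comparison, here is a minimal Python 3 sketch of the same day-by-day approach, written as a generator. It is an illustration under stated assumptions rather than the poster's code: search_days and open_log are hypothetical names, days with no folder are skipped instead of raising an error, and a line is reported once even if several addresses match it.

```python
import datetime
import gzip
import os


def open_log(path):
    """Open a plain or gzip-compressed log file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def search_days(base, start, end, needles):
    """Yield (path, line) for every line containing any needle, for each
    day directory base/yyyy/mm/dd from start through end, inclusive."""
    for offset in range((end - start).days + 1):
        day = start + datetime.timedelta(days=offset)
        day_dir = os.path.join(
            base, day.strftime("%Y"), day.strftime("%m"), day.strftime("%d"))
        if not os.path.isdir(day_dir):
            continue  # no logs were stored for this day
        for name in sorted(os.listdir(day_dir)):
            path = os.path.join(day_dir, name)
            if not os.path.isfile(path):
                continue
            with open_log(path) as handle:
                for line in handle:
                    # any() reports the line once, however many needles match.
                    if any(needle in line for needle in needles):
                        yield path, line


if __name__ == "__main__":
    first = datetime.date(2013, 12, 20)
    last = datetime.date(2013, 12, 22)
    for path, line in search_days("/flows/index_border", first, last, ["192.0.2.1"]):
        print(path, line, end="")
```

Like the code above, this matches by substring, so searching for 10.0.0.1 would also match a line containing 10.0.0.10.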