Question

How can I get the data from a text file that falls within a given time interval?

search.txt

19:00:00 ,  trakjkfsa,
19:00:00 ,  door,
19:00:00 ,  sweater,
19:00:00 ,  sweater,
19:00:00 ,  sweater,
19:00:00 ,  dis,
19:00:01 ,  not,
19:00:01 ,  nokia,
19:00:01 ,  collar,
19:00:01 ,  nokia,
19:00:01 ,  collar,
19:00:01 ,  gsm,
19:00:01 ,  sweater,
19:00:01 ,  sweater,
19:00:01 ,  gsm,
19:00:02 ,  gsm,
19:00:02 ,  show,
19:00:02 ,  wayfreyerv,
19:00:02 ,  door,
19:00:02 ,  collar,
19:00:02 ,  or,
19:00:02 ,  harman,
19:00:02 ,  women's,
19:00:02 ,  collar,
19:00:02 ,  sweater,
19:00:02 ,  head,
19:00:03 ,  womanw,
19:00:03 ,  com.shopclues.utils.k@42233ff0,
19:00:03 ,  samsu,
19:00:03 ,  adidas,
19:00:03 ,  collar,
19:00:04 ,  ambas,

I need to find all the queries between 19:00:00 and 19:00:03. Is there a way to do this?

Answer 1

Use the built-in datetime module:

import datetime as dt

t_start = dt.time(19, 0, 0)
t_end = dt.time(19, 0, 3)

with open('search.txt') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines (e.g. a trailing empty line)
        # "19:00:00 ,  door," -> ['19:00:00', 'door', '']
        fields = [x.strip() for x in line.split(',')]
        timestamp = dt.datetime.strptime(fields[0], "%H:%M:%S").time()

        if t_start < timestamp < t_end:  # use "<=" if you want the boundaries included
            print(fields[1], end=' ')

This prints:

not nokia collar nokia collar gsm sweater sweater gsm gsm show wayfreyerv door collar or harman women's collar sweater head
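The question asks for 19:00:00 to 19:00:03, which probably means both endpoints should be included, whereas the strict "<" comparison above leaves them out. Below is a minimal sketch that wraps the same parsing in a reusable function with a switch for the boundaries. The function name queries_between and its inclusive parameter are hypothetical, and it assumes every non-blank line has the form "HH:MM:SS , query,".

```python
import datetime as dt

def queries_between(lines, t_start, t_end, inclusive=True):
    """Return the query of every 'HH:MM:SS , query,' line whose time is in the interval."""
    result = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        fields = [x.strip() for x in line.split(',')]
        timestamp = dt.datetime.strptime(fields[0], "%H:%M:%S").time()
        if inclusive:
            in_range = t_start <= timestamp <= t_end  # boundaries included
        else:
            in_range = t_start < timestamp < t_end    # boundaries excluded
        if in_range:
            result.append(fields[1])
    return result

sample = [
    "19:00:00 ,  door,",
    "19:00:01 ,  nokia,",
    "19:00:03 ,  adidas,",
    "19:00:04 ,  ambas,",
]
print(queries_between(sample, dt.time(19, 0, 0), dt.time(19, 0, 3)))
# ['door', 'nokia', 'adidas']
print(queries_between(sample, dt.time(19, 0, 0), dt.time(19, 0, 3), inclusive=False))
# ['nokia']
```

Since it accepts any iterable of lines, an open file works as well: with open('search.txt') as f: queries_between(f, dt.time(19, 0, 0), dt.time(19, 0, 3)).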

Answer 2

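start = '19:00:02'
end = '19:00:04'
queries = []

with open('search.txt') as f:
    line = f.read(10)              # first 10 characters of a line, e.g. '19:00:02 ,'

    # skip lines until one starts with the start time (stop at end of file)
    while line and start not in line:
        f.readline()               # discard the rest of the current line
        line = f.read(10)

    # collect queries until a line starts with the end time (or end of file)
    while line and end not in line:
        queries.append(f.readline().strip().rstrip(','))  # rest of the line, without the trailing comma
        line = f.read(10)

print(queries)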

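This reads the first 10 characters of each line, which cover the timestamp and the comma after it (e.g. "19:00:02 ,"). As long as the end time 19:00:04 is not in those characters, the rest of the line (file.readline().strip()) is appended to the queries list; collecting stops once a line starting with the end time is read. Note that this relies on the file being sorted by time, on every line having the same 10-character prefix, and on both the start and end times actually appearing in the file.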
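The two loops in this answer amount to "skip lines before the start time, then take lines until the end time". Here is a sketch of the same idea using itertools.dropwhile and itertools.takewhile, assuming the file is sorted by time and every timestamp is a zero-padded HH:MM:SS string, so plain string comparison follows time order. Because it compares with "<" instead of searching for an exact match, it also works when the start or end time does not occur in the file. The helper name queries_in_window is hypothetical.

```python
from itertools import dropwhile, takewhile

def queries_in_window(lines, start, end):
    """Collect queries from start (inclusive) up to end (exclusive).

    Assumes lines are sorted by time and timestamps are zero-padded
    'HH:MM:SS' strings, so string comparison matches time order.
    """
    def timestamp(line):
        return line[:8]  # first 8 characters, e.g. '19:00:02'

    rows = (line for line in lines if line.strip())                      # skip blank lines
    after_start = dropwhile(lambda line: timestamp(line) < start, rows)  # like the first while loop
    window = takewhile(lambda line: timestamp(line) < end, after_start)  # like the second while loop
    # keep the text after the first comma, minus surrounding spaces and the trailing comma
    return [line.split(',', 1)[1].strip().rstrip(',') for line in window]

sample = [
    "19:00:01 ,  nokia,",
    "19:00:02 ,  gsm,",
    "19:00:03 ,  adidas,",
    "19:00:04 ,  ambas,",
]
print(queries_in_window(sample, '19:00:02', '19:00:04'))  # ['gsm', 'adidas']
```

With the real file: with open('search.txt') as f: print(queries_in_window(f, '19:00:02', '19:00:04')).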
