I have a CSV file that looks like this (the real one is obviously much larger):
1,$1,AA,GG,DD,2020-01-01T00:01:10.740+02:00
2,$2,A1,FD,HH,2020-01-01T00:02:00.240+02:00
3,$3,1A,PP,LL,2020-01-01T00:03:30.460+02:00
4,$4,S1,LL,SS,2020-02-01T00:01:11.190+02:00
5,$5,2G,PP,FF,2020-01-01T00:04:20.320+02:00
6,$6,5S,LL,TT,2020-02-01T01:02:15.180+02:00
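I need to take the first row, note its date, and check whether each of the remaining rows falls on that same day, between 00:00:00.000 and 23:59:59.999. Put simply: I want every row whose date matches the date of the first row.
This is the result I want: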
1,$1,AA,GG,DD,2020-01-01T00:01:10.740+02:00
2,$2,A1,FD,HH,2020-01-01T00:02:00.240+02:00
3,$3,1A,PP,LL,2020-01-01T00:03:30.460+02:00
5,$5,2G,PP,FF,2020-01-01T00:04:20.320+02:00
This is my code:
root = r'c:\data\FF\Desktop\my_files\file01.txt'
with open(root, 'r') as my_file:
    reader = csv.reader(my_file)

def filter_row():
    for row in reader:
        date_time = row[5] #<--- extract the datetime
        fdate_time = datetime.strptime(date_time, '%Y-%m-%dT%H:%M:%S.%f%z') #<--- make a datetime object of it
        x = fdate_time.date() #<--- extract the y/m/d
        begin_time = datetime.strptime(x + '00:00.00+02:00','%Y-%m-%dT%H:%M:%S.%f%z') #<--- fix the start time of a day
        end_time = datetime.strptime(x + '23:59:59.999+02:00', '%Y-%m-%dT%H:%M:%S.%f%z') #<--- fix the end time of a day
        filtered_records = fdate_time >= begin_time and fdate_time <= end_time #<filter everything between the start and end time
    return filtered_records

filter_row()
When I run the code above, I get:
File "C:\data\FF\Desktop\Python\My_python\Filter_csv.py", line 82, in filter_row
for row in reader:
ValueError: I/O operation on closed file.
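I'm really lost here because I don't know how to fix it. I looked at several solutions but couldn't find one that works. I hope someone can tell me what is wrong and explain how it works. Thanks, everyone.
Answer 0: (score: 1)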
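The context manager provided by with ensures that the resource is released at the end of the block. That means everything has to be read inside the with block.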
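The behaviour described above can be reproduced in a few lines. This is only a minimal sketch: the temporary file and its two rows are made up for the illustration and are not part of the question's data.

```python
import csv
import os
import tempfile

# Write a tiny throwaway CSV so the example is self-contained (hypothetical data).
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write('1,a\n2,b\n')
    path = tmp.name

with open(path, newline='') as my_file:
    reader = csv.reader(my_file)
    first_row = next(reader)  # reading inside the with block works

# my_file has been closed by the context manager, but `reader` still wraps it.
try:
    next(reader)
    error = None
except ValueError as exc:
    error = str(exc)

os.remove(path)
print(first_row)
print(error)
```

The second read fails with the same ValueError as in the question, because the csv reader pulls lines lazily from a file object that is already closed.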
A simple way is to pass the reader to the function as a parameter:
root = r'c:\data\FF\Desktop\my_files\file01.txt'

def filter_row(reader):
    for row in reader:
        ...
    return filtered_records

with open(root, 'r') as my_file:
    reader = csv.reader(my_file)
    filter_row(reader)
However, you should use the datetime.replace method to compute the start and end of the day instead of building them from strings, and change filter_row into a generator:
import csv
from datetime import datetime

root = r'c:\data\FF\Desktop\my_files\file01.txt'
newf = r'c:\data\FF\Desktop\my_files\file01.csv'

def filter_row(reader):
    first = True
    for row in reader:
        date_time = row[5]  # extract the datetime string
        fdate_time = datetime.strptime(date_time, '%Y-%m-%dT%H:%M:%S.%f%z')  # make a datetime object of it
        if first:  # special processing for the first line
            first = False
            begin_time = fdate_time.replace(hour=0, minute=0, second=0, microsecond=0)  # start of that day
            end_time = fdate_time.replace(hour=23, minute=59, second=59, microsecond=999999)  # end of that day
            yield row  # yield the first row
        elif begin_time <= fdate_time <= end_time:  # keep everything between the start and end time
            yield row  # rows on the same date

# The csv module expects files to be opened with newline=''
with open(root, newline='') as my_file, open(newf, 'w', newline='') as new_file:
    reader = csv.reader(my_file)
    writer = csv.writer(new_file)
    writer.writerows(filter_row(reader))
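As a quick check of the generator idea, here is a variant sketch that runs on the sample rows from the question held in memory. It compares calendar dates directly instead of a begin/end range, which is a simplification and not what the answer's code does; rows_on_first_date, SAMPLE and FMT are hypothetical names, and the sketch assumes every row carries the same UTC offset.

```python
import csv
import io
from datetime import datetime

# Sample rows copied from the question.
SAMPLE = """\
1,$1,AA,GG,DD,2020-01-01T00:01:10.740+02:00
2,$2,A1,FD,HH,2020-01-01T00:02:00.240+02:00
3,$3,1A,PP,LL,2020-01-01T00:03:30.460+02:00
4,$4,S1,LL,SS,2020-02-01T00:01:11.190+02:00
5,$5,2G,PP,FF,2020-01-01T00:04:20.320+02:00
6,$6,5S,LL,TT,2020-02-01T01:02:15.180+02:00
"""

FMT = '%Y-%m-%dT%H:%M:%S.%f%z'

def rows_on_first_date(reader):
    """Yield the first row and every later row with the same calendar date."""
    first_date = None
    for row in reader:
        row_date = datetime.strptime(row[5], FMT).date()
        if first_date is None:
            first_date = row_date  # remember the date of the first row
        if row_date == first_date:
            yield row

kept = list(rows_on_first_date(csv.reader(io.StringIO(SAMPLE))))
print([row[0] for row in kept])  # prints ['1', '2', '3', '5']
```

Because the function is a generator, rows are filtered one at a time, so the whole file never has to be held in memory.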
Answer 1: (score: 0)
I suggest using pandas for this.
pandas will also scale more easily should the file size grow in the future.
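The answer gives no code, so the following is only one possible sketch of what a pandas version might look like. It assumes the file has no header row, that the timestamp is in the sixth column, and that all rows share the same UTC offset; the in-memory sample stands in for the real file.

```python
import io

import pandas as pd

# Sample rows copied from the question; a real script would pass a file path instead.
sample = """\
1,$1,AA,GG,DD,2020-01-01T00:01:10.740+02:00
2,$2,A1,FD,HH,2020-01-01T00:02:00.240+02:00
3,$3,1A,PP,LL,2020-01-01T00:03:30.460+02:00
4,$4,S1,LL,SS,2020-02-01T00:01:11.190+02:00
5,$5,2G,PP,FF,2020-01-01T00:04:20.320+02:00
6,$6,5S,LL,TT,2020-02-01T01:02:15.180+02:00
"""

# header=None because the file has no header line; columns are numbered 0..5.
df = pd.read_csv(io.StringIO(sample), header=None)

# Parse the timestamp column (assumes one shared UTC offset across rows).
timestamps = pd.to_datetime(df[5], format='%Y-%m-%dT%H:%M:%S.%f%z')

# Keep the rows whose calendar date equals the date of the first row.
same_day = timestamps.dt.date == timestamps.iloc[0].date()
result = df[same_day]

print(result.to_csv(header=False, index=False))
```

result.to_csv can also take a file path to write the filtered rows back out.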