Name,USAF,NCDC,Date,HrMn,I,Type,Dir,Q,I,Spd,Q
OXNARD,723927,93110,19590101,0000,4,SAO,270,1,N,3.1,1,
OXNARD,723927,93110,19590101,0100,4,SAO,338,1,N,1.0,1,
OXNARD,723927,93110,19590101,0200,4,SAO,068,1,N,1.0,1,
OXNARD,723927,93110,19590101,0300,4,SAO,068,1,N,2.1,1,
OXNARD,723927,93110,19590101,0400,4,SAO,315,1,N,1.0,1,
OXNARD,723927,93110,19590101,0500,4,SAO,999,1,C,0.0,1,
....
OXNARD,723927,93110,19590102,0000,4,SAO,225,1,N,2.1,1,
OXNARD,723927,93110,19590102,0100,4,SAO,248,1,N,2.1,1,
OXNARD,723927,93110,19590102,0200,4,SAO,999,1,C,0.0,1,
OXNARD,723927,93110,19590102,0300,4,SAO,068,1,N,2.1,1,
Above is a snippet of a CSV file that stores an hourly wind speed (Spd) on each line. What I want to do is select all of the hourly winds for each day in the CSV file and store them in a temporary daily list holding all of that day's hourly values (24 of them if there are no missing values). I would then output the day's list, create a new empty list for the next day, find that day's hourly speeds, output the daily list, and so on until the end of the file.
I am struggling to come up with a good way to do this. One idea I had is to read line i, determine its date (YYYY-MM-DD), then read line i+1 and see whether its date matches the date on line i. If they match, we are still in the same day; if they don't, we have moved on to the next day. But I can't even figure out how to read the next line of the file...
Any suggestions for implementing this approach, or for an entirely new (and better?!) one, are most welcome. Thanks in advance!
import datetime

obs_in = open(csv_file).readlines()
for i in range(1, len(obs_in)):
    # Skip over the header lines
    if not obs_in[i].startswith("Identification") and not obs_in[i].startswith("Name"):
        name,usaf,ncdc,date,hrmn,i,type,dir,q,i2,spd,q2,blank = obs_in[i].split(',')
        current_dt = datetime.date(int(date[0:4]),int(date[4:6]),int(date[6:8]))
        current_spd = spd
        # Read in next line's date: is it in the same day?
        # If in the same day, then append spd into tmp daily list
        # If not, then start a new list for the next day
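For what it's worth, the line-by-line look-ahead can be avoided entirely: since the rows are already sorted by date, consecutive rows for the same day can be grouped with `itertools.groupby`. A minimal sketch (using an inline sample in place of the real file; the column positions match the snippet above):

```python
import csv
import io
from itertools import groupby

# Inline sample in the same layout as the file above (header + data rows);
# in the real script these lines would come from the CSV file itself.
sample = """Name,USAF,NCDC,Date,HrMn,I,Type,Dir,Q,I,Spd,Q
OXNARD,723927,93110,19590101,0000,4,SAO,270,1,N,3.1,1,
OXNARD,723927,93110,19590101,0100,4,SAO,338,1,N,1.0,1,
OXNARD,723927,93110,19590102,0000,4,SAO,225,1,N,2.1,1,
"""

rows = list(csv.reader(io.StringIO(sample)))[1:]   # skip the header line
# Column 3 is Date and column 10 is Spd. The rows are already sorted by
# date, so groupby yields exactly one group per day.
bydays = {day: [float(r[10]) for r in group]
          for day, group in groupby(rows, key=lambda r: r[3])}
print(bydays)
```

Each group is consumed as it is produced, so this never needs to hold more than one day's rows in a temporary list at a time.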
Answer 0 (score: 1)
You can take advantage of the file's nicely sorted nature and use csv.DictReader. You can then very simply build a dictionary of wind speeds organized by date, which you can process however you like. Note that the csv reader returns strings, so you may want to convert to other types as appropriate while assembling the lists.
import csv
from collections import defaultdict

bydate = defaultdict(list)
rdr = csv.DictReader(open('winds.csv', 'rt'))
for k in rdr:
    bydate[k['Date']].append(float(k['Spd']))

print(bydate)
defaultdict(<class 'list'>, {'19590101': [3.1, 1.0, 1.0, 2.1, 1.0, 0.0], '19590102': [2.1, 2.1, 0.0, 2.1]})
You can obviously change the argument to the append call to a tuple, e.g. append((float(k['Spd']), datetime.datetime.strptime(k['Date']+k['HrMn'], '%Y%m%d%H%M'))), so that you collect the times as well (note the format directive is lowercase %d for day-of-month; %D is not a valid strptime directive).
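With the format string corrected, collecting (speed, timestamp) tuples looks like this (a sketch with an inline sample standing in for winds.csv):

```python
import csv
import datetime
import io
from collections import defaultdict

# Inline sample standing in for winds.csv.
sample = """Name,USAF,NCDC,Date,HrMn,I,Type,Dir,Q,I,Spd,Q
OXNARD,723927,93110,19590101,0000,4,SAO,270,1,N,3.1,1,
OXNARD,723927,93110,19590101,0100,4,SAO,338,1,N,1.0,1,
"""

bydate = defaultdict(list)
for k in csv.DictReader(io.StringIO(sample)):
    # '%Y%m%d%H%M' matches e.g. '19590101' + '0000'
    ts = datetime.datetime.strptime(k['Date'] + k['HrMn'], '%Y%m%d%H%M')
    bydate[k['Date']].append((float(k['Spd']), ts))

print(bydate['19590101'])
```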
If there is stray whitespace in the file, you can use the skipinitialspace parameter: rdr = csv.DictReader(open('winds.csv','rt'), fieldnames=ff, skipinitialspace=True). If that still doesn't work, you can preprocess the header line:
import csv
from collections import defaultdict

bydate = defaultdict(list)
with open('winds.csv', 'rt') as f:
    fieldnames = [k.strip() for k in f.readline().split(',')]
    rdr = csv.DictReader(f, fieldnames=fieldnames, skipinitialspace=True)
    for k in rdr:
        bydate[k['Date']].append(k['Spd'])
bydate can be accessed like an ordinary dictionary. To access the data for a particular date, use bydate['19590101']. To get the list of dates that were processed, use bydate.keys(). If you want to convert the dates to Python datetime objects as the file is read, import datetime and replace the assignment line with bydate[datetime.datetime.strptime(k['Date'], '%Y%m%d')].append(k['Spd']).
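Putting that together, here is a sketch that keys the dictionary by datetime.date objects (again with an inline sample standing in for winds.csv); date objects compare chronologically, so sorting the keys gives the days in order even if the file weren't sorted:

```python
import csv
import datetime
import io
from collections import defaultdict

# Inline sample standing in for winds.csv; rows deliberately out of order
# to show that date keys sort chronologically.
sample = """Name,USAF,NCDC,Date,HrMn,I,Type,Dir,Q,I,Spd,Q
OXNARD,723927,93110,19590102,0000,4,SAO,225,1,N,2.1,1,
OXNARD,723927,93110,19590101,0000,4,SAO,270,1,N,3.1,1,
"""

bydate = defaultdict(list)
for k in csv.DictReader(io.StringIO(sample)):
    day = datetime.datetime.strptime(k['Date'], '%Y%m%d').date()
    bydate[day].append(float(k['Spd']))

for day in sorted(bydate):   # chronological, regardless of file order
    print(day, bydate[day])
```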
Answer 1 (score: 0)
Maybe something like this:
import datetime

def dump(buf, date):
    """dumps buffered lines into file 'spdYYYYMMDD.csv'"""
    if len(buf) == 0:
        return
    with open('spd%s.csv' % date, 'w') as f:
        for line in buf:
            f.write(line)

obs_in = open(csv_file).readlines()
# buf stores one day's records
buf = []
# date0 is the time stamp for the buffer
date0 = None
for i in range(1, len(obs_in)):
    # Skip over the header lines
    if not obs_in[i].startswith("Identification") and \
       not obs_in[i].startswith("Name"):
        name,usaf,ncdc,date,hrmn,ii,type,dir,q,i2,spd,q2,blank = \
            obs_in[i].split(',')
        current_dt = datetime.date(int(date[0:4]),int(date[4:6]),int(date[6:8]))
        current_spd = spd
        # If the time stamp of the current record differs, dump the buffer
        # and reset the buffer's time stamp
        if date != date0:
            dump(buf, date0)
            buf = []
            date0 = date
        # you can change this; here the entire line is buffered
        buf.append(obs_in[i])

# when the loop exits, the buffer holds the last day's records,
# so flush that too
dump(buf, date0)
I also found that I had to use ii instead of i for the data's "I" column, because you use i as the loop counter.
Answer 2 (score: 0)
I know this question is from a few years ago, but I just wanted to point out that a small bash script can perform this task neatly. I copied your example into a file named data.txt, and here is the script:
#!/bin/bash
date=19590101
date_end=19590102
while [[ $date -le $date_end ]] ; do
grep ",${date}," data.txt > file_${date}.txt
date=`date +%Y%m%d -d ${date}+1day` # NOTE: MAC-OSX date differs
done
Note that this won't work on a Mac, because the date command is implemented differently there. If a date is missing from the file, the grep command produces an empty file; this link shows a way to avoid that: how to stop grep creating empty file if no results
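Since the date arithmetic is what breaks on macOS, a portable alternative is to do the same loop in Python, where datetime.timedelta handles the day increment on any platform. A sketch (the inline rows stand in for data.txt, and empty days are simply skipped rather than written out):

```python
import datetime

# Inline rows standing in for the contents of data.txt.
lines = [
    "OXNARD,723927,93110,19590101,0000,4,SAO,270,1,N,3.1,1,",
    "OXNARD,723927,93110,19590102,0000,4,SAO,225,1,N,2.1,1,",
]

day = datetime.date(1959, 1, 1)
end = datetime.date(1959, 1, 2)
perday = {}
while day <= end:
    tag = day.strftime('%Y%m%d')
    # same filter as the grep: match ',YYYYMMDD,' anywhere in the line
    matches = [ln for ln in lines if ',%s,' % tag in ln]
    if matches:                  # skip empty days instead of creating empty files
        perday[tag] = matches
    day += datetime.timedelta(days=1)   # portable day increment

print(sorted(perday))
```

Writing each `perday[tag]` list to `'file_%s.txt' % tag` reproduces the script's per-day output files.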