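I have a rainfall dataset recorded at half-hour intervals. I want to sum the rainfall for each day and keep track of how many data points went into each daily total, to account for gaps in the data. Then I want to create a new file with a column for the date, a column for the rainfall, and a column for how many data points were summed for each day. dailysum is the function where I am trying to do this, and get_data is the function I use to extract the data.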
from datetime import datetime

def get_data(avrains):
    print('opening{}'.format(avrains))
    with open(avrains, 'r') as rfile:
        header = rfile.readline()
        dates = []
        rainfalls = []
        for line in rfile:
            line = (line.strip())
            row = line.split(',')
            d = datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S')
            r = row[-1]
            dates.append(d)
            rainfalls.append(float(r))
        data = zip(dates, rainfalls)
        data = sorted(data)
    return (data)
def dailysum(rains):
    day_date = []
    rain_sum = []
    for i in rains:
        dayi = i[0]
        rainsi = i[1]
    for i in dayi:
        try:
            if dayi[i]== dayi[i+1]:
                s= rains[i]+rains[i+1]
                rain_sum.append(float(s))
        except:
            pass
        day_date.append(dayi[i])
Answer 0 (score: 0)
There are many ways to solve this, but I will try to stay as close to your existing code as I can:
from datetime import datetime

def get_data(avrains):
    """
    opens the file specified in avrains and returns a dictionary
    keyed by date, containing a 2-tuple of the total rainfall and
    the count of data points, like so:
    {
        date(2018, 11, 1) : (0.25, 6),
        date(2018, 11, 2) : (0.00, 5),
    }
    """
    print('opening{}'.format(avrains))
    rainfalls = dict()
    with open(avrains, 'r') as rfile:
        header = rfile.readline()  # skip the header row
        for line in rfile:
            line = (line.strip())
            row = line.split(',')
            # keep only the calendar date so every reading from one day shares a key
            d = datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S').date()
            r = float(row[-1])
            try:
                daily_rainfall, daily_count = rainfalls[d]
                daily_rainfall += r
                daily_count += 1
                rainfalls[d] = (daily_rainfall, daily_count)
            except KeyError:
                # if we don't find that date in rainfalls, add it
                rainfalls[d] = (r, 1)
    return rainfalls
Now, when you call get_data("/path/to/file"), you get back a dictionary. You can print out its values like this:
foo = get_data("/path/to/file")
for (measure_date, (rainfall, observations)) in foo.items():
    print(measure_date, rainfall, observations)
(I'll leave the date formatting, and any sorting or file writing, as an exercise :))
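For the file-writing part of the question, here is one possible sketch rather than a definitive solution. It assumes the dictionary has the shape described in the docstring above (a date key mapped to a (total rainfall, count) tuple). The function name write_daily_totals, the output file name daily_rain.csv, and the column headers are all made up for illustration.

```python
import csv
from datetime import date

def write_daily_totals(daily_totals, out_path):
    """Write one row per day, oldest first: date, total rainfall, data point count."""
    # newline='' is what the csv module expects when writing files
    with open(out_path, 'w', newline='') as wfile:
        writer = csv.writer(wfile)
        writer.writerow(['date', 'rainfall', 'count'])  # hypothetical column names
        for day in sorted(daily_totals):
            rainfall, count = daily_totals[day]
            writer.writerow([day.isoformat(), rainfall, count])

# Hypothetical sample data in the same shape as the dictionary described above
sample = {
    date(2018, 11, 2): (0.0, 5),
    date(2018, 11, 1): (0.25, 6),
}
write_daily_totals(sample, 'daily_rain.csv')
```

Sorting the keys puts the rows in chronological order, and isoformat() writes each date as YYYY-MM-DD. A full day of half-hourly readings has 48 data points, so any day with a lower count in the last column has gaps.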