Question

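I have a rainfall dataset recorded at half-hour intervals. I want to sum the rainfall for each day and keep track of how many data points went into each daily total, to account for gaps in the data. Then I want to create a new file with a column for the date, a column for the rainfall, and a column for how many data points were available to sum for each day. dailysum is the function where I am trying to do this, and get_data is my function for extracting the data.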

def get_data(avrains):
    print('opening{}'.format(avrains))
    with open(avrains, 'r') as rfile:
        header = rfile.readline()
        dates = []
        rainfalls = []
        for line in rfile:
            line = (line.strip())
            row = line.split(',')
            d = datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S')
            r = row[-1]
            dates.append(d)
            rainfalls.append(float(r))
        data = zip(dates, rainfalls)
        data = sorted(data)
        return (data)

def dailysum(rains):
    day_date = []
    rain_sum = []
    for i in rains:
        dayi = i[0]
        rainsi = i[1]
    for i in dayi:
        try:
            if dayi[i]== dayi[i+1]:
                s= rains[i]+rains[i+1]
                rain_sum.append(float(s))
        except:
            pass
            day_date.append(dayi[i])

Answer 1

There are plenty of ways to solve this, but I'll try to stay as close to your existing code as I can:

from datetime import datetime

def get_data(avrains):
    """
    opens the file specified in avrains and returns a dictionary
    keyed by date, containing a 2-tuple of the total rainfall and
    the count of data points, like so:
    {
      date(2018, 11, 1) : (0.25, 6),
      date(2018, 11, 2) : (0.00, 5),
    }
    """
    print('opening {}'.format(avrains))
    rainfalls = dict()

    with open(avrains, 'r') as rfile:
        header = rfile.readline()  # skip the header row
        for line in rfile:
            line = line.strip()
            row = line.split(',')
            # keep only the calendar date so every reading from one day shares a key
            d = datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S').date()
            # convert the reading to a number so it can be summed
            r = float(row[-1])

            try:
                daily_rainfall, daily_count = rainfalls[d]
                daily_rainfall += r
                daily_count += 1
                rainfalls[d] = (daily_rainfall, daily_count)
            except KeyError:
                # if we don't find that date in rainfalls, add it
                rainfalls[d] = (r, 1)

    return rainfalls
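
To see the per-day accumulation on its own, here is a minimal sketch that runs on a few in-memory rows instead of a file. The names `sample_rows` and `daily_totals` are hypothetical, the sample values are made up, and the "timestamp,value" layout is an assumption based on the parsing in the question. It uses `dict.get` with a default in place of the try/except, which is only a stylistic variation.

```python
from datetime import datetime

# Hypothetical sample rows (made-up values) in an assumed "timestamp,value" layout.
sample_rows = [
    "2018-11-01 00:00:00,0.10",
    "2018-11-01 00:30:00,0.15",
    "2018-11-02 00:00:00,0.00",
]

def daily_totals(rows):
    """Group rows by calendar day; returns {date: (total, count)}."""
    totals = {}
    for row in rows:
        fields = row.strip().split(',')
        # drop the time of day so readings from the same day share a key
        day = datetime.strptime(fields[0], '%Y-%m-%d %H:%M:%S').date()
        rain = float(fields[-1])
        # dict.get with a default stands in for the try/except KeyError branch
        total, count = totals.get(day, (0.0, 0))
        totals[day] = (total + rain, count + 1)
    return totals

totals = daily_totals(sample_rows)
for day, (total, count) in totals.items():
    print(day, total, count)
```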

Now, when you call get_data("/path/to/file"), you'll get back a dictionary. You can spit out the values with something like this:

foo = get_data("/path/to/file")
for (measure_date, (rainfall, observations)) in foo.items():
    print(measure_date, rainfall, observations)

(I'll leave the formatting of the dates, as well as any sorting or file writing, as an exercise :) )
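
For readers who want a starting point for that exercise, the following is one possible sketch rather than the answerer's own solution. It sorts the dictionary by date and writes the three columns the question asks for using the standard `csv` module. The function name `write_daily_totals`, the column headers, the output filename, and the example values are all assumptions chosen for illustration.

```python
import csv
import os
import tempfile
from datetime import date

def write_daily_totals(totals, out_path):
    """Write {date: (rainfall, count)} to a CSV file, one row per day, sorted by date."""
    with open(out_path, 'w', newline='') as wfile:
        writer = csv.writer(wfile)
        # hypothetical column names; rename to suit your file
        writer.writerow(['date', 'rainfall', 'data_points'])
        for day in sorted(totals):
            rainfall, count = totals[day]
            writer.writerow([day.isoformat(), rainfall, count])

# Usage with hand-made values (not real data), written to the temp directory.
example = {date(2018, 11, 2): (0.0, 5), date(2018, 11, 1): (0.25, 6)}
out_path = os.path.join(tempfile.gettempdir(), 'daily_rainfall.csv')
write_daily_totals(example, out_path)
```

In practice you would pass the dictionary returned by get_data instead of `example`. A day with fewer than 48 data points then indicates a gap in the half-hourly record.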

Summing by datetime without using pandas
