Resampling by hour

Date: 2019-06-19 17:04:41

Tags: pandas

My data is stored as JSON in the following form:
    [{"consumed_time": "2019-05-22 00:00:00", "count": 273208},
     {"consumed_time": "2019-05-22 00:00:00", "count": 132408},
     {"consumed_time": "2019-06-01 19:00:00", "count": 205916},....]
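As a sketch (not part of the original question), records in this shape can be loaded straight into a DataFrame whose index is a DatetimeIndex. The inline JSON string below copies the three sample records shown above instead of reading the real file:

```python
import json

import pandas as pd

# The three sample records from above (the first two share a timestamp)
raw = '''[{"consumed_time": "2019-05-22 00:00:00", "count": 273208},
          {"consumed_time": "2019-05-22 00:00:00", "count": 132408},
          {"consumed_time": "2019-06-01 19:00:00", "count": 205916}]'''
records = json.loads(raw)

# One column per JSON key; parse the timestamp strings and make them the index
df = pd.DataFrame(records)
df["consumed_time"] = pd.to_datetime(df["consumed_time"])
df = df.set_index("consumed_time")

print(df)
print(df.index.has_duplicates)  # True: the duplicate timestamp is still there
```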

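The array may contain duplicate consumed_time values. I need to keep any one of the duplicates and drop the rest. I tried to sort and resample the data with pandas, but I get a TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'.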
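A minimal reproduction of the error, assuming two hand-picked rows from the output shown later in the question: resample() looks at the index, not at a column, so once reset_index() has turned the timestamps into an ordinary column named index, the frame only has a RangeIndex. The sketch uses the lowercase "h" alias, which recent pandas releases prefer over "H".

```python
import pandas as pd

# Two of the rows from the question, indexed by timestamp
series = pd.DataFrame(
    {"data": [90557, 72896]},
    index=pd.to_datetime(["2018-08-20 00:00:00", "2018-08-20 02:00:00"]),
)

# What the question's chain does: the timestamps move into a column named
# "index" and the frame is left with a plain RangeIndex
flat = series.reset_index().reset_index(drop=True)
print(type(flat.index).__name__)  # RangeIndex

error_message = None
try:
    flat.resample("h").sum()
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Putting the timestamps back into the index makes resample() valid again
hourly = flat.set_index("index").resample("h").sum()
print(hourly)
```

Alternatively, DataFrame.resample accepts an on= argument naming a datetime column, e.g. flat.resample("h", on="index").sum(), which avoids the set_index step.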

Using the following, I get this output:

    series = pd.DataFrame(
        {'data': [int(float(i['count'])) for i in json_values]},
        index=[pd.Timestamp(j['consumed_time']) for j in json_values])

                           data
    2018-08-20 00:00:00   90557
    2018-08-20 00:00:00   90560
    2018-08-20 02:00:00   72896
    2018-08-20 02:00:00   72889
    2018-08-20 03:00:00   90309
    2018-08-20 03:00:00   90317
    2018-08-20 04:00:00   71248
    2018-08-20 04:00:00   71248
    2018-08-20 05:00:00   68549
    2018-08-20 05:00:00   68548
    2018-08-20 06:00:00   84896
    2018-08-20 06:00:00   84899
    2018-08-20 07:00:00   59688
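Since the timestamps are already the index at this point, one way to drop the repeats is to filter with Index.duplicated instead of calling reset_index(). A sketch on four of the rows above; keep="first" keeps the earliest row for each timestamp in the current order, and the result still has a DatetimeIndex:

```python
import pandas as pd

# Four of the rows printed above, with duplicate timestamps in the index
series = pd.DataFrame(
    {"data": [90557, 90560, 72896, 72889]},
    index=pd.to_datetime([
        "2018-08-20 00:00:00", "2018-08-20 00:00:00",
        "2018-08-20 02:00:00", "2018-08-20 02:00:00",
    ]),
)

# Boolean mask: True for every repeat after the first occurrence
deduped = series[~series.index.duplicated(keep="first")]

print(deduped)
print(type(deduped.index))  # still a DatetimeIndex, so resample() works
```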

Code:

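    import json
    import pandas as pd

    # path2, file and final_res are defined elsewhere in the script
    with open(path2 + file) as fp:
        json_dump = json.load(fp)

    # Sort the records chronologically (zero-padded timestamp strings sort correctly)
    json_dump.sort(key=lambda x: x['consumed_time'])

    json_values = list(json_dump)
    series = pd.DataFrame(
        {'data': [int(float(i['count'])) for i in json_values]},
        index=[pd.Timestamp(j['consumed_time']) for j in json_values])

    print(series)
    # reset_index() moves the timestamps into a column named 'index';
    # reset_index(drop=True) then leaves a plain RangeIndex
    series = (series.reset_index()
              .drop_duplicates(subset='index', keep='first')
              .sort_index()
              .reset_index(drop=True))
    print(series)
    print(type(series))
    print(series.index)
    # Raises the TypeError above: resample() needs a DatetimeIndex
    series = series.resample('H').sum()

    target = ["NaN" if a == 0 else a for a in series["data"]]
    print(target)
    final_res.append({"start": str(series["index"][0]), "target": target})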
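A possible corrected version of the loop body, sketched as a hypothetical helper build_entry (the name and the sample records are illustrative, not from the question). It assumes json_values is the parsed list of dicts with consumed_time and count keys, removes duplicates on the DatetimeIndex instead of resetting it, and resamples with first() so hours without a reading become NaN rather than 0:

```python
import pandas as pd


def build_entry(json_values):
    """Hypothetical helper: one {"start", "target"} dict with a value per hour."""
    series = pd.DataFrame(
        {"data": [int(float(v["count"])) for v in json_values]},
        index=pd.to_datetime([v["consumed_time"] for v in json_values]),
    )

    # Drop repeated timestamps while the index is still a DatetimeIndex,
    # keeping the first occurrence, then sort chronologically
    series = series[~series.index.duplicated(keep="first")].sort_index()

    # One row per hour; hours without a reading come out as NaN
    hourly = series.resample("h").first()

    # Same "NaN" string convention as the original loop
    target = ["NaN" if pd.isna(x) else int(x) for x in hourly["data"]]
    return {"start": str(hourly.index[0]), "target": target}


# Illustrative input: unsorted, with one duplicated timestamp and a missing hour
json_values = [
    {"consumed_time": "2018-08-20 02:00:00", "count": 72896},
    {"consumed_time": "2018-08-20 00:00:00", "count": 90557},
    {"consumed_time": "2018-08-20 00:00:00", "count": 90560},
]
entry = build_entry(json_values)
print(entry)
```

In the original script, everything after json.load could then reduce to final_res.append(build_entry(json_dump)). Using first() also means a genuine count of 0 is no longer turned into "NaN", which the a == 0 check in the original would do.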

I need this output:

                           data
    2018-08-20 00:00:00   90557
    2018-08-20 01:00:00     NaN
    2018-08-20 02:00:00   72896
    2018-08-20 03:00:00   90309
    2018-08-20 04:00:00   71248
    2018-08-20 05:00:00   68549
    2018-08-20 06:00:00   84896
    2018-08-20 07:00:00   59688
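The NaN at 01:00 is the part sum() does not produce on its own: with the default min_count=0, an empty hourly bin sums to 0. A sketch, assuming the deduplicated values from the question, showing that sum(min_count=1) leaves such bins as NaN:

```python
import pandas as pd

# Deduplicated hourly readings from the question (01:00 has no reading)
series = pd.DataFrame(
    {"data": [90557, 72896, 90309, 71248, 68549, 84896, 59688]},
    index=pd.to_datetime([
        "2018-08-20 00:00:00", "2018-08-20 02:00:00", "2018-08-20 03:00:00",
        "2018-08-20 04:00:00", "2018-08-20 05:00:00", "2018-08-20 06:00:00",
        "2018-08-20 07:00:00",
    ]),
)

# Default behaviour: the empty 01:00 bin sums to 0
zero_filled = series.resample("h").sum()
print(zero_filled)

# min_count=1: a bin needs at least one value, otherwise it is NaN
hourly = series.resample("h").sum(min_count=1)
print(hourly)
```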

0 answers