My data is stored in JSON in the following form:
[{"consumed_time": "2019-05-22 00:00:00", "count": 273208},
{"consumed_time": "2019-05-22 00:00:00", "count": 132408}, {"consumed_time": "2019-06-01 19:00:00", "count": 205916},....]
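The array may contain duplicate consumed_time values. I need to keep any one of the duplicates and discard the rest. I tried using pandas to sort and resample the data, but I get a TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'.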
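If the duplicates can be dropped before pandas is involved at all, one pass over the parsed JSON is enough. A minimal sketch, assuming every record has a `consumed_time` string in the `YYYY-MM-DD HH:MM:SS` format shown above; `dedupe_first` and the inline sample are hypothetical:

```python
import json

# Hypothetical sample mirroring the JSON above (two records share a timestamp).
raw = '''[{"consumed_time": "2019-05-22 00:00:00", "count": 273208},
          {"consumed_time": "2019-05-22 00:00:00", "count": 132408},
          {"consumed_time": "2019-06-01 19:00:00", "count": 205916}]'''


def dedupe_first(records, key="consumed_time"):
    """Keep the first record seen for each timestamp and drop the rest."""
    seen = {}
    for rec in records:
        # setdefault stores the record only if this timestamp is not present yet
        seen.setdefault(rec[key], rec)
    # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text
    return sorted(seen.values(), key=lambda r: r[key])


records = dedupe_first(json.loads(raw))
print(records)
```

Because the key is the timestamp string itself, "any one duplicate" here means the first one in file order.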
Using this code:
series = pd.DataFrame(
    {'data': [int(float(i['count'])) for i in json_values]},
    index=[pd.Timestamp(j['consumed_time']) for j in json_values])
I get the following:
data
2018-08-20 00:00:00 90557
2018-08-20 00:00:00 90560
2018-08-20 02:00:00 72896
2018-08-20 02:00:00 72889
2018-08-20 03:00:00 90309
2018-08-20 03:00:00 90317
2018-08-20 04:00:00 71248
2018-08-20 04:00:00 71248
2018-08-20 05:00:00 68549
2018-08-20 05:00:00 68548
2018-08-20 06:00:00 84896
2018-08-20 06:00:00 84899
2018-08-20 07:00:00 59688
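The frame printed above already has a DatetimeIndex; only its labels repeat. One way to keep one row per timestamp without touching the index is `Index.duplicated`. A sketch using a few of the rows above (the variable names are illustrative):

```python
import pandas as pd

# A few of the rows printed above: repeated timestamps, and 01:00 is missing.
json_values = [
    {"consumed_time": "2018-08-20 00:00:00", "count": 90557},
    {"consumed_time": "2018-08-20 00:00:00", "count": 90560},
    {"consumed_time": "2018-08-20 02:00:00", "count": 72896},
    {"consumed_time": "2018-08-20 02:00:00", "count": 72889},
]
series = pd.DataFrame(
    {"data": [int(float(i["count"])) for i in json_values]},
    index=pd.DatetimeIndex([j["consumed_time"] for j in json_values]),
)

# duplicated(keep="first") flags every repeat of a label after its first
# occurrence; negating it keeps one row per timestamp and the DatetimeIndex.
deduped = series[~series.index.duplicated(keep="first")]
print(deduped)
print(type(deduped.index).__name__)
```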
Code:
import json
import pandas as pd

# path2, file and final_res are assumed to be defined elsewhere in the script
with open(path2 + file) as fp:
    json_dump = json.load(fp)
json_dump.sort(key=lambda x: x['consumed_time'])
json_values = [d for d in json_dump]
series = pd.DataFrame(
    {'data': [int(float(i['count'])) for i in json_values]},
    index=[pd.Timestamp(j['consumed_time']) for j in json_values])
print(series)
# reset_index() moves the timestamps into a column named 'index';
# the final reset_index(drop=True) then leaves a plain RangeIndex
series = series.reset_index().drop_duplicates(subset='index',
    keep='first').sort_index().reset_index(drop=True)
print(series)
print(type(series))
print(series.index)
# raises the TypeError: resample needs a DatetimeIndex, not a RangeIndex
series = series.resample('H').sum()
target = ["NaN" if a == 0 else a for a in series["data"]]
print(target)
final_res.append({"start": str(series["index"][0]), "target": target})
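The TypeError comes from the last `reset_index(drop=True)`: the first `reset_index()` moved the timestamps into a column called `index`, and dropping the index leaves a `RangeIndex`, which `resample` cannot use. A sketch that stays close to the code above, assuming the timestamps fall exactly on the hour: `set_index("index")` restores the DatetimeIndex, `resample("h").first()` leaves `NaN` in empty hours (where `sum()` writes 0), and `start` is read from `series.index[0]` because the `index` column no longer exists. The three-record `json_values` list is made-up sample data, and `"h"` is the hourly alias preferred from pandas 2.2 on, where `"H"` is deprecated.

```python
import pandas as pd

json_values = [  # hypothetical sample in the question's format
    {"consumed_time": "2018-08-20 00:00:00", "count": 90557},
    {"consumed_time": "2018-08-20 00:00:00", "count": 90560},
    {"consumed_time": "2018-08-20 02:00:00", "count": 72896},
]
series = pd.DataFrame(
    {"data": [int(float(i["count"])) for i in json_values]},
    index=[pd.Timestamp(j["consumed_time"]) for j in json_values],
)
# reset_index() moves the timestamps into a column named "index";
# set_index("index") moves them back instead of swapping in a RangeIndex.
series = (
    series.reset_index()
    .drop_duplicates(subset="index", keep="first")
    .set_index("index")
    .sort_index()
)
# first() leaves NaN in hours that have no row; sum() would write 0 there.
series = series.resample("h").first()
target = ["NaN" if pd.isna(a) else int(a) for a in series["data"]]
final_res = [{"start": str(series.index[0]), "target": target}]
print(final_res)
```

Checking `pd.isna` instead of `a == 0` also avoids turning a genuine count of 0 into `"NaN"`.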
I need this output:
data
2018-08-20 00:00:00 90557
2018-08-20 01:00:00 NaN
2018-08-20 02:00:00 72896
2018-08-20 03:00:00 90309
2018-08-20 04:00:00 71248
2018-08-20 05:00:00 68549
2018-08-20 06:00:00 84896
2018-08-20 07:00:00 59688
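To get exactly the table above, including a row for 01:00 even though no record exists for it, the duplicates can be collapsed with `groupby(level=0).first()` (first non-null value per timestamp, index sorted) and the gaps filled with `asfreq("h")`, which reindexes onto a regular hourly grid from the first to the last timestamp. A sketch using the rows printed earlier in the question:

```python
import pandas as pd

# The rows printed earlier (2018-08-20 00:00 through 07:00), duplicates included.
rows = [
    ("2018-08-20 00:00:00", 90557), ("2018-08-20 00:00:00", 90560),
    ("2018-08-20 02:00:00", 72896), ("2018-08-20 02:00:00", 72889),
    ("2018-08-20 03:00:00", 90309), ("2018-08-20 03:00:00", 90317),
    ("2018-08-20 04:00:00", 71248), ("2018-08-20 04:00:00", 71248),
    ("2018-08-20 05:00:00", 68549), ("2018-08-20 05:00:00", 68548),
    ("2018-08-20 06:00:00", 84896), ("2018-08-20 06:00:00", 84899),
    ("2018-08-20 07:00:00", 59688),
]
series = pd.DataFrame(
    {"data": [count for _, count in rows]},
    index=pd.DatetimeIndex([ts for ts, _ in rows]),
)
# One row per timestamp (the first one), returned with a sorted index.
series = series.groupby(level=0).first()
# Regular hourly grid; hours with no row (01:00 here) become NaN.
hourly = series.asfreq("h")
print(hourly)
```

Because `NaN` is a float, the `data` column prints as `90557.0` and so on; `hourly.astype("Int64")` keeps integers and shows `<NA>` for the gap instead.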