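I have 5-minute time series data in which the date is given but the time is only implied.
How would you create a datetime index for this dataset?
Starting data (pandas DataFrame):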
date values
0 2015-03-07 559.179
0 2015-03-07 521.094
0 2015-03-07 466.406
0 2015-03-07 425.586
0 2015-03-07 385.547
.. ... ...
81 2014-12-16 None
81 2014-12-16 None
81 2014-12-16 160.938
81 2014-12-16 145.118
81 2014-12-16 125.977
Goal:
values
2014-12-16 T12:00 AM None
2014-12-16 T12:05 AM None
2014-12-16 T12:10 AM 160.938
2014-12-16 T12:15 AM 145.118
2014-12-16 T12:20 AM 125.977
... ...
2015-03-07 T12:00 AM 559.179
2015-03-07 T12:05 AM 521.094
2015-03-07 T12:10 AM 466.406
2015-03-07 T12:15 AM 425.586
2015-03-07 T12:20 AM 385.547
For context, this is my current Python code:
import ast
import json

import pandas as pd
import requests

# get the first page of data and the total number of pages.
# note: each page has up to 25 days of data and each day is stored as a JSON object
# (user and pwd are credentials defined elsewhere)
web = 'https://company-name.com/sensors/12944473/daily_mesures'
r = requests.get(web, auth=(user, pwd))
json_list = r.json()
print(json.dumps(json_list[0], indent=4))
last_uri = r.links['last']['url']

# loop through the pages, and loop through the days on each page,
# append to the json list
while r.url != last_uri:
    r = requests.get(r.links['next']['url'], auth=(user, pwd))
    for day in r.json():
        json_list.append(day)

# get the fields of interest, "date" & "values"
# "values" arrives as a string; ast.literal_eval parses it without running arbitrary code the way eval would
data = [{'date': x['date'], 'values': ast.literal_eval(x['values'])} for x in json_list]
# test_df = [{'date': x['date'], 'values': ast.literal_eval(x['values']), 'timestamp': pd.date_range(start = x['date'], periods = len(x['values']), freq = "5 min")} for x in json_list] # this kind of works but makes a nested list of time-date index for each day which then still need to be flattened.
df = pd.DataFrame(data).explode('values')
print(df)
Output:
{
"id": 38421212571,
"sensor_id": 12944473,
"value": "6365.4957",
"date": "2015-03-07",
"min": "0.0",
"max": "1091.21",
"avg": "265.229",
"values": "[559.1795,521.094,466.406,425.586,385.547,344.14099999999996,302.344,265.2345,226.172,203.516,164.063,126.953,92.578,64.844,33.594,29.1015,12.891,7.813,5.469,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2.5389999999999997,18.359,30.078,35.938,50.781,157.422,158.594,93.9455,266.016,220.7035,351.36699999999996,451.953,525.781,562.8910000000001,319.531,235.156,193.75,189.844,356.6405,340.234,564.453,486.5235,407.813,272.07050000000004,378.125,398.047,398.047,853.516,621.875,654.1015,582.422,456.64099999999996,490.82050000000004,823.438,862.1095,489.063,366.797,480.078,486.719,419.336,505.664,438.672,511.719,677.344,953.9065,964.0625,619.922,967.1875,551.563,979.297,1013.67,735.547,1015.43,1004.3,1080.47,1091.21,979.8815,967.188,974.6095,964.844,1000,976.953,682.813,993.9449999999999,985.547,735.547,832.422,976.563,855.078,971.289,614.6485,834.766,963.281,402.5395,743.359,972.656,962.8905,820.508,1004.69,980.0785000000001,645.703,305.469,420.313,618.164,442.5785,832.617,966.211,815.8199999999999,473.047,371.094,430.8595,991.406,961.719,979.6875,995.313,747.656,980.469,985.156,993.164,1001.56,993.359,980.664,986.133,1000.9749999999999,1010.55,980.2735,999.0219999999999,854.2965,352.7345,230.859,269.922,449.21900000000005,599.219,265.2345,207.031,132.422,380.4685,205.8595,194.141,155.469,141.406,115.625]",
"start_time": null,
"start_value": null
}
date values
0 2015-03-07 559.179
0 2015-03-07 521.094
0 2015-03-07 466.406
0 2015-03-07 425.586
0 2015-03-07 385.547
.. ... ...
81 2014-12-16 None
81 2014-12-16 None
81 2014-12-16 160.938
81 2014-12-16 145.118
81 2014-12-16 125.977
[23616 rows x 2 columns]
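Note: the days are not in chronological order. The times within each day are in order, but the days are in reverse order.
P.S. This is a follow-up to my previous question: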
How do I get this JSON time series data into a pandas dataframe?
Thanks!
Python: 3.7.4
pandas: 0.25.3
json: 2.0.9
requests: 2.22.0
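Answer 0 (score: 0)
See if this works for you. After running the function, you can adjust the result to get the format you want (I have demonstrated this with DT2 as an example).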
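from datetime import timedelta

df['date'] = pd.to_datetime(df['date'])

def f(x):
    # Offsets start at 1, so the first reading of each day is stamped 00:05.
    # Use range(len(x.date)) instead if the first reading should fall on 00:00.
    x['DT'] = [val + timedelta(minutes=pos * 5) for val, pos in zip(x.date, range(1, len(x.date) + 1))]
    return x

# apply f to each day's rows, then format the timestamps as 12-hour strings
df = df.groupby('date').apply(f)
df['DT2'] = df['DT'].dt.strftime('%m/%d/%Y %I:%M:%S %p')
df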
Output:
date values DT DT2
0 2015-03-07 559.179 2015-03-07 00:05:00 03/07/2015 12:05:00 AM
1 2015-03-07 521.094 2015-03-07 00:10:00 03/07/2015 12:10:00 AM
2 2015-03-07 466.406 2015-03-07 00:15:00 03/07/2015 12:15:00 AM
3 2015-03-07 425.586 2015-03-07 00:20:00 03/07/2015 12:20:00 AM
4 2015-03-07 385.547 2015-03-07 00:25:00 03/07/2015 12:25:00 AM
5 2014-12-16 None 2014-12-16 00:05:00 12/16/2014 12:05:00 AM
6 2014-12-16 None 2014-12-16 00:10:00 12/16/2014 12:10:00 AM
7 2014-12-16 160.938 2014-12-16 00:15:00 12/16/2014 12:15:00 AM
8 2014-12-16 145.118 2014-12-16 00:20:00 12/16/2014 12:20:00 AM
9 2014-12-16 125.977 2014-12-16 00:25:00 12/16/2014 12:25:00 AM
10 2014-12-16 125.977 2014-12-16 00:30:00 12/16/2014 12:30:00 AM
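As an alternative to the per-group function above, the same idea can be sketched without a Python-level loop. This is only a sketch under stated assumptions: the toy frame stands in for the exploded DataFrame from the question, the helper name `add_datetime_index` and its `freq_minutes` parameter are made up for illustration, and the first reading of each day is assumed to fall on 00:00, as in the question's goal.

```python
import pandas as pd

# Toy stand-in for the exploded frame in the question: repeated index
# labels per day, days in reverse order, readings in order within a day.
df = pd.DataFrame(
    {
        "date": ["2015-03-07"] * 3 + ["2014-12-16"] * 3,
        "values": [559.179, 521.094, 466.406, None, 160.938, 145.118],
    },
    index=[0, 0, 0, 81, 81, 81],
)


def add_datetime_index(frame, freq_minutes=5):
    """Hypothetical helper: index each row by date + position-in-day * freq_minutes."""
    # Drop the duplicated labels left by explode() so the Series below align by position.
    out = frame.reset_index(drop=True)
    day = pd.to_datetime(out["date"])
    # cumcount() numbers the rows 0, 1, 2, ... within each day, in row order.
    step = out.groupby("date").cumcount()
    stamps = day + pd.to_timedelta(step * freq_minutes, unit="min")
    out.index = pd.DatetimeIndex(stamps, name="timestamp")
    # sort_index() puts the days into chronological order.
    return out.drop(columns="date").sort_index()


result = add_datetime_index(df)
print(result)
```

The result keeps a real `DatetimeIndex`, so resampling and time-based slicing still work. If the 12-hour display from the question is wanted, `result.index.strftime('%Y-%m-%d %I:%M %p')` returns the labels as strings, at the cost of losing the datetime type.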