Question

I have 5-minute time series data in which the date is given but the time is only implied.
How would you create a datetime index for this dataset?

Starting data (pandas DataFrame):

          date   values
0   2015-03-07  559.179
0   2015-03-07  521.094
0   2015-03-07  466.406
0   2015-03-07  425.586
0   2015-03-07  385.547
..         ...      ...
81  2014-12-16     None
81  2014-12-16     None
81  2014-12-16  160.938
81  2014-12-16  145.118
81  2014-12-16  125.977

Goal:

                        values
2014-12-16 T12:00 AM    None
2014-12-16 T12:05 AM    None
2014-12-16 T12:10 AM    160.938
2014-12-16 T12:15 AM    145.118
2014-12-16 T12:20 AM    125.977
...                     ...
2015-03-07 T12:00 AM    559.179
2015-03-07 T12:05 AM    521.094
2015-03-07 T12:10 AM    466.406
2015-03-07 T12:15 AM    425.586
2015-03-07 T12:20 AM    385.547

For context, here is my current Python code:

import requests
import json
import pandas as pd

# get the first page of data and the total number of pages.
# note: each page has up to 25 days of data and each day is stored as a JSON object

web = 'https://company-name.com/sensors/12944473/daily_mesures' 
r = requests.get(web, auth=(user, pwd))
json_list = r.json()
print(json.dumps(json_list[0], indent = 4))

last_uri = r.links['last']['url']


# loop through the pages, and loop through the days on each page, 
# append to the json list

while r.url != last_uri:
    r = requests.get(r.links['next']['url'], auth=(user, pwd))
    for day in r.json():
        json_list.append(day)

# get the fields of interest, "date" & "values"

data = [{'date': x['date'], 'values': eval(x['values'])} for x in json_list]
# test_df = [{'date': x['date'], 'values': eval(x['values']), 'timestamp': pd.date_range(start = x['date'], periods = len(x['values']), freq = "5 min")} for x in json_list] # this kind of works but makes a nested list of time-date index for each day which then still need to be flattened.
df = pd.DataFrame(data).explode('values')
print(df)

Output:

{
    "id": 38421212571,
    "sensor_id": 12944473,
    "value": "6365.4957",
    "date": "2015-03-07",
    "min": "0.0",
    "max": "1091.21",
    "avg": "265.229",
    "values": "[559.1795,521.094,466.406,425.586,385.547,344.14099999999996,302.344,265.2345,226.172,203.516,164.063,126.953,92.578,64.844,33.594,29.1015,12.891,7.813,5.469,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2.5389999999999997,18.359,30.078,35.938,50.781,157.422,158.594,93.9455,266.016,220.7035,351.36699999999996,451.953,525.781,562.8910000000001,319.531,235.156,193.75,189.844,356.6405,340.234,564.453,486.5235,407.813,272.07050000000004,378.125,398.047,398.047,853.516,621.875,654.1015,582.422,456.64099999999996,490.82050000000004,823.438,862.1095,489.063,366.797,480.078,486.719,419.336,505.664,438.672,511.719,677.344,953.9065,964.0625,619.922,967.1875,551.563,979.297,1013.67,735.547,1015.43,1004.3,1080.47,1091.21,979.8815,967.188,974.6095,964.844,1000,976.953,682.813,993.9449999999999,985.547,735.547,832.422,976.563,855.078,971.289,614.6485,834.766,963.281,402.5395,743.359,972.656,962.8905,820.508,1004.69,980.0785000000001,645.703,305.469,420.313,618.164,442.5785,832.617,966.211,815.8199999999999,473.047,371.094,430.8595,991.406,961.719,979.6875,995.313,747.656,980.469,985.156,993.164,1001.56,993.359,980.664,986.133,1000.9749999999999,1010.55,980.2735,999.0219999999999,854.2965,352.7345,230.859,269.922,449.21900000000005,599.219,265.2345,207.031,132.422,380.4685,205.8595,194.141,155.469,141.406,115.625]",
    "start_time": null,
    "start_value": null
}
          date   values
0   2015-03-07  559.179
0   2015-03-07  521.094
0   2015-03-07  466.406
0   2015-03-07  425.586
0   2015-03-07  385.547
..         ...      ...
81  2014-12-16     None
81  2014-12-16     None
81  2014-12-16  160.938
81  2014-12-16  145.118
81  2014-12-16  125.977

[23616 rows x 2 columns]
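
One possible way to finish the commented-out `test_df` idea from the code above is to build one Series per day, indexed by a 5-minute `pd.date_range`, and concatenate the days instead of exploding a nested list. This is only a sketch: the `json_list` below is a small hypothetical stand-in for the real API response, and it assumes each `values` string is valid JSON with missing readings encoded as `null`.

```python
import json
import pandas as pd

# Hypothetical stand-in for json_list: each day stores its readings as a JSON string.
json_list = [
    {"date": "2015-03-07", "values": "[559.1795,521.094,466.406]"},
    {"date": "2014-12-16", "values": "[null,null,160.938]"},
]

days = []
for day in json_list:
    # Assumption: the string is valid JSON, so json.loads can replace eval.
    readings = json.loads(day["values"])
    # One timestamp per reading, starting at midnight of that day.
    stamps = pd.date_range(start=day["date"], periods=len(readings), freq="5min")
    days.append(pd.Series(readings, index=stamps, name="values", dtype="float64"))

# Stack the days and sort, since the pages list the days newest first.
df = pd.concat(days).sort_index().to_frame()
print(df)
```

Because each day carries its own timestamps, no flattening step is needed, and `sort_index()` takes care of the reversed day order.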

Note: the days are not in chronological order. The times within each day are in order, but the days are in reverse order.

P.S. This is a follow-up to my previous question:
How do I get this JSON time series data into a pandas dataframe?

Thanks!

Python: 3.7.4
pandas: 0.25.3
json: 2.0.9
requests: 2.22.0

Answer 1

See if this works for you. After running the function, you can adjust the result to get the format you want (I have shown DT2 as an example).

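from datetime import timedelta

df['date'] = pd.to_datetime(df['date'])

def f(x):
    # add 5 minutes per position within the day (positions start at 1, so the first row is 00:05)
    x['DT'] = [val + timedelta(minutes=pos * 5) for val, pos in zip(x.date, range(1, len(x.date) + 1))]
    return x

df = df.groupby('date').apply(f)
# DT2 shows one way to format the timestamps as strings
df['DT2'] = df['DT'].dt.strftime('%m/%d/%Y %I:%M:%S %p')
df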

Output

          date  values                      DT                        DT2
0   2015-03-07  559.179     2015-03-07 00:05:00     03/07/2015 12:05:00 AM
1   2015-03-07  521.094     2015-03-07 00:10:00     03/07/2015 12:10:00 AM
2   2015-03-07  466.406     2015-03-07 00:15:00     03/07/2015 12:15:00 AM
3   2015-03-07  425.586     2015-03-07 00:20:00     03/07/2015 12:20:00 AM
4   2015-03-07  385.547     2015-03-07 00:25:00     03/07/2015 12:25:00 AM
5   2014-12-16  None        2014-12-16 00:05:00     12/16/2014 12:05:00 AM
6   2014-12-16  None        2014-12-16 00:10:00     12/16/2014 12:10:00 AM
7   2014-12-16  160.938     2014-12-16 00:15:00     12/16/2014 12:15:00 AM
8   2014-12-16  145.118     2014-12-16 00:20:00     12/16/2014 12:20:00 AM
9   2014-12-16  125.977     2014-12-16 00:25:00     12/16/2014 12:25:00 AM
10  2014-12-16  125.977     2014-12-16 00:30:00     12/16/2014 12:30:00 AM
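
Note that the output above starts each day at 00:05, while the goal in the question starts at 12:00 AM. A vectorized variant that starts at midnight and also produces the index in chronological order might look like the sketch below. The small `df` is a hypothetical stand-in for the exploded frame from the question; `groupby(...).cumcount()` is used here in place of the `apply` function above.

```python
import pandas as pd

# Hypothetical stand-in for the exploded frame from the question:
# two days, newest first, three readings per day.
df = pd.DataFrame({
    "date": ["2015-03-07"] * 3 + ["2014-12-16"] * 3,
    "values": [559.179, 521.094, 466.406, None, None, 160.938],
})

# Position of each reading within its day: 0, 1, 2, ...
pos = df.groupby("date").cumcount()

# Midnight of each day plus 5 minutes per position, so the first reading lands on 00:00.
df.index = pd.to_datetime(df["date"]) + pd.to_timedelta(pos * 5, unit="min")

# Drop the now-redundant column and put the days in chronological order.
result = df.drop(columns="date").sort_index()
print(result)
```

This relies on the readings within each day already being in time order, as stated in the question. If a string index is wanted, `result.index.strftime(...)` can be applied afterwards, in the same way as `DT2` above.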
