I have a data frame like this:
0 {\n
1 "Meta Data": {\n
2 "1. Information": "Intraday (5min) ope...
3 "2. Symbol": "SPY",\n
4 "3. Last Refreshed": "2020-02-12 16:00...
5 "4. Interval": "5min",\n
6 "5. Output Size": "Full size",\n
7 "6. Time Zone": "US/Eastern"\n
8 },\n
9 "Time Series (5min)": {\n
10 "2020-02-12 16:00:00": {\n
11 "1. open": "337.6300",\n
12 "2. high": "337.6500",\n
13 "3. low": "337.3800",\n
14 "4. close": "337.4100",\n
15 "5. volume": "2441804"\n
16 },\n
17 "2020-02-12 15:55:00": {\n
18 "1. open": "337.3700",\n
19 "2. high": "337.6500",\n
20 "3. low": "337.3600",\n
21 "4. close": "337.6250",\n
22 "5. volume": "1282631"\n
23 },\n
24 "2020-02-12 15:50:00": {\n
25 "1. open": "337.4050",\n
26 "2. high": "337.4800",\n
27 "3. low": "337.3400",\n
28 "4. close": "337.3600",\n
29 "5. volume": "1078047"\n
30 },\n
31 "2020-02-12 15:45:00": {\n
32 "1. open": "337.3150",\n
33 "2. high": "337.4300",\n
34 "3. low": "337.2900",\n
35 "4. close": "337.4020",\n
36 "5. volume": "434710"\n
37 },\n
......
}\n
}
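I want to convert the data above into a format like this:
I know how to strip the "1.", "2.", "3." prefixes, but I don't know how to locate the timestamp values such as "2020-02-12 16:00:00", because each one sits on its own row and has no fixed name like "1. open" ... "4. close".
Thank you very much for your help!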
Answer 0 (score: 0)
This looks like data from Quandl or Alpha Vantage (I don't remember exactly which, only that I really hate working with that format).
Assuming you have already decoded the JSON response into a dictionary named data:
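import pandas as pd

# The timestamps are the keys of this nested dict, so they need no fixed name
time_series = data['Time Series (5min)']
# One row per timestamp: the inner dicts become rows, the keys become the index
df = pd.DataFrame(list(time_series.values()), index=list(time_series.keys()))
# Remove the "1. ", "2. ", etc. (raw string avoids invalid-escape warnings)
df.columns = df.columns.str.extract(r'\d+\. (.+)', expand=False)
# Convert the index to timestamp
df.index = pd.to_datetime(df.index)
df.index.name = 'timestamp'
# Move the timestamp out of the index into a regular column
df = df.reset_index()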
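For completeness, here is a minimal end-to-end sketch of the same idea, starting from the raw JSON text rather than an already decoded dictionary. The `raw` string is a shortened, made-up sample that only mirrors the structure shown in the question, and `time_series_to_frame` is a hypothetical helper name, not part of pandas. It also converts the string values to numbers and sorts the rows chronologically, which the snippet above does not do.

```python
import json

import pandas as pd

# Shortened sample with the same shape as the response in the question
# (assumption: the real payload has many more timestamps).
raw = """
{
  "Meta Data": {"2. Symbol": "SPY", "4. Interval": "5min"},
  "Time Series (5min)": {
    "2020-02-12 16:00:00": {"1. open": "337.6300", "2. high": "337.6500",
                            "3. low": "337.3800", "4. close": "337.4100",
                            "5. volume": "2441804"},
    "2020-02-12 15:55:00": {"1. open": "337.3700", "2. high": "337.6500",
                            "3. low": "337.3600", "4. close": "337.6250",
                            "5. volume": "1282631"}
  }
}
"""


def time_series_to_frame(payload, key="Time Series (5min)"):
    """Hypothetical helper: turn the nested time-series dict into a flat frame."""
    # Outer keys (timestamps) become the index, inner keys become columns.
    df = pd.DataFrame.from_dict(payload[key], orient="index")
    # Strip the leading "1. ", "2. ", ... from the column names.
    df.columns = df.columns.str.replace(r"^\d+\.\s*", "", regex=True)
    # The API delivers every value as a string; convert them to numbers.
    df = df.apply(pd.to_numeric)
    # Parse the index as timestamps and give it a name.
    df.index = pd.to_datetime(df.index)
    df.index.name = "timestamp"
    # Oldest row first, then move the timestamp into a regular column.
    return df.sort_index().reset_index()


df = time_series_to_frame(json.loads(raw))
print(df)
```

Passing `key` as a parameter is only a convenience in case the interval in the section name differs (for example a 1min series); whether other endpoints use the same layout is an assumption worth checking against the actual response.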