Creating a combined DataFrame from JSON with Pandas

Time: 2018-08-03 02:25:30

Tags: python json python-3.x pandas dataframe

I am importing data from Alpha_Vantage in JSON format:

{
    "Meta Data": {
        "1. Information": "Daily Prices (open, high, low, close) and Volumes",
        "2. Symbol": "MSFT",
        "3. Last Refreshed": "2018-08-02",
        "4. Output Size": "Compact",
        "5. Time Zone": "US/Eastern"
    },
    "Time Series (Daily)": {
        "2018-08-02": {
            "1. open": "105.4000",
            "2. high": "108.0900",
            "3. low": "104.8400",
            "4. close": "107.5700",
            "5. volume": "26080662"
        },...

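I want to pull data for several tickers and combine them into one DataFrame, with the date as the index and one "4. close" column per ticker. This is what I have so far: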

from alpha_vantage.timeseries import TimeSeries
from pprint import pprint

tickers = ['KHC', 'TSLA']
for t in range(len(tickers)):
    ts = TimeSeries(key='my_api_key', output_format='pandas')
    data, meta_data = ts.get_daily(symbol=tickers[t],
                                   outputsize='compact')
    accu = data['4. close'].head()
    data_merged = data.merge(accu.to_frame(), how='left',
                             left_on='date',
                             right_index=True)

    pprint(data_merged.head)

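Currently, even when I print a single ticker, I get a KeyError for the key 'date' passed to left_on. Using a different key just scrambles the data. Any ideas? Also, how can I print the ticker name at the top of each column?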

1 answer:

Answer 0: (score: 0)

You need to collect the "4. close" Series you get for each ticker into a dictionary, then build a DataFrame from that dictionary:

import pandas as pd
from alpha_vantage.timeseries import TimeSeries

tickers = ['KHC', 'TSLA']
ts = TimeSeries(key='my_api_key', output_format='pandas')

closes_4 = {} # Start with an empty dictionary

for t in tickers:
    data, _ = ts.get_daily(symbol=t, outputsize='compact')
    closes_4[t] = data['4. close'] # Add a series to the dict

close_df = pd.DataFrame(closes_4)
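
To see what the dictionary approach produces without calling the API, here is a minimal sketch using made-up closing prices in place of data['4. close']. The prices and dates are invented for illustration only. It shows that the dictionary keys (the ticker symbols) become the column names, and that the Series are aligned on the union of their date indexes, with NaN where a ticker has no value for a date:

```python
import pandas as pd

# Hypothetical sample data standing in for data['4. close'] from the API;
# the prices and dates below are invented for illustration.
dates_khc = pd.to_datetime(["2018-07-31", "2018-08-01", "2018-08-02"])
dates_tsla = pd.to_datetime(["2018-08-01", "2018-08-02"])

closes = {
    "KHC": pd.Series([60.0, 61.0, 62.0], index=dates_khc, name="4. close"),
    "TSLA": pd.Series([300.0, 349.0], index=dates_tsla, name="4. close"),
}

# Dict keys become column names; rows are aligned on the union of the indexes.
close_df = pd.DataFrame(closes)
close_df.index.name = "date"

print(close_df)
```

Because alignment happens on the index, no merge or join key is needed, which sidesteps the left_on='date' problem from the question.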