I have two different sets of dataframes.
One is a Panel whose items
are the stocks.
Below is the code that builds the Panel (for reproducibility):
import numpy as np
import pandas as pd
import pandas_datareader.data as web
import matplotlib.pyplot as plt
import datetime as dt
import re
startDate = '2010-01-01'
endDate = '2016-09-07'
stocks_query = ['AAPL','OPK']
stocks = web.DataReader(stocks_query, data_source='yahoo',
                        start=startDate, end=endDate)
stocks = stocks.swapaxes('items', 'minor_axis')
This produces the output:
Dimensions: 2 (items) x 1682 (major_axis) x 6 (minor_axis)
Items axis: AAPL to OPK
Major_axis axis: 2010-01-04 00:00:00 to 2016-09-07 00:00:00
Minor_axis axis: Open to Adj Close
A single dataframe in the panel looks like this:
stocks['OPK']
Open High Low Close Volume Adj Close log_return \
Date
2010-01-04 1.80 1.97 1.76 1.95 234500.0 1.95 NaN
2010-01-05 1.64 1.95 1.64 1.93 135800.0 1.93 -0.010309
2010-01-06 1.90 1.92 1.77 1.79 546600.0 1.79 -0.075304
2010-01-07 1.79 1.94 1.76 1.92 138700.0 1.92 0.070110
2010-01-08 1.92 1.94 1.86 1.89 62500.0 1.89 -0.015748
I then added a couple of custom columns with the following code:
for i in stocks:
    stocks[i]['log_return'] = np.log(stocks[i]['Close'] / stocks[i]['Close'].shift(1))
    stocks[i]['30_Avg_Vol'] = stocks[i]['Volume'].rolling(min_periods=15, window=30).mean()
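Because Panel is no longer available in recent pandas releases, the same two columns can be tried out on a plain DataFrame. The following is only a minimal sketch: the helper name `add_features`, the toy prices, and the short window of 3 rows are assumptions chosen so the values are easy to check by hand.

```python
import numpy as np
import pandas as pd


def add_features(df, window=30, min_periods=15):
    """Hypothetical helper: add the log return and rolling average volume columns."""
    out = df.copy()
    # Log return relative to the previous row's close; the first row has no predecessor.
    out['log_return'] = np.log(out['Close'] / out['Close'].shift(1))
    # Rolling mean of volume; NaN until at least `min_periods` rows are available.
    out['30_Avg_Vol'] = out['Volume'].rolling(window=window, min_periods=min_periods).mean()
    return out


# Toy data (assumed for illustration), using the first OPK rows shown above.
dates = pd.date_range('2010-01-04', periods=5, freq='D')
toy = pd.DataFrame(
    {'Close': [1.95, 1.93, 1.79, 1.92, 1.89],
     'Volume': [234500.0, 135800.0, 546600.0, 138700.0, 62500.0]},
    index=dates,
)

# A short window keeps the example small; the question uses window=30, min_periods=15.
featured = add_features(toy, window=3, min_periods=2)
print(featured)
```

The first `log_return` is NaN because `shift(1)` has nothing to shift in, and the rolling average stays NaN until `min_periods` rows have been seen.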
Then, to slice out the rows where volume is high, I created a dictionary of dataframes with this code (each key is a stock and each value is a sliced dataframe):
High_volume = {}
for i in stocks.items:  # stocks is a panel, the items are the stock tickers
    print(i)
    High_volume[i] = stocks[i][stocks[i].Volume > 1.5 * stocks[i]['30_Avg_Vol']]
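The same filter can be written against an ordinary dictionary of DataFrames, which may be a convenient stand-in for the Panel. This is a sketch under assumptions: the tickers `AAA` and `BBB`, the volumes, the column name `Avg_Vol`, and the 3-row window are all made up for illustration.

```python
import pandas as pd

dates = pd.date_range('2010-01-04', periods=6, freq='D')

# Hypothetical per-ticker frames standing in for the Panel items.
frames = {
    'AAA': pd.DataFrame({'Volume': [100.0, 100.0, 400.0, 100.0, 100.0, 100.0]}, index=dates),
    'BBB': pd.DataFrame({'Volume': [50.0, 50.0, 50.0, 50.0, 300.0, 50.0]}, index=dates),
}

for df in frames.values():
    # Short rolling window so the toy data produces usable averages.
    df['Avg_Vol'] = df['Volume'].rolling(window=3, min_periods=2).mean()

# Keep only the rows whose volume exceeds 1.5 times the rolling average.
# Rows where the average is still NaN compare as False and are dropped.
high_volume = {
    ticker: df[df['Volume'] > 1.5 * df['Avg_Vol']]
    for ticker, df in frames.items()
}

for ticker, df in high_volume.items():
    print(ticker, list(df.index))
```

Each value keeps its DatetimeIndex, so the dates of the high-volume rows are available through `.index`, as in the question.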
So I have a dictionary of sliced dataframes, and I can access each dataframe by its stock ticker.
High_volume['OPK']
High_volume['AAPL']
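Now, for every date in every row of each High_volume
dataframe (the index is a datetime object), I want to create a bunch of mini dataframes.
So for all the dates in High_volume['AAPL']
I want to create one mini dataframe per date, and likewise for all the dates in High_volume['OPK'].
In this case, then, I want to create two dictionaries that contain mini dataframes.
High_volume['OPK'] looks something like this; for each date I want to create a mini dataframe: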
Open High Low Close Volume Adj Close \
Date
2010-02-11 1.710000 2.200000 1.710000 1.940000 2212300.0 1.940000
2010-02-12 1.940000 2.100000 1.940000 2.030000 739500.0 2.030000
2010-03-19 2.030000 2.050000 1.950000 2.030000 611800.0 2.030000
2010-04-12 2.060000 2.210000 2.040000 2.160000 647100.0 2.160000
2010-04-13 2.210000 2.450000 2.160000 2.320000 823200.0 2.320000
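Each mini dataframe holds roughly X
days of information. The start date is the date of the sliced row, and the end date is roughly X
days later. To get the data for the other X
dates, I slice the original panel (stocks),
which contains all of the stock data.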
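To make the "roughly X days" point concrete, here is a small sketch of one such window. The dates and the `Close` values are invented; the only behavior relied on is that label slicing with `.loc` on a sorted DatetimeIndex includes both endpoints, and that `dt.timedelta(days=...)` counts calendar days rather than trading days.

```python
import datetime as dt
import pandas as pd

# Ten business days starting Monday 2010-02-08 (weekends are absent, as in price data).
idx = pd.bdate_range('2010-02-08', periods=10)
prices = pd.DataFrame({'Close': list(range(10))}, index=idx)

start_date = pd.Timestamp('2010-02-11')        # a Thursday
end_date = start_date + dt.timedelta(days=4)   # the following Monday

# Both endpoints are included; the weekend in between contributes no rows.
window = prices.loc[start_date:end_date]
print(window)
```

A 4-day calendar window therefore yields only 3 rows here, which is why each mini dataframe covers approximately, not exactly, X days of data.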
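However, since I am working with a lot of stocks, I will have to create many dictionaries in a single pass (two in this case, OPK
and AAPL),
so I need to name the dictionaries dynamically.
The function that does this looks like this: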
def slicing(stock, sliced_data, num_of_days):
    # stock = list of stock tickers I'm interested in exploring
    # sliced_data = the High_volume dict I created
    # num_of_days = this represents the X days (the size of each mini dataframe)
    time_delta = dt.timedelta(days=num_of_days)
    for i in stock:  # stock name
        vars()['mini_dfs' + i] = {}  # dynamically creating a dictionary for that stock
        print(vars()['mini_dfs' + i])  # to make sure the dictionary was created
        for date in sliced_data[i].index:  # taking each date of the High_volume df
            start_date = date
            end_date = date + time_delta
            # filling the empty dictionary with dataframes (dates are keys, values are dataframes)
            vars()['mini_dfs' + i][date] = stocks[i].loc[start_date:end_date]
        return vars()['mini_dfs' + i]  # returning the dictionary before creating the new dictionary
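The function seems to run, since I get a bunch of mini dataframes as output for both stocks. However, the result is not saved as two variables; it is all saved into a single variable. Keep in mind that in this case I am working with two stocks, so I want to create two dictionaries.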
x=slicing(['AAPL','OPK'], High_volume , 1) # This works
However,
x,y =slicing(['AAPL','OPK'], High_volume , 1)
ValueError: too many values to unpack (expected 2)
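For reference, the error can be reproduced without pandas. The sketch below assumes that the function hands back a single dictionary with more than two date keys; the dictionary contents are placeholders, not real dataframes. Unpacking a dictionary iterates over its keys, so `x, y = ...` only succeeds when there are exactly two of them.

```python
import datetime as dt

# Hypothetical stand-in for the single dictionary returned by slicing():
# dates as keys, placeholder strings where the mini dataframes would be.
single_result = {
    dt.date(2010, 2, 11): 'mini dataframe 1',
    dt.date(2010, 2, 12): 'mini dataframe 2',
    dt.date(2010, 3, 19): 'mini dataframe 3',
}

try:
    # Unpacking iterates over the keys: three dates cannot fit into two names.
    x, y = single_result
except ValueError as err:
    message = str(err)
    print(message)
```

So the two names are being matched against dates inside one stock's dictionary, not against one dictionary per stock.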
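How can I get the function to output two dictionaries in this case (or, more generally, one dictionary for each stock I want to analyze)?
Thanks.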
Answer 0 (score: 1)
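The problem is that return
gives you only one value: the most recently created dictionary. You can use yield
to generate a series of such dictionaries: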
def slicing(stock, sliced_data, num_of_days):
    time_delta = dt.timedelta(days=num_of_days)
    for i in stock:  # stock name
        vars()['mini_dfs' + i] = {}
        for date in sliced_data[i].index:
            start_date = date
            end_date = date + time_delta
            vars()['mini_dfs' + i][date] = stocks[i].loc[start_date:end_date]
        yield vars()['mini_dfs' + i]
Then you can build a list of those dictionaries like this:
my_list = [i for i in slicing(['AAPL','OPK'], High_volume, 1)]
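As a possible alternative, the dynamically named variables can be dropped altogether in favor of one dictionary keyed by ticker, which also avoids writing into `vars()` (the Python documentation cautions against modifying the dictionary returned by `locals()`). This is a sketch, not the answer's method: the function name `slice_windows`, the extra `all_data` parameter (a dictionary of DataFrames replacing the global `stocks` panel), and the toy data are assumptions.

```python
import datetime as dt
import pandas as pd


def slice_windows(tickers, sliced_data, all_data, num_of_days):
    """Hypothetical variant: return {ticker: {date: mini DataFrame}}."""
    time_delta = dt.timedelta(days=num_of_days)
    return {
        ticker: {
            # .loc label slicing includes both the start and the end date.
            date: all_data[ticker].loc[date:date + time_delta]
            for date in sliced_data[ticker].index
        }
        for ticker in tickers
    }


# Toy data (assumed): two tickers with six daily rows each.
idx = pd.date_range('2010-01-04', periods=6, freq='D')
all_data = {
    'AAA': pd.DataFrame({'Volume': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=idx),
    'BBB': pd.DataFrame({'Volume': [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]}, index=idx),
}
# Pretend these rows were flagged as high volume.
sliced = {
    'AAA': all_data['AAA'].iloc[[1, 3]],
    'BBB': all_data['BBB'].iloc[[0]],
}

mini_dfs = slice_windows(['AAA', 'BBB'], sliced, all_data, 1)

# One dictionary per ticker, unpacked in a known order.
x, y = (mini_dfs[ticker] for ticker in ['AAA', 'BBB'])
print(len(x), len(y))
```

Looking results up by ticker (`mini_dfs['AAA']`) scales to any number of stocks, whereas `x, y = ...` has to be kept in sync with the length of the ticker list.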