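I have the following DataFrame, which holds stock values for roughly 200 companies, and I am trying to find a way to loop over it and build a new DataFrame containing different yearly features for each company: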
Date Symbol Open High Low Close Volume Daily Return
2016-01-04 AAPL 102.61 105.37 102.00 105.35 67281190 0.025703
2016-01-05 AAPL 105.75 105.85 102.41 102.71 55790992 0.019960
2016-12-28 AMZN 776.25 780.00 770.50 772.13 3301025 0.009122
2016-12-29 AMZN 772.40 773.40 760.85 765.15 3158299 0.020377
I have tried several approaches; the closest I have come is:
stocks_features = pd.DataFrame(data=stocks_data.Symbol.unique(), columns = ['Symbol'])
stocks_features['Max_Yearly_Price'] = stocks_data['High'].max()
stocks_features['Min_Yearly_Price'] = stocks_data['Low'].min()
stocks_features
But it gives me the same values for every stock:
Symbol Max_Yearly_Price Min_Yearly_Price
AAPL 847.21 89.47
AMZN 847.21 89.47
What am I doing wrong, and how can I achieve this?
Answer 0 (score: 4)
Use groupby with agg (here df is the stocks_data frame from the question):
df.groupby('Symbol').agg({'High':'max','Low':'min'}).\
rename(columns={'High':'Max_Yearly_Price','Low':'Min_Yearly_Price'})
Out[861]:
Max_Yearly_Price Min_Yearly_Price
Symbol
AAPL 105.85 102.00
AMZN 780.00 760.85
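The attempt in the question returns identical numbers because `stocks_data['High'].max()` is a single scalar computed over the whole column, and pandas broadcasts that scalar to every row of the new column. As a minimal sketch of the grouped version, the block below uses named aggregation (available from pandas 0.25) on the four sample rows from the question. The frame name `stocks_data` follows the question, and the `Total_Volume` feature is a hypothetical extra added only to show how further yearly features could be appended.

```python
import pandas as pd

# Sample rows copied from the question (only the columns used below).
stocks_data = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-04", "2016-01-05", "2016-12-28", "2016-12-29"]),
    "Symbol": ["AAPL", "AAPL", "AMZN", "AMZN"],
    "High": [105.37, 105.85, 780.00, 773.40],
    "Low": [102.00, 102.41, 770.50, 760.85],
    "Volume": [67281190, 55790992, 3301025, 3158299],
})

# One row per symbol; each keyword is output_column=(input_column, function).
stocks_features = (
    stocks_data.groupby("Symbol")
    .agg(
        Max_Yearly_Price=("High", "max"),
        Min_Yearly_Price=("Low", "min"),
        Total_Volume=("Volume", "sum"),  # hypothetical extra feature
    )
    .reset_index()  # turn the Symbol index back into an ordinary column
)

print(stocks_features)
```

`reset_index()` is optional; it is used here so that the result has `Symbol` as a regular column, matching the layout the question was aiming for.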
Answer 1 (score: 1)
# Build a dictionary that maps each symbol to its maximum High
# (stocks_features here is the frame holding the daily rows, as the output shows)
value_maps = dict(stocks_features.loc[stocks_features.\
groupby('Symbol').High.agg('idxmax')][['Symbol', 'High']].values)
# Start the new column as a copy of Symbol
stocks_features['Max_Yearly_Price'] = stocks_features['Symbol']
# Replace each symbol with the corresponding value from the dictionary
stocks_features['Max_Yearly_Price'] = stocks_features['Max_Yearly_Price'].map(value_maps)
# Output
Date Symbol Open High Low Close Volume Daily Return Max_Yearly_Price
0 2016-01-04 AAPL 102.61 105.37 102.00 105.35 67281190 0.025703 NaN 105.85
1 2016-01-05 AAPL 105.75 105.85 102.41 102.71 55790992 0.019960 NaN 105.85
2 2016-12-28 AMZN 776.25 780.00 770.50 772.13 3301025 0.009122 NaN 780.00
3 2016-12-29 AMZN 772.40 773.40 760.85 765.15 3158299 0.020377 NaN 780.00
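If the goal is to keep every daily row and attach the per-symbol extreme as a new column, `groupby(...).transform(...)` should produce the same column as the dictionary mapping above in a single step, because it returns a result aligned to the original index. This is only a sketch on the sample rows, assuming the raw frame is named `stocks_data` as in the question:

```python
import pandas as pd

# Sample rows copied from the question (only the columns used below).
stocks_data = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-04", "2016-01-05", "2016-12-28", "2016-12-29"]),
    "Symbol": ["AAPL", "AAPL", "AMZN", "AMZN"],
    "High": [105.37, 105.85, 780.00, 773.40],
    "Low": [102.00, 102.41, 770.50, 760.85],
})

# transform returns one value per original row, so it can be assigned directly.
grouped = stocks_data.groupby("Symbol")
stocks_data["Max_Yearly_Price"] = grouped["High"].transform("max")
stocks_data["Min_Yearly_Price"] = grouped["Low"].transform("min")

print(stocks_data)
```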