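I have a DataFrame with multi-level columns that holds prices and factors for the products alfa through delta. For each year, I am trying to build a new dataset containing the average price of the two products with the highest factor. For example, if alfa and bravo have the highest factors in a given year, the result for that year should be the average of the alfa and bravo prices.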
import pandas as pd
import numpy as np
index = [np.array(['price', 'price', 'price', 'price', 'factor', 'factor', 'factor', 'factor']),
np.array(['alfa', 'bravo', 'charlie', 'delta', 'alfa', 'bravo', 'charlie', 'delta'])]
df = pd.DataFrame(np.random.randn(3, 8), index=['2014', '2015', '2016'], columns=index)
df
Out[1]:
price factor
alfa bravo charlie delta alfa bravo charlie delta
2014 -1.078024 -2.370577 1.809694 0.937910 0.643634 -1.167022 -0.013712 0.026595
2015 -0.374975 1.459360 0.875787 -1.407601 -1.220319 0.604929 0.414953 0.053431
2016 -0.265826 1.261522 0.839443 -0.144880 0.157955 -1.050584 -0.909444 0.687804
Answer 0 (score: 0)
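You can use:
xs to select part of the DataFrame by column level
numpy.argsort to get the column positions of the highest values in each row
NumPy integer indexing to select the values at those positions
numpy.mean together with the Series constructor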
np.random.seed(123)
index = [['price'] * 4 + ['factor'] * 4, ['alfa','bravo','charlie','delta'] * 2]
df = pd.DataFrame(np.random.rand(3,8), index=['2014', '2015', '2016'], columns=index)
#print (df)
dff = df.xs('factor', axis=1, level=0)
print (dff)
alfa bravo charlie delta
2014 0.719469 0.423106 0.980764 0.684830
2015 0.438572 0.059678 0.398044 0.737995
2016 0.634401 0.849432 0.724455 0.611024
a = (np.argsort(-dff.values, axis=1)[:, :2])
print (a)
[[2 0]
[3 0]
[1 2]]
#check columns with highest values
#index the underlying array, since newer pandas versions reject 2-D indexing on an Index
print (dff.columns.values[a])
[['charlie' 'alfa']
 ['delta' 'alfa']
 ['bravo' 'charlie']]
dfp = df.xs('price', axis=1, level=0)
print (dfp)
alfa bravo charlie delta
2014 0.696469 0.286139 0.226851 0.551315
2015 0.480932 0.392118 0.343178 0.729050
2016 0.182492 0.175452 0.531551 0.531828
b = dfp.values[np.arange(len(df.index))[:,None], a]
print (b)
[[ 0.22685145 0.69646919]
[ 0.72904971 0.4809319 ]
[ 0.17545176 0.53155137]]
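The indexing step that produces b pairs each row number with that row's own column positions. Here is a minimal sketch on a small made-up array. The values and the names `prices`, `top`, `rows`, `picked` and `same` are illustrative only and do not come from the question. It also shows that `np.take_along_axis` (available from NumPy 1.15) should give the same result:

```python
import numpy as np

# Made-up data: 3 rows (years) x 4 columns (products)
prices = np.array([[10, 20, 30, 40],
                   [50, 60, 70, 80],
                   [90, 100, 110, 120]])

# Column positions to pick in each row (same shape as `a` in the answer)
top = np.array([[2, 0],
                [3, 0],
                [1, 2]])

# A column vector of row numbers broadcasts against `top`,
# so element (i, j) of the result is prices[i, top[i, j]]
rows = np.arange(prices.shape[0])[:, None]
picked = prices[rows, top]

# Equivalent selection without building the row index by hand
same = np.take_along_axis(prices, top, axis=1)

print(picked)
```

Without the `[:, None]` the row index would have shape (3,), which cannot be broadcast against the (3, 2) array of column positions.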
c = pd.Series(np.mean(b, axis=1), index=df.index)
print (c)
2014 0.461660
2015 0.604991
2016 0.353502
dtype: float64
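The steps above can also be collected into one helper. This is only a sketch under some assumptions: the function name `mean_price_of_top_factors` and its parameter `n` are hypothetical, the first column level is assumed to contain exactly the labels 'price' and 'factor', and `np.take_along_axis` replaces the manual row index used in the answer.

```python
import numpy as np
import pandas as pd

def mean_price_of_top_factors(df, n=2):
    """Per row, average the prices of the n products with the highest factor."""
    factor = df.xs('factor', axis=1, level=0)
    price = df.xs('price', axis=1, level=0)
    # Make sure both blocks list the products in the same order
    price = price[factor.columns]
    # Negate so that argsort orders from the highest factor down
    top = np.argsort(-factor.to_numpy(), axis=1)[:, :n]
    # Pick, for every row, the prices at that row's top positions
    picked = np.take_along_axis(price.to_numpy(), top, axis=1)
    return pd.Series(picked.mean(axis=1), index=df.index)

np.random.seed(123)
index = [['price'] * 4 + ['factor'] * 4, ['alfa', 'bravo', 'charlie', 'delta'] * 2]
df = pd.DataFrame(np.random.rand(3, 8), index=['2014', '2015', '2016'], columns=index)

result = mean_price_of_top_factors(df)
print(result)
```

With the same seed, this should reproduce the Series c shown above. Passing a different `n` averages over more or fewer products.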