根据多级数据框中另一列的标准从一列中获取数据

时间:2017-05-09 11:10:05

标签: python python-3.x pandas numpy

使用多级数据框显示产品alfa-delta的价格和因素,我尝试使用具有最高因子的两种产品的平均价格创建新数据集。例如,如果阿尔法和布拉沃的阿尔法和布拉沃的平均价格是最高的因素。

import pandas as pd
import numpy as np
​
index = [np.array(['price', 'price', 'price', 'price', 'factor', 'factor', 'factor', 'factor']),
         np.array(['alfa', 'bravo', 'charlie', 'delta', 'alfa', 'bravo', 'charlie', 'delta'])]
df = pd.DataFrame(np.random.randn(3, 8), index=['2014', '2015', '2016'], columns=index)
​
df

Out[1]:
                                            price                              factor
             alfa   bravo        charlie    delta       alfa        bravo        charlie    delta
2014    -1.078024   -2.370577   1.809694    0.937910    0.643634    -1.167022   -0.013712   0.026595
2015    -0.374975   1.459360    0.875787    -1.407601   -1.220319   0.604929    0.414953    0.053431
2016    -0.265826   1.261522    0.839443    -0.144880   0.157955    -1.050584   -0.909444   0.687804

1 个答案:

答案 0 :(得分:0)

您可以使用:

np.random.seed(123)
np.random.seed(123)
index = [['price'] * 4 + ['factor'] * 4, ['alfa','bravo','charlie','delta'] * 2]
df = pd.DataFrame(np.random.rand(3,8), index=['2014', '2015', '2016'], columns=index)
#print (df)
dff = df.xs('factor', axis=1, level=0)
print (dff)
          alfa     bravo   charlie     delta
2014  0.719469  0.423106  0.980764  0.684830
2015  0.438572  0.059678  0.398044  0.737995
2016  0.634401  0.849432  0.724455  0.611024

a = (np.argsort(-dff.values, axis=1)[:, :2])
print (a)
[[2 0]
 [3 0]
 [1 2]]

#check columns with highest values
print (dff.columns[a])
Index([['charlie', 'alfa'], ['delta', 'alfa'], ['bravo', 'charlie']], dtype='object')
dfp = df.xs('price', axis=1, level=0)
print (dfp)
          alfa     bravo   charlie     delta
2014  0.696469  0.286139  0.226851  0.551315
2015  0.480932  0.392118  0.343178  0.729050
2016  0.182492  0.175452  0.531551  0.531828

b = dfp.values[np.arange(len(df.index))[:,None], a][:,:2]
print (b)
[[ 0.22685145  0.69646919]
 [ 0.72904971  0.4809319 ]
 [ 0.17545176  0.53155137]]

c = pd.Series(np.mean(b, axis=1), index=df.index)
print (c)
2014    0.461660
2015    0.604991
2016    0.353502
dtype: float64