How do I compare values across different index rows in a pandas pivot_table?

Time: 2017-06-21 01:42:15

Tags: python pandas pivot-table

Pivot table:

COURSE          ENGLISH       MATH       ART
STUDENT
StudentA        95.0          83.0       97.0
StudentB        91.0          93.0       47.0
StudentC        85.0          84.0       92.0
StudentD        97.0          84.0       85.0
StudentE        93.0          88.0       85.0
StudentAvg      94.5          83.7       96.9

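I want a list of the students whose grade in a subject is more than 5% below StudentAvg for that subject. So in this case, I would like something like: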

English: StudentC
Math:
Art: StudentB, StudentD, StudentE

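How can I do this in pandas?

2 answers:

Answer 0 (score: 2)

This returns a list of tuples showing which student, in which subject, has a grade more than 5% below the average.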

import numpy as np

avg = df.loc['StudentAvg', :]               # the per-subject average row
i, j = np.where(((df / avg) - 1) < -.05)    # positions more than 5% below it
list(zip(df.index[i], df.columns[j]))

[('StudentB', 'ART'),
 ('StudentC', 'ENGLISH'),
 ('StudentC', 'ART'),
 ('StudentD', 'ART'),
 ('StudentE', 'ART')]
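
To try this without the original data, here is a minimal self-contained sketch. It assumes the pivot table can be rebuilt by hand from the values in the question, and it regroups the tuples into the per-subject layout the question asks for. The name `below_by_subject` is my own, not something from pandas.

```python
import numpy as np
import pandas as pd

# Assumption: the table is typed in by hand from the question; the real one
# came out of pivot_table, but only the values matter here.
df = pd.DataFrame(
    {
        "ENGLISH": [95.0, 91.0, 85.0, 97.0, 93.0, 94.5],
        "MATH": [83.0, 93.0, 84.0, 84.0, 88.0, 83.7],
        "ART": [97.0, 47.0, 92.0, 85.0, 85.0, 96.9],
    },
    index=["StudentA", "StudentB", "StudentC", "StudentD", "StudentE", "StudentAvg"],
)

# Same comparison as above: relative difference to the average row.
avg = df.loc["StudentAvg"]
i, j = np.where((df / avg - 1) < -0.05)
pairs = list(zip(df.index[i], df.columns[j]))

# Regroup the (student, subject) tuples by subject (hypothetical helper dict).
below_by_subject = {course: [] for course in df.columns}
for student, course in pairs:
    below_by_subject[course].append(student)

for course, students in below_by_subject.items():
    print(f"{course}: {', '.join(students)}")
```

Subjects where nobody is below the threshold keep an empty list, so they still show up in the printout.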

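We can speed this up a bit by working on the underlying NumPy array:

p = df.index.get_loc('StudentAvg')           # integer position of the average row
v = df.values                                # plain NumPy array, no index alignment
i, j = np.where(((v / v[p]) - 1) < -.05)
list(zip(df.index[i], df.columns[j]))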

[('StudentB', 'ART'),
 ('StudentC', 'ENGLISH'),
 ('StudentC', 'ART'),
 ('StudentD', 'ART'),
 ('StudentE', 'ART')]

Timing

%%timeit
p = df.index.get_loc('StudentAvg')
v = df.values
i, j = np.where(((v / v[p]) - 1) < -.05)
list(zip(df.index[i], df.columns[j]))
10000 loops, best of 3: 41.7 µs per loop

%%timeit
avg = df.loc['StudentAvg', :]
i, j = np.where(((df / avg) - 1) < -.05)
list(zip(df.index[i], df.columns[j]))
1000 loops, best of 3: 662 µs per loop
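
The `%%timeit` cells above only work in IPython/Jupyter. As a rough sketch for a plain script, the same comparison could be done with the standard `timeit` module. The function names `with_pandas` and `with_numpy` are mine, the table is typed in by hand from the question, and the numbers you get will depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: the pivot table rebuilt by hand from the question.
df = pd.DataFrame(
    {
        "ENGLISH": [95.0, 91.0, 85.0, 97.0, 93.0, 94.5],
        "MATH": [83.0, 93.0, 84.0, 84.0, 88.0, 83.7],
        "ART": [97.0, 47.0, 92.0, 85.0, 85.0, 96.9],
    },
    index=["StudentA", "StudentB", "StudentC", "StudentD", "StudentE", "StudentAvg"],
)


def with_pandas(frame):
    """Divide the DataFrame by the average row (aligned on column labels)."""
    avg = frame.loc["StudentAvg"]
    i, j = np.where((frame / avg - 1) < -0.05)
    return list(zip(frame.index[i], frame.columns[j]))


def with_numpy(frame):
    """Do the same division on the raw array, skipping label alignment."""
    p = frame.index.get_loc("StudentAvg")
    v = frame.to_numpy()
    i, j = np.where((v / v[p] - 1) < -0.05)
    return list(zip(frame.index[i], frame.columns[j]))


n = 200  # number of repetitions per measurement
t_pandas = timeit.timeit(lambda: with_pandas(df), number=n) / n
t_numpy = timeit.timeit(lambda: with_numpy(df), number=n) / n
print(f"pandas: {t_pandas * 1e6:.1f} µs per call")
print(f"numpy:  {t_numpy * 1e6:.1f} µs per call")
```

Both functions should return the same list; only the time per call differs.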

Answer 1 (score: 1)

Edit:

df.apply(lambda x: str(x.name)+ ': ' + ', '.join(df[((x-x.loc['StudentAvg'])/x.loc['StudentAvg']*100<-5.0)].index.tolist())).values.tolist()

Output:

['ENGLISH: StudentC', 'MATH: ', 'ART: StudentB, StudentC, StudentD, StudentE']

Or, to list only the students who are more than 5% below the average in at least one subject, use this:

mask = df.apply(lambda x: (x-x.loc['StudentAvg'])/x.loc['StudentAvg']*100<-5.0).any(axis=1)
df[mask].index.tolist()

Output:

['StudentB', 'StudentC', 'StudentD', 'StudentE']
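
As a possible alternative to `apply`, the same two results can probably be had from one boolean frame. This is only a sketch: it assumes the table is rebuilt by hand from the question, rewrites the test as "grade < 0.95 * average" (equivalent when the averages are positive), and uses my own names `below`, `per_subject` and `any_subject`.

```python
import pandas as pd

# Assumption: the pivot table rebuilt by hand from the question.
df = pd.DataFrame(
    {
        "ENGLISH": [95.0, 91.0, 85.0, 97.0, 93.0, 94.5],
        "MATH": [83.0, 93.0, 84.0, 84.0, 88.0, 83.7],
        "ART": [97.0, 47.0, 92.0, 85.0, 85.0, 96.9],
    },
    index=["StudentA", "StudentB", "StudentC", "StudentD", "StudentE", "StudentAvg"],
)

avg = df.loc["StudentAvg"]
students = df.drop(index="StudentAvg")

# True where a grade is more than 5% below that subject's average.
below = students.lt(avg * 0.95, axis="columns")

# Per-subject lists, in the layout the question asks for.
per_subject = {course: below.index[below[course]].tolist() for course in below.columns}

# Students below the threshold in at least one subject.
any_subject = below.index[below.any(axis=1)].tolist()

print(per_subject)
print(any_subject)
```

Dropping the `StudentAvg` row first keeps it out of the results without relying on the comparison happening to be False for it.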