Pivot table:
COURSE      ENGLISH  MATH   ART
STUDENT
StudentA       95.0  83.0  97.0
StudentB       91.0  93.0  47.0
StudentC       85.0  84.0  92.0
StudentD       97.0  84.0  85.0
StudentE       93.0  88.0  85.0
StudentAvg     94.5  83.7  96.9
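For each subject, I want a list of the students whose score is more than 5% lower than that subject's StudentAvg. So in this case, I'd want something like: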
English: StudentC
Math:
Art: StudentB, StudentD, StudentE
How can I do this in pandas?
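To make the answers reproducible, here is a minimal sketch that rebuilds the table above as a DataFrame `df`. It assumes the scores are plain floats and treats `STUDENT` and `COURSE` as the index name and column-axis name shown in the printout.

```python
import pandas as pd

# Rebuild the pivot table from the question: one row per student plus the
# StudentAvg row, one column per course.
df = pd.DataFrame(
    {
        "ENGLISH": [95.0, 91.0, 85.0, 97.0, 93.0, 94.5],
        "MATH": [83.0, 93.0, 84.0, 84.0, 88.0, 83.7],
        "ART": [97.0, 47.0, 92.0, 85.0, 85.0, 96.9],
    },
    index=pd.Index(
        ["StudentA", "StudentB", "StudentC", "StudentD", "StudentE", "StudentAvg"],
        name="STUDENT",
    ),
)
# Name the column axis so the printout carries the COURSE header.
df.columns.name = "COURSE"
print(df)
```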
Answer 0 (score: 2)
This returns a list of (student, subject) tuples for every score that is more than 5% below the subject's average.
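import numpy as np
# StudentAvg row: one average per course, aligned with df's columns
avg = df.loc['StudentAvg', :]
# (row, column) positions where a score is more than 5% below its course average
i, j = np.where(((df / avg) - 1) < -.05)
list(zip(df.index[i], df.columns[j]))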
[('StudentB', 'ART'),
('StudentC', 'ENGLISH'),
('StudentC', 'ART'),
('StudentD', 'ART'),
('StudentE', 'ART')]
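The question asks for the result grouped by course. One way to get there, sketched under the assumption that `df` matches the question's table (rebuilt inline so the snippet runs on its own), is to fold the tuple list into a dict keyed by course. The names `pairs` and `by_course` are illustrative, not part of the answer.

```python
import numpy as np
import pandas as pd

# Compact copy of the question's table (assumed layout).
df = pd.DataFrame(
    [[95.0, 83.0, 97.0], [91.0, 93.0, 47.0], [85.0, 84.0, 92.0],
     [97.0, 84.0, 85.0], [93.0, 88.0, 85.0], [94.5, 83.7, 96.9]],
    index=["StudentA", "StudentB", "StudentC", "StudentD", "StudentE", "StudentAvg"],
    columns=["ENGLISH", "MATH", "ART"],
)

# The answer's approach: (student, course) pairs more than 5% below average.
avg = df.loc["StudentAvg", :]
i, j = np.where(((df / avg) - 1) < -.05)
pairs = list(zip(df.index[i], df.columns[j]))

# Start every course with an empty list so courses with no low scorers
# (MATH here) still appear in the result.
by_course = {course: [] for course in df.columns}
for student, course in pairs:
    by_course[course].append(student)

# Print in the "Course: Student, Student" format from the question.
for course, students in by_course.items():
    print(f"{course.title()}: {', '.join(students)}")
```

Like the tuple list, this puts StudentC under Art as well as under English.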
We can speed this up a bit by working directly on the underlying NumPy array:
p = df.index.get_loc('StudentAvg')
v = df.values
i, j = np.where(((v / v[p]) - 1) < -.05)
list(zip(df.index[i], df.columns[j]))
[('StudentB', 'ART'),
('StudentC', 'ENGLISH'),
('StudentC', 'ART'),
('StudentD', 'ART'),
('StudentE', 'ART')]
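Because the averages here are positive, the test `(v / v[p]) - 1 < -.05` can also be written as `v < 0.95 * v[p]`, which drops the division. A sketch that checks both forms flag the same cells on the question's data (rebuilt inline; `threshold` is an illustrative name, and no speed claim is implied):

```python
import numpy as np
import pandas as pd

# Compact copy of the question's table (assumed layout).
df = pd.DataFrame(
    [[95.0, 83.0, 97.0], [91.0, 93.0, 47.0], [85.0, 84.0, 92.0],
     [97.0, 84.0, 85.0], [93.0, 88.0, 85.0], [94.5, 83.7, 96.9]],
    index=["StudentA", "StudentB", "StudentC", "StudentD", "StudentE", "StudentAvg"],
    columns=["ENGLISH", "MATH", "ART"],
)

p = df.index.get_loc("StudentAvg")
v = df.values

# Form used in the answer: relative gap below -5%.
i1, j1 = np.where(((v / v[p]) - 1) < -.05)

# Threshold form: score below 95% of the course average.
# Only equivalent when every average is positive.
threshold = 0.95 * v[p]
i2, j2 = np.where(v < threshold)

pairs = list(zip(df.index[i2], df.columns[j2]))
print(pairs)
```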
Timing
%%timeit
p = df.index.get_loc('StudentAvg')
v = df.values
i, j = np.where(((v / v[p]) - 1) < -.05)
list(zip(df.index[i], df.columns[j]))
10000 loops, best of 3: 41.7 µs per loop
%%timeit
avg = df.loc['StudentAvg', :]
i, j = np.where(((df / avg) - 1) < -.05)
list(zip(df.index[i], df.columns[j]))
1000 loops, best of 3: 662 µs per loop
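Outside IPython, where `%%timeit` is not available, a comparable measurement can be sketched with the standard-library `timeit` module. The function names `with_pandas` and `with_numpy` are illustrative; absolute numbers depend on the machine and the pandas/NumPy versions, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Compact copy of the question's table (assumed layout).
df = pd.DataFrame(
    [[95.0, 83.0, 97.0], [91.0, 93.0, 47.0], [85.0, 84.0, 92.0],
     [97.0, 84.0, 85.0], [93.0, 88.0, 85.0], [94.5, 83.7, 96.9]],
    index=["StudentA", "StudentB", "StudentC", "StudentD", "StudentE", "StudentAvg"],
    columns=["ENGLISH", "MATH", "ART"],
)


def with_pandas():
    # Label-aligned version: DataFrame divided by the average Series.
    avg = df.loc["StudentAvg", :]
    i, j = np.where(((df / avg) - 1) < -.05)
    return list(zip(df.index[i], df.columns[j]))


def with_numpy():
    # Raw-array version: the average row broadcasts over all rows.
    p = df.index.get_loc("StudentAvg")
    v = df.values
    i, j = np.where(((v / v[p]) - 1) < -.05)
    return list(zip(df.index[i], df.columns[j]))


# Both versions must agree before their timings are worth comparing.
assert with_pandas() == with_numpy()

for fn in (with_pandas, with_numpy):
    # Best of 3 repeats of 200 calls each, reported per call.
    best = min(timeit.repeat(fn, number=200, repeat=3)) / 200
    print(f"{fn.__name__}: {best * 1e6:.1f} µs per call")
```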
Answer 1 (score: 1)
df.apply(lambda x: str(x.name)+ ': ' + ', '.join(df[((x-x.loc['StudentAvg'])/x.loc['StudentAvg']*100<-5.0)].index.tolist())).values.tolist()
Output:
['ENGLISH: StudentC', 'MATH: ', 'ART: StudentB, StudentC, StudentD, StudentE']
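The `apply` one-liner packs several steps into a single expression. Here is the same logic unrolled into an explicit loop over columns, as a readability sketch. The names `pct_gap`, `low` and `lines` are illustrative, and `df` is rebuilt inline from the question's table.

```python
import pandas as pd

# Compact copy of the question's table (assumed layout).
df = pd.DataFrame(
    [[95.0, 83.0, 97.0], [91.0, 93.0, 47.0], [85.0, 84.0, 92.0],
     [97.0, 84.0, 85.0], [93.0, 88.0, 85.0], [94.5, 83.7, 96.9]],
    index=["StudentA", "StudentB", "StudentC", "StudentD", "StudentE", "StudentAvg"],
    columns=["ENGLISH", "MATH", "ART"],
)

avg = df.loc["StudentAvg"]
lines = []
for course in df.columns:
    col = df[course]
    # Percentage gap to the course average, as in the one-liner.
    pct_gap = (col - avg[course]) / avg[course] * 100
    # Students more than 5% below; StudentAvg itself has a gap of 0.
    low = col[pct_gap < -5.0].index.tolist()
    lines.append(f"{course}: " + ", ".join(low))
print(lines)
```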
To get just the students who fall more than 5% below the average in at least one subject, use:
mask = df.apply(lambda x: (x-x.loc['StudentAvg'])/x.loc['StudentAvg']*100<-5.0).any(axis=1)
df[mask].index.tolist()
Output:
['StudentB', 'StudentC', 'StudentD', 'StudentE']
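The answers write the condition in two ways: as a ratio (`(df / avg) - 1 < -.05`) and as a percentage (`(x - avg) / avg * 100 < -5.0`). With s a student's score and a > 0 the course average, both reduce to the same cutoff:

```latex
\frac{s - a}{a} \times 100 < -5
\iff \frac{s}{a} - 1 < -0.05
\iff s < 0.95\,a
```

For Art, a = 96.9 gives the cutoff 0.95 × 96.9 = 92.055, so StudentC's 92.0 is just over 5% below the average. That is why both answers list StudentC under Art, although the expected output in the question does not.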