How can I use pandas to build a summary table from the following data?
ID Condition Confirmed
D0119 Bad Yes
D0119 Good No
D0117 Bad Yes
D0110 Bad Undefined
D1011 Bad Yes
D1011 Good Yes
D1001 Bad Yes
D1001 Bad Yes
Required output:
ID Condition Confirmed %Bad
D0119 Bad,Good Yes, No 50
D0117 Bad,Yes 100
D0110 Bad,Undefined 0
D1011 Bad,Good Yes, Yes
D1001 Bad,Bad Yes, Yes 100
Can anyone help? Thanks.
Answer 0 (score: 1)
You can do it like this:
In [123]: (df.assign(Bad=df.Condition=='Bad')
...: .groupby('ID')
...: .agg({'Condition':pd.Series.tolist,
...: 'Confirmed':pd.Series.tolist,
...: 'Bad':'mean'})
...: )
...:
Out[123]:
Bad Condition Confirmed
ID
D0110 1.0 [Bad] [Undefined]
D0117 1.0 [Bad] [Yes]
D0119 0.5 [Bad, Good] [Yes, No]
D1001 1.0 [Bad, Bad] [Yes, Yes]
D1011 0.5 [Bad, Good] [Yes, Yes]
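The result above keeps Python lists and a fraction, while the requested table shows comma-joined strings, a percentage, and 0 for D0110. The following is a minimal sketch of one way to get closer to that shape. It assumes (this is a guess from the D0110 row, not something the question states) that a row should count as bad only when `Condition` is `Bad` and `Confirmed` is `Yes`. The column name `pct_bad` is made up for illustration, and named aggregation requires pandas 0.25 or later.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['D0119', 'D0119', 'D0117', 'D0110', 'D1011', 'D1011', 'D1001', 'D1001'],
    'Condition': ['Bad', 'Good', 'Bad', 'Bad', 'Bad', 'Good', 'Bad', 'Bad'],
    'Confirmed': ['Yes', 'No', 'Yes', 'Undefined', 'Yes', 'Yes', 'Yes', 'Yes'],
})

# Assumption: a row counts as "bad" only if it is Bad AND confirmed 'Yes'.
confirmed_bad = (df['Condition'] == 'Bad') & (df['Confirmed'] == 'Yes')

summary = (
    df.assign(pct_bad=confirmed_bad)
      .groupby('ID', sort=False)          # sort=False keeps first-appearance order
      .agg(Condition=('Condition', ','.join),
           Confirmed=('Confirmed', ', '.join),
           pct_bad=('pct_bad', 'mean'))   # mean of booleans = fraction of True
)
summary['pct_bad'] = summary['pct_bad'] * 100
print(summary)
```

If every `Bad` row should count regardless of confirmation, replace `confirmed_bad` with `df['Condition'] == 'Bad'`; D0110 then comes out as 100, as in the answer above.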
Vertical variant (keeps one row per record and adds the per-ID percentage as a column):
In [113]: df
Out[113]:
ID Condition Confirmed
0 D0119 Bad Yes
1 D0119 Good No
2 D0117 Bad Yes
3 D0110 Bad Undefined
4 D1011 Bad Yes
5 D1011 Good Yes
6 D1001 Bad Yes
7 D1001 Bad Yes
In [114]: g = df.assign(Bad=df.Condition=='Bad').groupby('ID')
In [115]: df['Bad'] = df['ID'].map((g.sum().div(g.size(), 0)*100).Bad)
In [116]: df
Out[116]:
ID Condition Confirmed Bad
0 D0119 Bad Yes 50.0
1 D0119 Good No 50.0
2 D0117 Bad Yes 100.0
3 D0110 Bad Undefined 100.0
4 D1011 Bad Yes 50.0
5 D1011 Good Yes 50.0
6 D1001 Bad Yes 100.0
7 D1001 Bad Yes 100.0
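The same per-row column can also be built with `transform`, which broadcasts a group statistic back onto the original rows without an explicit `map`. This is only a sketch of an alternative, not the answerer's code. It may be easier to run on recent pandas releases, where summing a group that still contains string columns no longer silently drops them.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['D0119', 'D0119', 'D0117', 'D0110', 'D1011', 'D1011', 'D1001', 'D1001'],
    'Condition': ['Bad', 'Good', 'Bad', 'Bad', 'Bad', 'Good', 'Bad', 'Bad'],
    'Confirmed': ['Yes', 'No', 'Yes', 'Undefined', 'Yes', 'Yes', 'Yes', 'Yes'],
})

# 1.0 where the record is Bad, 0.0 otherwise.
is_bad = df['Condition'].eq('Bad').astype(float)

# Per-ID mean of the flag, repeated on every row of that ID, as a percentage.
df['Bad'] = is_bad.groupby(df['ID']).transform('mean') * 100
print(df)
```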
Answer 1 (score: 1)
Consider the following.
import pandas as pd
df = pd.DataFrame({'ID':['D0119', 'D0119', 'D0117', 'D0110', 'D1011', 'D1011', 'D1001', 'D1001'],
'Condition':['Bad', 'Good', 'Bad', 'Bad', 'Bad', 'Good', 'Bad', 'Bad'],
'Confirmed':['Yes', 'No', 'Yes', 'Undefined', 'Yes', 'Yes', 'Yes', 'Yes']})
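# Drop rows whose confirmation status is 'Undefined' before grouping.
df_grp = df.loc[df['Confirmed'] != 'Undefined'].groupby('ID')
# Collect each ID's conditions into a list and compute the fraction that are 'Bad'.
summary = pd.DataFrame({'Condition': df_grp['Condition'].apply(list),
                        'pnt_bad': df_grp['Condition'].apply(lambda x: (x == 'Bad').mean())})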
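Note that this approach drops any ID whose records all have the "Undefined" status (here, D0110), so it does not appear in the result.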
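If those IDs should still appear, one option is to reindex the result against every ID in the original frame. This is a sketch only, and it assumes that a missing value (NaN) is an acceptable placeholder for IDs with no defined records. The variable names `known` and `pnt_bad` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['D0119', 'D0119', 'D0117', 'D0110', 'D1011', 'D1011', 'D1001', 'D1001'],
    'Condition': ['Bad', 'Good', 'Bad', 'Bad', 'Bad', 'Good', 'Bad', 'Bad'],
    'Confirmed': ['Yes', 'No', 'Yes', 'Undefined', 'Yes', 'Yes', 'Yes', 'Yes'],
})

# Same filter as above: ignore records whose status is 'Undefined'.
known = df[df['Confirmed'] != 'Undefined']

# Fraction of Bad records per ID among the remaining rows.
pnt_bad = known['Condition'].eq('Bad').astype(float).groupby(known['ID']).mean()

# Bring back every original ID; IDs that were filtered out entirely become NaN.
pnt_bad = pnt_bad.reindex(df['ID'].unique())
print(pnt_bad)
```

Following it with `pnt_bad.fillna(0)` would report 0 for such IDs instead, which matches the D0110 row in the requested output.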