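I have a dataframe similar to the one shown below, which scales to about 20,000 rows.
Color can be Blue, Yellow, Green, or Red.
Values can be FN, FP, TP, or blank.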
import pandas as pd
df = pd.DataFrame({'Color': ['Blue', 'Yellow', 'Green','Red','Yellow','Green'],
'BIG': ['FN', ' ', 'FP', ' ', ' ', 'FN'],
'MED': ['FP', ' ', 'FN', ' ', 'TP', ' '],
'SM' : [' ', 'TP', ' ', ' ', ' ', 'FP']})
What I want is a count of each combination.
Example: Blue / BIG / TP = 105 counts
| Color |BIG_TP|BIG_FN|BIG_FP|MED_TP|MED_FN|MED_FP|SM_TP|SM_FN|SM_FP|
|:-----:|:----:|:----:|:----:|:----:|:----:|:----:|:---:|:---:|:---:|
|Blue | 105 | 35 | 42 | 199 | 75 | 49 | 115 | 135 | 13 |
|Yellow | 85 | 5 | 23 | 05 | 111 | 68 | 99 | 42 | 42 |
|Green | 365 | 66 | 74 | 35 | 2 | 31 | 207 | 190 | 61 |
|Red | 245 | 3 | 8 | 25 | 7 | 49 | 7 | 55 | 69 |
What I have tried:
color_summary = pd.crosstab(index=[df['Color']], columns= [df['BIG'], df['MED'], df['SM']], values=[df[df['BIG']], df[df['MED']], df[df['SM']]], aggfunc=sum)
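This is not very close to what I am looking for. I did manage to get the result in a roundabout, ugly way with a lot of repetition. I am hoping for a more concise solution using crosstab.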
test_1 = df['BIG']=='TP'
test_2 = df['BIG']=='FN'
test_3 = df['BIG']=='FP'
big_tp = pd.crosstab(df['Color'], [df.loc[test_1, 'BIG']])
big_fn = pd.crosstab(df['Color'], [df.loc[test_2, 'BIG']])
big_fp = pd.crosstab(df['Color'], [df.loc[test_3, 'BIG']])
big_tp_df = pd.DataFrame(big_tp.to_records())
big_fn_df = pd.DataFrame(big_fn.to_records())
big_fp_df = pd.DataFrame(big_fp.to_records())
Big_TP = pd.Series(big_tp_df['TP'].values,index=big_tp_df.Color).to_dict()
Big_FN = pd.Series(big_fn_df['FN'].values,index=big_fn_df.Color).to_dict()
Big_FP = pd.Series(big_fp_df['FP'].values,index=big_fp_df.Color).to_dict()
a = pd.Series(Big_TP, name='BIG_TP')
b = pd.Series(Big_FN, name='BIG_FN')
c = pd.Series(Big_FP, name='BIG_FP')
a.index.name = 'Color'
b.index.name = 'Color'
c.index.name = 'Color'
a.reset_index()
b.reset_index()
c.reset_index()
color_summary = pd.DataFrame(columns=['Color'])
color_summary['Color'] = big_tp_df['Color']
color_summary = pd.merge(color_summary, a, on='Color')
color_summary = pd.merge(color_summary, b, on='Color')
color_summary = pd.merge(color_summary, c, on='Color')
color_summary.head()
Answer 0 (score: 0)
Try this. I have used df.unstack and pd.crosstab.
df = pd.DataFrame({'Color': ['Blue', 'Yellow', 'Green','Red','Yellow','Green'],
'BIG': ['FN', ' ', 'FP', ' ', ' ', 'FN'],
'MED': ['FP', ' ', 'FN', ' ', 'TP', ' '],
'SM' : [' ', 'TP', ' ', ' ', ' ', 'FP']} )
#Unstack the dataframe to get 3 columns
ddf = pd.DataFrame(df.set_index('Color').unstack()).reset_index().set_axis(['size','color','f'], axis=1)
#Create crosstab with multiindex columns
ct = pd.crosstab(ddf['color'], [ddf['size'], ddf['f']])
#Concat the multiindexes to a single column
ct.columns = ct.columns.map('_'.join)
#Drop the blank-value columns such as 'BIG_ ' and keep only columns like 'BIG_FN' or 'MED_TP'
out = ct.reset_index().drop(ddf['size'].unique()+'_ ', axis=1)
print(out)
color BIG_FN BIG_FP MED_FN MED_FP MED_TP SM_FP SM_TP
0 Blue 1 0 0 1 0 0 0
1 Green 1 1 1 0 0 1 0
2 Red 0 0 0 0 0 0 0
3 Yellow 0 0 0 0 1 0 1
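Note that the output above only contains the combinations that actually occur in the sample, so columns such as BIG_TP or SM_FN are missing. If the fixed nine-column layout from the question is needed, one possible variant is sketched below. It swaps unstack for df.melt, filters out the blanks before calling pd.crosstab, and then reindexes against the full set of size/label pairs. The names sizes, labels, long_df and full_cols are made up for this illustration, and the sketch assumes that blank cells are anything other than 'TP', 'FN' or 'FP'.

```python
import pandas as pd

df = pd.DataFrame({
    'Color': ['Blue', 'Yellow', 'Green', 'Red', 'Yellow', 'Green'],
    'BIG': ['FN', ' ', 'FP', ' ', ' ', 'FN'],
    'MED': ['FP', ' ', 'FN', ' ', 'TP', ' '],
    'SM':  [' ', 'TP', ' ', ' ', ' ', 'FP'],
})

# Assumed for illustration: the size columns and the labels worth counting
sizes = ['BIG', 'MED', 'SM']
labels = ['TP', 'FN', 'FP']

# Long format: one row per (Color, size, label) cell of the original frame
long_df = df.melt(id_vars='Color', value_vars=sizes,
                  var_name='size', value_name='label')

# Keep only real labels, which drops the blank cells
long_df = long_df[long_df['label'].isin(labels)]

# Count each (size, label) pair per colour
ct = pd.crosstab(long_df['Color'], [long_df['size'], long_df['label']])

# Force every colour and every size/label pair to appear, filling gaps with 0
full_cols = pd.MultiIndex.from_product([sizes, labels], names=['size', 'label'])
all_colors = pd.Index(df['Color'].unique(), name='Color')
ct = ct.reindex(index=all_colors, columns=full_cols, fill_value=0)

# Flatten the two column levels into names like 'BIG_TP'
ct.columns = ['_'.join(pair) for pair in ct.columns]
out = ct.reset_index()
print(out)
```

Filtering before the crosstab avoids having to drop the 'BIG_ ' style columns afterwards, and the reindex step keeps colours with no labels at all (Red in this sample) as a row of zeros.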