Compare DataFrames and output match counts

Time: 2019-11-07 09:33:21

Tags: python pandas

I want to count how many times the values of one DataFrame occur in each column of another DataFrame, and output the match counts.

df
   group1  group2
0  orange  orange
1   apple   apple
2  banana    pear
3  banana  banana

fruit_df
   fruits
0  orange
1  banana

So I tried:

groups = ["group1", "group2"]
matrix = pd.DataFrame()
for group in groups:
    out = fruit_df["fruits"].isin(df[group]).astype(int)
    matrix = pd.concat([matrix, out], axis=1)
matrix.columns = groups
matrix = matrix.rename(index=fruit_df["fruits"])

Result:

matrix
        group1  group2
orange       1       1
banana       1       1

What I want:

matrix
        group1  group2
orange       1       1
banana       2       1

2 Answers:

Answer 0 (score: 3)

Here is one approach you can try:

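import pandas as pd

# Reshape to long form: one row per (group, fruit) occurrence
temp_df = pd.melt(df, var_name='group', value_name='fruits')
temp_df['count'] = 1
# Sum the counts per fruit and group; absent combinations become 0
df_count = temp_df.pivot_table(index=['fruits'], columns=['group'], values='count', aggfunc='sum', fill_value=0).reset_index()
# Keep only the fruits listed in fruit_df
matrix = fruit_df.merge(df_count)
# set_index returns a new DataFrame, so assign the result back
matrix = matrix.set_index('fruits')
print(matrix)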

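The same reshape-then-count idea can also be written with `pd.crosstab`, which does the counting without a helper `count` column. This is only a sketch of an alternative, not the answer's own method; the sample data is copied from the question, and the names `long_df` and `counts` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "group1": ["orange", "apple", "banana", "banana"],
    "group2": ["orange", "apple", "pear", "banana"],
})
fruit_df = pd.DataFrame({"fruits": ["orange", "banana"]})

# Long form: one row per (group, fruit) occurrence
long_df = df.melt(var_name="group", value_name="fruits")

# Cross-tabulate: rows are fruits, columns are groups, cells are counts
counts = pd.crosstab(long_df["fruits"], long_df["group"])

# Keep only the fruits of interest, in the order given by fruit_df
# (assumes every listed fruit occurs at least once in df)
matrix = counts.loc[fruit_df["fruits"]]
print(matrix)
```

Because `crosstab` fills absent combinations with 0, the counts stay integers without a separate `fillna` step.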

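Answer 1 (score: 3)

Use value_counts on each column, select the rows for the values in fruit_df['fruits'] with DataFrame.loc, then, if necessary, replace missing values with 0 and convert to integers: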

df = df.apply(pd.Series.value_counts).loc[fruit_df['fruits']].fillna(0).astype(int)
print(df)
        group1  group2
orange       1       1
banana       2       1
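One caveat with `.loc`: it raises a `KeyError` if a fruit in `fruit_df` never appears in `df`. A possible variant, sketched below, swaps in `reindex`, which returns a row of missing values for such labels instead. The data is from the question, except for the `"kiwi"` entry, which is a hypothetical addition used only to show the missing-label case.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "group1": ["orange", "apple", "banana", "banana"],
    "group2": ["orange", "apple", "pear", "banana"],
})
# "kiwi" is hypothetical (not in the question): a fruit that never occurs in df
fruit_df = pd.DataFrame({"fruits": ["orange", "banana", "kiwi"]})

# Count occurrences of each value per column
counts = df.apply(pd.Series.value_counts)

# reindex tolerates labels that are absent from counts (they become NaN),
# so fill those with 0 before converting to integers
matrix = counts.reindex(fruit_df["fruits"]).fillna(0).astype(int)
print(matrix)
```

With this variant the result keeps the original `df` intact and reports 0 for fruits that are not present in either group.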