Count how many times the values in one DataFrame occur in another DataFrame

Time: 2018-06-06 17:33:47

Tags: pandas

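I have three DataFrames, say A, B and C, that all share a column 'comm_col'. I want to create a new column in B called 'comm_col_occurrences', computed as follows. For each value in B's 'comm_col', check whether that value is present in A. If it is, return the number of times it occurs in A. If it is not, check whether it is present in C and, if so, return the number of times it occurs there. If it is in neither, the count is 0. Please let me know how to write a function for this in pandas. The sample code below demonstrates the problem, with df1, df2 and df3 playing the roles of A, B and C.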

import pandas as pd 

#Given dataframes
df1 = pd.DataFrame({'comm_col': ['A', 'B', 'B', 'A']})

df2 = pd.DataFrame({'comm_col': ['A', 'B', 'C', 'D', 'E']})

df3 = pd.DataFrame({'comm_col':['A', 'A', 'D', 'E']})  

# The value 'A' from df2 occurs in df1 twice. Hence the output is 2.
# Similarly, for 'B' the output is 2. 'C' doesn't occur in any of the
# dataframes. Hence the output is 0.
# 'D' and 'E' don't occur in df1 but each occurs in df3 once. Hence
# the output for 'D' and 'E' should be 1.

#Output should be as shown below
df2['comm_col_occurrences'] = [2, 2, 0, 1, 1]

Output:

**df1**
         comm_col
0        A
1        B
2        B
3        A

**df3**
         comm_col
0        A
1        A
2        D
3        E

**df2**

         comm_col  
0        A         
1        B         
2        C         
3        D         
4        E  

**Output**
     comm_col  comm_col_occurrences
0        A                     2
1        B                     2
2        C                     0
3        D                     1
4        E                     1

Thanks in advance.

1 Answer:

Answer 0 (score: 0)

You need:

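import numpy as np

# Count the occurrences of each value per dataframe, aligned on the value
result = pd.DataFrame({
    'df1': df1['comm_col'].value_counts(),
    'df2': df2['comm_col'].value_counts(),
    'df3': df3['comm_col'].value_counts()
})
# Keep only the values that are present in df2
result = result[result['df2'].notnull()].copy()
result['comm_col_occurrences'] = np.nan
# Fill from df3 first, then let df1 overwrite it so that df1 takes priority
result.loc[result['df3'].notnull(), 'comm_col_occurrences'] = result['df3']
result.loc[result['df1'].notnull(), 'comm_col_occurrences'] = result['df1']
result['comm_col_occurrences'] = result['comm_col_occurrences'].fillna(0)
result = result.drop(['df1', 'df2', 'df3'], axis=1)
# Turn the value index into a regular 'comm_col' column
result = result.rename_axis('comm_col').reset_index()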

Output:

    comm_col  comm_col_occurrences
0        A                   2.0
1        B                   2.0
2        C                   0.0
3        D                   1.0
4        E                   1.0
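
The question asks for a function that writes the counts straight into df2. One way to do that, as a minimal sketch rather than the answerer's method, is to look each value up in the `value_counts()` of df1 and fall back to the counts from df3. The function name `count_with_fallback` and its parameters are hypothetical, chosen only for this illustration.

```python
import pandas as pd


def count_with_fallback(target, primary, fallback, col="comm_col"):
    """Count each value of target[col] in primary[col].

    Values missing from primary are counted in fallback[col] instead;
    values missing from both get 0. (Hypothetical helper for illustration.)
    """
    primary_counts = primary[col].value_counts()
    fallback_counts = fallback[col].value_counts()

    # Series.map with a Series looks each value up in that Series' index;
    # values that are not found become NaN.
    counts = target[col].map(primary_counts)
    # Fill the gaps from the fallback counts, then default to 0.
    counts = counts.fillna(target[col].map(fallback_counts))
    return counts.fillna(0).astype(int)


df1 = pd.DataFrame({"comm_col": ["A", "B", "B", "A"]})
df2 = pd.DataFrame({"comm_col": ["A", "B", "C", "D", "E"]})
df3 = pd.DataFrame({"comm_col": ["A", "A", "D", "E"]})

df2["comm_col_occurrences"] = count_with_fallback(df2, df1, df3)
print(df2)
```

Because the lookup runs row by row over df2, this keeps df2's original row order and index, handles repeated values in df2, and yields integer counts instead of floats.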