Question

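I have 3 dataframes, say A, B and C (df1, df2 and df3 in the sample code below), with a common column 'comm_col' in all three. I want to create a new column called 'comm_col_occurrences' in B, calculated as follows. For each value in 'comm_col' of dataframe B, check whether that value is present in A. If it is, return the number of times the value occurs in A. If not, check whether it is present in C, and if so, return the number of times it is repeated there. Please tell me how to write a function for this in Pandas. The sample code below demonstrates the problem.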

import pandas as pd 

#Given dataframes
df1 = pd.DataFrame({'comm_col': ['A', 'B', 'B', 'A']})

df2 = pd.DataFrame({'comm_col': ['A', 'B', 'C', 'D', 'E']})

df3 = pd.DataFrame({'comm_col':['A', 'A', 'D', 'E']})  

# The value 'A' from df2 occurs in df1 twice, so the output is 2.
# Similarly, the output for 'B' is 2. 'C' doesn't occur in either of the
# other dataframes, so the output is 0.
# 'D' and 'E' don't occur in df1 but occur once in df3, so
# the output for 'D' and 'E' should be 1.

#Output should be as shown below
df2['comm_col_occurrences'] = [2, 2, 0, 1, 1]

Output:

**df1**
         comm_col
0        A
1        B
2        B
3        A

**df3**
         comm_col
0        A
1        A
2        D
3        E

**df2**

         comm_col  
0        A         
1        B         
2        C         
3        D         
4        E  

**Output**
     comm_col  comm_col_occurrences
0        A                     2
1        B                     2
2        C                     0
3        D                     1
4        E                     1

Thanks in advance.

Answer 1

You need:

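# Count occurrences per value in each dataframe; the indexes are aligned on the values
result = pd.DataFrame({
    'df1': df1['comm_col'].value_counts(),
    'df2': df2['comm_col'].value_counts(),
    'df3': df3['comm_col'].value_counts()
})
# Keep only the values that actually appear in df2
result = result[result['df2'].notnull()]
# Prefer the df1 count, fall back to the df3 count, and default to 0
result['comm_col_occurrences'] = result['df1'].fillna(result['df3']).fillna(0)
# Drop the helper columns and turn the value index back into a 'comm_col' column
result = result.drop(['df1', 'df2', 'df3'], axis=1)
result = result.rename_axis('comm_col').reset_index()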

Output:

    comm_col  comm_col_occurrences
0        A                   2.0
1        B                   2.0
2        C                   0.0
3        D                   1.0
4        E                   1.0
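
Since the question asks for a function, here is a minimal sketch of one way to wrap the same lookup so the result is written straight into df2 as integers. The function name `count_with_fallback` and its parameter names are hypothetical, chosen only for illustration. It relies on `Series.map` accepting another Series as a lookup table, so values missing from the lookup come back as NaN.

```python
import pandas as pd

def count_with_fallback(values, primary, fallback):
    """For each entry in `values`, return how often it occurs in `primary`;
    if it is absent there, how often it occurs in `fallback`; otherwise 0.
    (Hypothetical helper name, used here only for illustration.)"""
    primary_counts = primary.value_counts()
    fallback_counts = fallback.value_counts()
    # map() looks each value up in the counts; values not found become NaN
    counts = values.map(primary_counts).fillna(values.map(fallback_counts))
    return counts.fillna(0).astype(int)

df1 = pd.DataFrame({'comm_col': ['A', 'B', 'B', 'A']})
df2 = pd.DataFrame({'comm_col': ['A', 'B', 'C', 'D', 'E']})
df3 = pd.DataFrame({'comm_col': ['A', 'A', 'D', 'E']})

df2['comm_col_occurrences'] = count_with_fallback(
    df2['comm_col'], df1['comm_col'], df3['comm_col']
)
print(df2)
```

Because the counts are mapped onto df2's own column, the row order of df2 is preserved and repeated values in df2 each receive their count.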
