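I have three dataframes, say A, B and C, that all share a column 'comm_col' (df1, df2 and df3 respectively in the sample code). I want to create a new column in B named 'comm_col_occurrences', computed as follows. For each value of 'comm_col' in B, check whether that value is present in A. If it is, return the number of times it occurs in A. If it is not, check whether it is present in C and, if so, return the number of times it occurs there. Please tell me how to write a function for this in Pandas. The sample code below demonstrates the problem.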
import pandas as pd
#Given dataframes
df1 = pd.DataFrame({'comm_col': ['A', 'B', 'B', 'A']})
df2 = pd.DataFrame({'comm_col': ['A', 'B', 'C', 'D', 'E']})
df3 = pd.DataFrame({'comm_col':['A', 'A', 'D', 'E']})
# The value 'A' from df2 occurs twice in df1. Hence the output is 2.
# Similarly, for 'B' the output is 2. 'C' doesn't occur in df1 or df3.
# Hence the output is 0.
# 'D' and 'E' don't occur in df1 but each occurs once in df3. Hence
# the output for 'D' and 'E' should be 1.
# Output should be as shown below
df2['comm_col_occurrences'] = [2, 2, 0, 1, 1]
Output:
**df1**
comm_col
0 A
1 B
2 B
3 A
**df3**
comm_col
0 A
1 A
2 D
3 E
**df2**
comm_col
0 A
1 B
2 C
3 D
4 E
**Output**
comm_col comm_col_occurrences
0 A 2
1 B 2
2 C 0
3 D 1
4 E 1
Thanks in advance.
Answer 0 (score: 0)
You need:
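import numpy as np

# Count how often each value occurs in each dataframe; rows are aligned on the values
result = pd.DataFrame({
'df1':df1['comm_col'].value_counts(),
'df2':df2['comm_col'].value_counts(),
'df3':df3['comm_col'].value_counts()
})
result['comm_col_occurrences'] = np.nan
# Fill from df3 first, then let df1 overwrite it, so that df1 takes priority
result.loc[result['df3'].notnull(), 'comm_col_occurrences'] = result['df3']
result.loc[result['df1'].notnull(), 'comm_col_occurrences'] = result['df1']
# Values found in neither df1 nor df3 get 0
result['comm_col_occurrences'] = result['comm_col_occurrences'].fillna(0)
# Keep only the values present in df2 and restore comm_col as a regular column
result = result[result['df2'].notnull()].drop(['df1', 'df2', 'df3'], axis=1)
result = result.rename_axis('comm_col').reset_index()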
Output:
comm_col comm_col_occurrences
0 A 2.0
1 B 2.0
2 C 0.0
3 D 1.0
4 E 1.0
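Since the question asks for a function that adds the column to df2 itself, here is an alternative sketch that maps each value through the counts of df1 and falls back to the counts of df3. It is not part of the original answer: the function name `count_occurrences` and its parameter names are made up for illustration. Unlike a table built from `value_counts`, it keeps one result per row of df2, so it should also behave sensibly when df2 contains repeated values.

```python
import pandas as pd


def count_occurrences(target, primary, fallback, col="comm_col"):
    """For each value in target[col], return its count in primary[col];
    if it is absent there, its count in fallback[col]; otherwise 0.
    (Hypothetical helper, named here only for illustration.)"""
    primary_counts = primary[col].value_counts()
    fallback_counts = fallback[col].value_counts()
    # NaN wherever the value does not occur in the primary dataframe
    counts = target[col].map(primary_counts)
    # Fill those gaps with the counts from the fallback dataframe
    counts = counts.fillna(target[col].map(fallback_counts))
    # Values found in neither dataframe get 0
    return counts.fillna(0).astype(int)


df1 = pd.DataFrame({'comm_col': ['A', 'B', 'B', 'A']})
df2 = pd.DataFrame({'comm_col': ['A', 'B', 'C', 'D', 'E']})
df3 = pd.DataFrame({'comm_col': ['A', 'A', 'D', 'E']})

df2['comm_col_occurrences'] = count_occurrences(df2, df1, df3)
print(df2)
```

For the sample data this gives 2, 2, 0, 1, 1 as integers, matching the expected output in the question.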