Input description:
I have two DataFrames, df1 and df2, with the columns shown below.
df1
Description Col1 Col2
AAA 1.2 2.5
BBB 1.3 2.0
CCC 1.1 2.3
df2
Description Col1 Col2
AAA 1.2 1.3
BBB 1.3 2.0
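Scenario:
Where df1['Description'] matches df2['Description'], df1['Col1'] must be compared with df2['Col1'] and df1['Col2'] with df2['Col2'], producing the result shown below. Rows whose Description does not appear in df2 should be marked "Not found in df2".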
Expected output:
Description Col1 Col2 Col1_Result Col2_Result
AAA 1.2 2.5 Pass Fail
BBB 1.3 2.0 Pass Pass
CCC 1.1 2.3 Not found in df2 Not found in df2
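Attempted code: I tried the following lines for the scenario above, but they don't work. They fail with the error "ValueError: Can only compare identically-labeled Series objects".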
df1['Col1_Result'] = np.where(df1['Description']== df2['Description'],np.where(df1['Col1'] == df2['Col1'], 'Pass', 'Fail'),'Not found in df2')
df1['Col2_Result'] = np.where(df1['Description']== df2['Description'],np.where(df1['Col2'] == df2['Col2'], 'Pass', 'Fail'),'Not found in df2')
Thanks in advance!
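The error occurs because `==` between two pandas Series compares them element by element by index label, and pandas refuses to do so when the indexes differ: df1 has index labels 0 to 2, while df2 has only 0 and 1. A minimal sketch with the sample data that reproduces the error and shows that `Series.isin` gives the per-row "is this Description present in df2?" test the attempt was aiming for (the variable names `error_message` and `found` are illustrative only):

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({'Description': ['AAA', 'BBB', 'CCC'],
                    'Col1': [1.2, 1.3, 1.1],
                    'Col2': [2.5, 2.0, 2.3]})
df2 = pd.DataFrame({'Description': ['AAA', 'BBB'],
                    'Col1': [1.2, 1.3],
                    'Col2': [1.3, 2.0]})

# Element-wise == requires identically-labeled Series, so this raises ValueError
try:
    df1['Description'] == df2['Description']
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# isin tests membership for each row of df1, regardless of df2's index
found = df1['Description'].isin(df2['Description'])
print(found.tolist())  # [True, True, False]
```

Membership alone does not say which df2 row to compare against, which is why both answers below join the frames on Description first.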
Answer 0 (score: 2)
Alternatively, the following code works for the given example; adapt it as needed for any edge cases.
# Import libraries
import pandas as pd
# Create DataFrame
df1 = pd.DataFrame({
'Description':['AaA', 'BBB','CCC'],
'Col1': [1.2,1.3,1.1],
'Col2':[2.5,2.0,2.3]
})
df2 = pd.DataFrame({
'Description': ['AAA', 'BBB'],
'Col1': [1.2, 1.3],
'Col2': [1.3, 2.0]
})
# Convert to lower case so that 'AaA' and 'AAA' match (case-insensitive comparison)
df1['Description'] = df1['Description'].str.lower()
df2['Description'] = df2['Description'].str.lower()
# Merge df
df = df1.merge(df2, on='Description', how='left')
# Compare
df['Col1_result'] = df.apply(lambda x: 'Not found in df2' if (pd.isna(x['Col1_y'])) else
'Pass' if x['Col1_x']==x['Col1_y'] else
'Fail', axis=1)
df['Col2_result'] = df.apply(lambda x: 'Not found in df2' if (pd.isna(x['Col2_y'])) else
'Pass' if x['Col2_x']==x['Col2_y'] else
'Fail', axis=1)
# Keep only columns from df1
df = df.drop(['Col1_y', 'Col2_y'], axis=1)
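# Remove the '_x' suffix from column names (regex=True is needed in pandas >= 2.0,
# where str.replace treats the pattern as a literal string by default)
df.columns = df.columns.str.replace(r'_x$', '', regex=True)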
# Change to upper case
df['Description'] = df['Description'].str.upper()
Output
df
Description Col1 Col2 Col1_result Col2_result
0 AAA 1.2 2.5 Pass Fail
1 BBB 1.3 2.0 Pass Pass
2 CCC 1.1 2.3 Not found in df2 Not found in df2
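A variation on this answer, sketched as an alternative rather than as the answerer's method: `DataFrame.merge` accepts `indicator=True`, which adds a `_merge` column whose value is `'left_only'` for rows of df1 with no match in df2. That detects the "Not found in df2" case directly instead of probing `Col1_y`/`Col2_y` for NaN, and `np.select` replaces the row-wise `apply`. This sketch assumes case-sensitive matching on Description (lower-case both keys first, as above, if needed); the `_df2` suffix is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({'Description': ['AAA', 'BBB', 'CCC'],
                    'Col1': [1.2, 1.3, 1.1],
                    'Col2': [2.5, 2.0, 2.3]})
df2 = pd.DataFrame({'Description': ['AAA', 'BBB'],
                    'Col1': [1.2, 1.3],
                    'Col2': [1.3, 2.0]})

# Left join; df2's overlapping columns get the '_df2' suffix,
# and indicator=True adds '_merge' ('both' or 'left_only' here)
merged = df1.merge(df2, on='Description', how='left',
                   suffixes=('', '_df2'), indicator=True)
missing = merged['_merge'].eq('left_only')

for col in ['Col1', 'Col2']:
    # "Not found" is checked first, so unmatched rows never reach the equality test
    merged[f'{col}_Result'] = np.select(
        [missing, merged[col].eq(merged[f'{col}_df2'])],
        ['Not found in df2', 'Pass'],
        default='Fail')

result = merged[['Description', 'Col1', 'Col2', 'Col1_Result', 'Col2_Result']]
print(result)
```

One design consequence: a matched df2 row that genuinely contains NaN in Col1 or Col2 is reported as "Fail" here, whereas a NaN check on the merged columns would report it as "Not found in df2".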
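Answer 1 (score: 2)
Use DataFrame.merge with a left join to build the output DataFrame, then select the added columns with DataFrame.filter, and create the results with numpy.select, checking first for missing values and then comparing the columns with each other: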
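import numpy as np
import pandas as pd

# df1 and df2 as defined in the question
# Lower-cased helper key for a case-insensitive match
df1['desc'] = df1['Description'].str.lower()
df2['desc'] = df2['Description'].str.lower()
# Left join: df2's overlapping columns get the '_Result' suffix; drop the helper columns
df = (df1.merge(df2, on='desc', suffixes=['', '_Result'], how='left')
        .drop(['Description_Result','desc'], axis=1))
# Columns added from df2, plus the same data renamed to df1's column names
df3 = df.filter(like='_Result')
new = df3.rename(columns=lambda x: x.replace('_Result',''))
# First true condition wins: missing -> 'Not found in df2', equal -> 'Pass', otherwise 'Fail'
df[df3.columns] = np.select([new.isna(),
                             df[new.columns].eq(new)],
                            ['Not found in df2', 'Pass'], 'Fail')
print (df)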
Description Col1 Col2 Col1_Result Col2_Result
0 AAA 1.2 2.5 Pass Fail
1 BBB 1.3 2.0 Pass Pass
2 CCC 1.1 2.3 Not found in df2 Not found in df2
Details:
print (df3)
Col1_Result Col2_Result
0 1.2 1.3
1 1.3 2.0
2 NaN NaN
print (new)
Col1 Col2
0 1.2 1.3
1 1.3 2.0
2 NaN NaN
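`np.select` evaluates its conditions in order and, for each element, takes the choice belonging to the first condition that is true, falling back to the default when none is; that is why `new.isna()` is listed before the equality check. A small standalone illustration with plain NumPy arrays (the values are made up for illustration and are not from the question):

```python
import numpy as np

left = np.array([1.2, 1.3, 1.1])
right = np.array([1.2, 1.4, np.nan])  # illustrative values; NaN marks a missing match

result = np.select(
    [np.isnan(right), left == right],  # conditions, checked in this order
    ['Not found in df2', 'Pass'],      # choice for the first true condition
    default='Fail')                    # used when no condition is true
print(result.tolist())  # ['Pass', 'Fail', 'Not found in df2']
```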