Indexing error when comparing two DataFrames

Time: 2020-02-10 11:51:06

Tags: python-3.x pandas

I am comparing a column of one DataFrame with a column of another and getting an IndexingError.

My code

import pandas as pd

# Read both files and reset their indexes
df1 = pd.read_excel(path)   # 776 line items
df2 = pd.read_excel(path1)  # 10k+ line items

df1.reset_index(inplace=True)
df2.reset_index(inplace=True)

# Compare the two columns
is_expired = df1['Contract Id'].isin(df2['Contract Id'])  # Series of 776 booleans, created successfully
df3 = df2.loc[:, is_expired]  # raises IndexingError
df3 = df2[is_expired]         # also tried this, same error

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

Sample DataFrames

Df1

Contract ID     Name
CW 123           A
CW 125           B
                 C
                 D

Df2

Contract ID     Name_1
CW 123           Other
CW 124           Columns
CW 125           Don't
CW 1258          Matter

There are many answers on SO, but they all say to use .loc and the error will go away. I still get the same error. Can someone help?

1 answer:

Answer 0 (score: 1)

The boolean mask must have the same index as the DataFrame it filters. So if you test df1['Contract Id'], filter df1:

#test df1['Contract Id']
is_expired = df1['Contract Id'].isin(df2['Contract Id']) 
#filter df1
df3 = df1[is_expired] 

Or, if you test df2['Contract Id'], filter df2:

#test df2['Contract Id']
is_expired = df2['Contract Id'].isin(df1['Contract Id']) 
#filter df2
df3 = df2[is_expired] 
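
Both variants can be tried on the sample data from the question. This is only a minimal sketch: the frames are typed in by hand instead of read from Excel, the column is named 'Contract Id' as in the question's code (the sample tables write it as "Contract ID"), and representing the two blank IDs in Df1 as None is an assumption.

```python
import pandas as pd

# Hand-built stand-ins for the question's sample tables.
# Assumption: the blank Contract Id cells in Df1 are missing values (None).
df1 = pd.DataFrame({
    "Contract Id": ["CW 123", "CW 125", None, None],
    "Name": ["A", "B", "C", "D"],
})
df2 = pd.DataFrame({
    "Contract Id": ["CW 123", "CW 124", "CW 125", "CW 1258"],
    "Name_1": ["Other", "Columns", "Don't", "Matter"],
})

# Mask built from df1, so it is used to filter df1.
mask_df1 = df1["Contract Id"].isin(df2["Contract Id"])
expired_in_df1 = df1[mask_df1]

# Mask built from df2, so it is used to filter df2.
mask_df2 = df2["Contract Id"].isin(df1["Contract Id"])
matched_in_df2 = df2[mask_df2]

print(expired_in_df1)
print(matched_in_df2)
```

In each case the mask and the filtered DataFrame share the same index, so pandas can line them up row by row.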

Your solution fails because the mask is built from one DataFrame but applied to a different one:

#test df1['Contract Id']
is_expired = df1['Contract Id'].isin(df2['Contract Id'])
#filter df2
df3 = df2.loc[:,is_expired]
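
The mismatch can be reproduced on a small scale. The sketch below is an illustration under assumptions, not the asker's actual data: df1 is deliberately shorter than df2 (2 rows against 4, standing in for 776 against 10k+), and the exception is caught generically and identified by its class name, because the import location of IndexingError differs between pandas versions.

```python
import pandas as pd

# Hypothetical miniature frames; df1 is shorter than df2, as in the question.
df1 = pd.DataFrame({"Contract Id": ["CW 123", "CW 125"]})
df2 = pd.DataFrame({
    "Contract Id": ["CW 123", "CW 124", "CW 125", "CW 1258"],
    "Name_1": ["Other", "Columns", "Don't", "Matter"],
})

# The mask is built from df1, so it carries df1's index (0 and 1 only).
mask = df1["Contract Id"].isin(df2["Contract Id"])
print(len(mask), len(df2))  # 2 4

attempts = {
    # Row filter: df2 has row labels 2 and 3 that the mask does not cover.
    "df2[mask]": lambda: df2[mask],
    # Column filter: the mask is matched against df2's column names,
    # which are not in the mask's index either.
    "df2.loc[:, mask]": lambda: df2.loc[:, mask],
}

errors = {}
for label, attempt in attempts.items():
    try:
        attempt()
    except Exception as exc:  # IndexingError is expected for both attempts
        errors[label] = type(exc).__name__

print(errors)
```

Note that df2.loc[:, mask] puts the mask in the column position, so .loc does not help here. The mask has to come from the same DataFrame that is being filtered.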