Question

I am comparing a column of one DataFrame against a column of another and getting an IndexingError.

My code:

##For reading and re-setting index
df1 = pd.read_excel(path) ## 776 line items
df2 = pd.read_excel(path1) ## 10k+ line items

df1.reset_index(inplace = True)
df2.reset_index(inplace = True)

## For comparing two columns
is_expired = df1['Contract Id'].isin(df2['Contract Id']) ## Series of 776 booleans, created successfully
df3 = df2.loc[:, is_expired] ## Raises IndexingError
df3 = df2[is_expired] ## Also tried this, same error

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

Sample DataFrames

Df1

Contract ID     Name
CW 123           A
CW 125           B
                 C
                 D

Df2

Contract ID     Name_1
CW 123           Other
CW 124           Columns
CW 125           Don't
CW 1258          Matter

There are many answers on SO, but they all say to use .loc and the error will go away. I am still getting the same error. Can anyone help?

Answer 1

A boolean mask has to match the DataFrame it filters (same index), so if you test df1['Contract Id'], filter df1:

#test df1['Contract Id']
is_expired = df1['Contract Id'].isin(df2['Contract Id']) 
#filter df1
df3 = df1[is_expired]

Or, if you test df2['Contract Id'], filter df2:

#test df2['Contract Id']
is_expired = df2['Contract Id'].isin(df1['Contract Id']) 
#filter df2
df3 = df2[is_expired]
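
To make both variants concrete, here is a minimal sketch built on the sample data from the question. The frames are typed in by hand instead of read from Excel, the blank cells are assumed to be missing values, and the column is assumed to be named 'Contract Id' as in the posted code (the sample tables spell it 'Contract ID'). The names in_df1, in_df2 and df4 are only illustrative.

```python
import pandas as pd

# Hand-typed stand-ins for the two spreadsheets (values from the sample tables).
df1 = pd.DataFrame({
    "Contract Id": ["CW 123", "CW 125", None, None],
    "Name": ["A", "B", "C", "D"],
})
df2 = pd.DataFrame({
    "Contract Id": ["CW 123", "CW 124", "CW 125", "CW 1258"],
    "Name_1": ["Other", "Columns", "Don't", "Matter"],
})

# Mask built from df2's column: it carries df2's index, so it can filter df2.
in_df1 = df2["Contract Id"].isin(df1["Contract Id"])
df3 = df2[in_df1]

# Mask built from df1's column: it carries df1's index, so it can filter df1.
in_df2 = df1["Contract Id"].isin(df2["Contract Id"])
df4 = df1[in_df2]

print(df3)
print(df4)
```

With this data, df3 keeps the df2 rows for CW 123 and CW 125, and df4 keeps the df1 rows named A and B.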

Your solution fails because the mask is built from one DataFrame and then applied to a different one:

#test df1['Contract Id']
is_expired = df1['Contract Id'].isin(df2['Contract Id'])
#filter df2
df3 = df2.loc[:,is_expired]
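
The failure can be reproduced on a small scale. The sketch below is an illustration under assumptions, not the asker's actual data: two tiny hand-made frames and a hypothetical helper, error_name, that only reports which exception type was raised. The mask comes from the shorter frame, so its index (0..1) does not cover the labels of the longer frame (0..3), and pandas cannot align the two. Note also that df2.loc[:, mask] puts the mask in the column position, so pandas tries to align it against df2's column names instead of its rows.

```python
import warnings

import pandas as pd

df1 = pd.DataFrame({"Contract Id": ["CW 123", "CW 125"]})
df2 = pd.DataFrame({"Contract Id": ["CW 123", "CW 124", "CW 125", "CW 1258"]})

# Mask built from df1: its index is 0..1, while df2's index is 0..3.
mask = df1["Contract Id"].isin(df2["Contract Id"])


def error_name(action):
    """Hypothetical helper: run action and return the raised exception's class name."""
    with warnings.catch_warnings():
        # pandas may first warn that the boolean key will be reindexed.
        warnings.simplefilter("ignore")
        try:
            action()
        except Exception as exc:
            return type(exc).__name__
    return None


# Row filter: the mask's labels do not cover df2's row labels.
row_error = error_name(lambda: df2[mask])

# Column filter: the mask's labels (0, 1) are compared with df2's column names.
col_error = error_name(lambda: df2.loc[:, mask])

print(row_error, col_error)
```

Both calls fail for the same reason: the mask's index belongs to df1, not to the object being indexed.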
