I am comparing a column of one DataFrame with a column of another DataFrame and getting an IndexingError
My code:
import pandas as pd

## Read both files and reset the index
df1 = pd.read_excel(path)   ## 776 line items
df2 = pd.read_excel(path1)  ## 10k+ line items
df1.reset_index(inplace=True)
df2.reset_index(inplace=True)
## Compare the two columns
is_expired = df1['Contract Id'].isin(df2['Contract Id'])  ## Series of 776 booleans, created successfully
df3 = df2.loc[:, is_expired]  ## Raises IndexingError
df3 = df2[is_expired]         ## Also tried this, same error
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
Sample DataFrames
Df1
Contract ID Name
CW 123 A
CW 125 B
C
D
Df2
Contract ID Name_1
CW 123 Other
CW 124 Columns
CW 125 Don't
CW 1258 Matter
There are many answers on SO, but they all say to use .loc and that the error then goes away. I still get the same error. Can anyone help?
Answer 0 (score: 1)
The boolean mask must come from the same DataFrame that you filter. So if you test df1['Contract Id'], filter df1:
#test df1['Contract Id']
is_expired = df1['Contract Id'].isin(df2['Contract Id'])
#filter df1
df3 = df1[is_expired]
Or, if you test df2['Contract Id'], filter df2:
#test df2['Contract Id']
is_expired = df2['Contract Id'].isin(df1['Contract Id'])
#filter df2
df3 = df2[is_expired]
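As a minimal sketch of both variants, the sample data from the question can be rebuilt by hand. The frames below are an assumption: the missing contract IDs for rows C and D are represented as None, and the column is spelled 'Contract Id' as in the question's code.

```python
import pandas as pd

# Hand-built copies of the sample frames (missing IDs assumed to be None).
df1 = pd.DataFrame({
    "Contract Id": ["CW 123", "CW 125", None, None],
    "Name": ["A", "B", "C", "D"],
})
df2 = pd.DataFrame({
    "Contract Id": ["CW 123", "CW 124", "CW 125", "CW 1258"],
    "Name_1": ["Other", "Columns", "Don't", "Matter"],
})

# Mask built from df1, applied to df1: rows of df1 whose ID also occurs in df2.
mask_df1 = df1["Contract Id"].isin(df2["Contract Id"])
rows_df1 = df1[mask_df1]

# Mask built from df2, applied to df2: rows of df2 whose ID also occurs in df1.
mask_df2 = df2["Contract Id"].isin(df1["Contract Id"])
rows_df2 = df2[mask_df2]

print(rows_df1)
print(rows_df2)
```

In both cases the mask carries the index of the frame it is applied to, so pandas has nothing to realign.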
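Your attempt fails because the mask and the filtered DataFrame do not match: the mask is built from df1 (776 rows) but applied to df2 (10k+ rows), and .loc[:, is_expired] additionally applies it to the columns of df2 instead of the rows: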
#test df1['Contract Id']
is_expired = df1['Contract Id'].isin(df2['Contract Id'])
#filter df2
df3 = df2.loc[:,is_expired]
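The failure can be reproduced on a small scale. The sketch below uses hypothetical frames of different lengths (3 and 5 rows, standing in for 776 and 10k+), a hypothetical helper named raises_indexing_error, and assumes pandas 1.5 or later, where IndexingError is importable from pandas.errors.

```python
import warnings

import pandas as pd
from pandas.errors import IndexingError  # assumes pandas >= 1.5

# Hypothetical frames of different lengths, mirroring 776 vs 10k+ rows.
df1 = pd.DataFrame({"Contract Id": ["CW 123", "CW 125", "CW 999"]})
df2 = pd.DataFrame({
    "Contract Id": ["CW 123", "CW 124", "CW 125", "CW 1258", "CW 130"],
    "Name_1": ["Other", "Columns", "Don't", "Matter", "Here"],
})

# Mask built from df1, so its index is 0..2 (df1's row labels).
mask = df1["Contract Id"].isin(df2["Contract Id"])


def raises_indexing_error(select):
    """Run a selection and report whether it raised IndexingError."""
    try:
        with warnings.catch_warnings():
            # df2[mask] may first warn that the key will be reindexed.
            warnings.simplefilter("ignore")
            select()
    except IndexingError:
        return True
    return False


# Mask applied to df2's columns: labels 0..2 do not match the column names.
column_attempt_fails = raises_indexing_error(lambda: df2.loc[:, mask])
# Mask applied to df2's rows: rows 3 and 4 have no entry in the mask.
row_attempt_fails = raises_indexing_error(lambda: df2[mask])

# Working version: build the mask from the frame that is being filtered.
mask_df2 = df2["Contract Id"].isin(df1["Contract Id"])
df3 = df2[mask_df2]
print(column_attempt_fails, row_attempt_fails)
print(df3)
```

Note that if the two frames happened to share the same index, df2[mask] would not raise at all; it would silently select rows by df1's matches, which is another reason to build the mask from the frame being filtered.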