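I can find the differences between two DataFrames by comparing them and concatenating the results into a new DataFrame, but a problem arises when a value is missing in one of the DataFrames: ValueError: Can only compare identically-labeled Series objects
I think the problem is with the header index. It would be great if you could help me.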
df1 has one missing value in column 1980 (the Bermuda row).
df1
Country    1980     1981     1982     1983     1984
Bermuda             0.00687  0.00727  0.00971  0.00752
Canada     9.6947   9.58952  9.20637  9.18989  9.78546
Greenland  7        0.00746  0.00722  0.00505  0.00799
Mexico     3.72819  4.11969  4.33477  4.06414  4.18464
df2
Country    1980     1981     1982     1983     1984
Bermuda    0.77777  0.00687  0.00727  0.00971  0.00752
Canada     9.6947   9.58952  9.20637  9.18989  9.78546
Greenland  0.00791  0.00746  0.00722  0.00505  0.00799
Mexico     3.72819  4.11969  4.33477  4.06414  4.18464
import pandas as pd

def process_df(df):
    # Reshape to a Series indexed by (Country, Column)
    res = df.set_index('Country').stack()
    res.index.rename('Column', level=1, inplace=True)
    return res

df1 = process_df(df1)
df2 = process_df(df2)
# Keep cells that differ, ignoring cells that are missing in both frames
mask = (df1 != df2) & ~(df1.isnull() & df2.isnull())
df3 = pd.concat([df1[mask], df2[mask]], axis=1).rename({0:'From', 1:'To'}, axis=1)
print(df3)
I would like missing values to be shown as blanks, as in the following example:
From To
Country Column
Bermuda 1980 0.77777
Greenland 1980 0.00791 7
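Please keep in mind that the code works fine when there are no missing values, but I would like it to handle missing values as well. Thanks.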
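For reference, here is a minimal sketch of one possible way around the error. It assumes the cause is that `stack()` drops the missing cell (its default in many pandas versions), so the two stacked Series end up with different labels and `!=` refuses to compare them. The sketch compares the wide frames first and only then builds the long result. The helper name `diff_frames` and the string year labels are my own assumptions for illustration, not part of the code above.

```python
import numpy as np
import pandas as pd

# Sample data from the question; np.nan marks the missing Bermuda/1980 cell.
years = ["1980", "1981", "1982", "1983", "1984"]
countries = ["Bermuda", "Canada", "Greenland", "Mexico"]
rows1 = [
    [np.nan, 0.00687, 0.00727, 0.00971, 0.00752],
    [9.6947, 9.58952, 9.20637, 9.18989, 9.78546],
    [7, 0.00746, 0.00722, 0.00505, 0.00799],
    [3.72819, 4.11969, 4.33477, 4.06414, 4.18464],
]
rows2 = [
    [0.77777, 0.00687, 0.00727, 0.00971, 0.00752],
    [9.6947, 9.58952, 9.20637, 9.18989, 9.78546],
    [0.00791, 0.00746, 0.00722, 0.00505, 0.00799],
    [3.72819, 4.11969, 4.33477, 4.06414, 4.18464],
]
df1 = pd.DataFrame(rows1, columns=years)
df1.insert(0, "Country", countries)
df2 = pd.DataFrame(rows2, columns=years)
df2.insert(0, "Country", countries)


def diff_frames(left, right, key="Country"):
    """Hypothetical helper: list cells that differ between two wide frames."""
    # Give both frames the same row and column labels before comparing.
    a, b = left.set_index(key).align(right.set_index(key), join="outer")
    # A cell counts as changed if the values differ and are not both missing.
    changed = (a.ne(b) & ~(a.isna() & b.isna())).to_numpy()
    r, c = np.nonzero(changed)
    index = pd.MultiIndex.from_arrays(
        [a.index[r], a.columns[c]], names=[key, "Column"]
    )
    # Pick the changed cells by position, so missing values are kept as NaN.
    return pd.DataFrame(
        {"From": a.to_numpy()[r, c], "To": b.to_numpy()[r, c]}, index=index
    )


df3 = diff_frames(df1, df2)
# na_rep="" prints missing values as blanks without altering the data.
print(df3.to_string(na_rep=""))
```

Here "From" is taken from df1 and "To" from df2, matching the `concat` order in the code above. Swap the arguments if the opposite direction is wanted. Using `na_rep` only affects how the result is printed, so the columns stay numeric.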