Let's say I update my dataframe with another dataframe (df2):
import pandas as pd
import numpy as np
df = pd.DataFrame({'axis1': ['Unix','Window','Apple','Linux'],
                   'A': [1,np.nan,1,1],
                   'B': [1,np.nan,np.nan,1],
                   'C': [np.nan,1,np.nan,1],
                   'D': [1,np.nan,1,np.nan],
                   }).set_index(['axis1'])
print(df)
df2 = pd.DataFrame({'axis1': ['Unix','Window','Apple','Linux','A'],
                    'A': [1,1,np.nan,np.nan,np.nan],
                    'E': [1,np.nan,1,1,1],
                    }).set_index(['axis1'])
df = df.reindex(columns=df2.columns.union(df.columns),
                index=df2.index.union(df.index))
df.update(df2)
print(df)
Is there a command to get the number of cells that were updated (changed from NaN to 1)? I want to use it to track changes to my dataframe.
Answer:
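There's no built-in method in pandas that I can think of. You have to save a copy of the original df before the update and then compare the two. The trick is making sure that cells which are NaN in both frames are treated as equal: NaN != NaN evaluates to True, so a plain comparison counts those cells as non-zero (i.e. changed). Here df3 is a copy of df taken before calling update: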
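Put together as one script, the approach might look roughly like this. It is a sketch assuming a reasonably recent pandas, and it uses `DataFrame.isna()` in place of `np.isnan` for the "both NaN" test; the names `changed` and `n_changed` are just for illustration:

```python
import numpy as np
import pandas as pd

# Rebuild the frames from the question
df = pd.DataFrame({'axis1': ['Unix', 'Window', 'Apple', 'Linux'],
                   'A': [1, np.nan, 1, 1],
                   'B': [1, np.nan, np.nan, 1],
                   'C': [np.nan, 1, np.nan, 1],
                   'D': [1, np.nan, 1, np.nan]}).set_index('axis1')
df2 = pd.DataFrame({'axis1': ['Unix', 'Window', 'Apple', 'Linux', 'A'],
                    'A': [1, 1, np.nan, np.nan, np.nan],
                    'E': [1, np.nan, 1, 1, 1]}).set_index('axis1')

# Align both axes first so df and its snapshot have identical labels
df = df.reindex(columns=df2.columns.union(df.columns),
                index=df2.index.union(df.index))

df3 = df.copy()   # snapshot taken before the update
df.update(df2)    # modifies df in place

# A cell changed if the values differ, unless both sides are NaN
changed = (df != df3) & ~(df.isna() & df3.isna())
n_changed = int(changed.to_numpy().sum())
print(n_changed)  # 5
```

The `~(both NaN)` term plays the same role as the `np.isnan(df) & np.isnan(df3)` term in the session below.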
In [104]:
df.update(df2)
df
Out[104]:
A B C D E
axis1
A NaN NaN NaN NaN 1
Apple 1 NaN NaN 1 1
Linux 1 1 1 NaN 1
Unix 1 1 NaN 1 1
Window 1 NaN 1 NaN NaN
[5 rows x 5 columns]
In [105]:
df3
Out[105]:
A B C D E
axis1
A NaN NaN NaN NaN NaN
Apple 1 NaN NaN 1 NaN
Linux 1 1 1 NaN NaN
Unix 1 1 NaN 1 NaN
Window NaN NaN 1 NaN NaN
[5 rows x 5 columns]
In [106]:
# compare, but notice that NaN != NaN returns True
df!=df3
Out[106]:
A B C D E
axis1
A True True True True True
Apple False True True False True
Linux False False False True True
Unix False False True False True
Window True True False True True
[5 rows x 5 columns]
In [107]:
# use numpy's count_nonzero for easy counting; note this gives the wrong result
np.count_nonzero(df!=df3)
Out[107]:
16
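Why 16 rather than 5: `!=` follows IEEE float semantics, so NaN is never equal to anything, not even another NaN. Every cell that is NaN in both `df` and `df3` (11 of them here, since 16 - 5 = 11) is therefore reported as changed. A tiny illustration with two made-up Series `a` and `b`:

```python
import numpy as np
import pandas as pd

# NaN never compares equal, not even to itself
a = pd.Series([np.nan, 1.0, np.nan])
b = pd.Series([np.nan, 1.0, 1.0])

print((a != b).tolist())   # [True, False, True]
print((a != b).sum())      # 2, although only the last cell really changed

# Cells that are NaN on both sides are the false positives
both_nan = a.isna() & b.isna()
print(((a != b) & ~both_nan).sum())  # 1
```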
In [132]:
# treat cells that are NaN in both frames as equal, then invert to get the changed cells
~((df == df3) | (np.isnan(df) & np.isnan(df3)))
Out[132]:
A B C D E
axis1
A False False False False True
Apple False False False False True
Linux False False False False True
Unix False False False False True
Window True False False False False
[5 rows x 5 columns]
In [133]:
np.count_nonzero(~((df == df3) | (np.isnan(df) & np.isnan(df3))))
Out[133]:
5
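Since the question specifically asks about cells going from NaN to 1, a narrower mask may fit better: `before.isna() & after.notna()` only counts cells that were missing and got filled in, whereas the mask above also counts any existing value that `update` overwrote (by default it overwrites with every non-NA value from `other`). Here is a sketch wrapping that in a hypothetical helper `update_and_count`, which also returns a per-column breakdown; the toy frames are made up for the example:

```python
import numpy as np
import pandas as pd

def update_and_count(df, other):
    """Hypothetical helper: update df in place from other and return
    (total, per_column) counts of cells that went from NaN to non-NaN."""
    before = df.isna()           # remember which cells were missing
    df.update(other)             # NaNs in `other` never overwrite df
    filled = before & df.notna()
    return int(filled.to_numpy().sum()), filled.sum()

df = pd.DataFrame({'A': [1, np.nan], 'E': [np.nan, np.nan]}, index=['x', 'y'])
other = pd.DataFrame({'A': [1, 1], 'E': [1, np.nan]}, index=['x', 'y'])

total, per_column = update_and_count(df, other)
print(total)                # 2
print(per_column.tolist())  # [1, 1]  (one filled cell in A, one in E)
```

Only a boolean mask is kept as the snapshot here, which is cheaper than a full `df.copy()` when all you need is the NaN-to-value count.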