Python pandas: update a DataFrame and count the number of updated cells

Date: 2014-04-16 07:30:12

Tags: python pandas dataframe

Let's say I update my DataFrame with another DataFrame (df2):

import pandas as pd
import numpy as np

df=pd.DataFrame({'axis1': ['Unix','Window','Apple','Linux'],
                 'A': [1,np.nan,1,1],
                 'B': [1,np.nan,np.nan,1],
                 'C': [np.nan,1,np.nan,1],
                 'D': [1,np.nan,1,np.nan],
                 }).set_index(['axis1'])

print (df)

df2=pd.DataFrame({'axis1': ['Unix','Window','Apple','Linux','A'],
                 'A': [1,1,np.nan,np.nan,np.nan],
                 'E': [1,np.nan,1,1,1],
                 }).set_index(['axis1'])

df = df.reindex(columns=df2.columns.union(df.columns),
                index=df2.index.union(df.index))

df.update(df2)

print (df)

Is there a command to get the number of cells that were updated (changed from NaN to 1)? I want to use it to track changes to my DataFrame.

1 answer:

Answer 0 (score: 0)

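There is no built-in method in pandas that I can think of. You have to save a copy of the original df before the update and then compare the two. The trick is to make sure that cells holding NaN in both frames are treated as unchanged, because NaN != NaN evaluates to True. Here df3 is a copy of df taken before the call to update: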

In [104]:

df.update(df2)
df
Out[104]:
         A   B   C   D   E
axis1                     
A      NaN NaN NaN NaN   1
Apple    1 NaN NaN   1   1
Linux    1   1   1 NaN   1
Unix     1   1 NaN   1   1
Window   1 NaN   1 NaN NaN

[5 rows x 5 columns]
In [105]:

df3
Out[105]:
         A   B   C   D   E
axis1                     
A      NaN NaN NaN NaN NaN
Apple    1 NaN NaN   1 NaN
Linux    1   1   1 NaN NaN
Unix     1   1 NaN   1 NaN
Window NaN NaN   1 NaN NaN

[5 rows x 5 columns]
In [106]:

# compare, but notice that comparing NaN with NaN also returns True
df!=df3
Out[106]:
            A      B      C      D     E
axis1                                   
A        True   True   True   True  True
Apple   False   True   True  False  True
Linux   False  False  False   True  True
Unix    False  False   True  False  True
Window   True   True  False   True  True

[5 rows x 5 columns]

In [107]:
# use numpy count_nonzero for easy counting; note this gives the wrong result because the NaN-vs-NaN cells are counted too
np.count_nonzero(df!=df3)
Out[107]:
16

In [132]:

~((df == df3) | (np.isnan(df) & np.isnan(df3)))
Out[132]:
            A      B      C      D      E
axis1                                    
A       False  False  False  False   True
Apple   False  False  False  False   True
Linux   False  False  False  False   True
Unix    False  False  False  False   True
Window   True  False  False  False  False

[5 rows x 5 columns]
In [133]:

np.count_nonzero(~((df == df3) | (np.isnan(df) & np.isnan(df3))))
Out[133]:
5
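
The steps above could be wrapped in a small helper. This is only a minimal sketch, not a pandas API: the name count_updated_cells is hypothetical, and it assumes the two frames share the same index and columns (which holds when the "before" frame is a copy taken right before update). It uses DataFrame.isna instead of np.isnan, which should behave the same on the float data in the question.

```python
import numpy as np
import pandas as pd


def count_updated_cells(before, after):
    """Count cells that differ between two identically labelled frames.

    Hypothetical helper: cells that are NaN in both frames count as
    unchanged, since NaN != NaN would otherwise flag them as different.
    """
    both_nan = before.isna() & after.isna()
    changed = (before != after) & ~both_nan
    return int(changed.to_numpy().sum())


# Same data as in the question.
df = pd.DataFrame({'axis1': ['Unix', 'Window', 'Apple', 'Linux'],
                   'A': [1, np.nan, 1, 1],
                   'B': [1, np.nan, np.nan, 1],
                   'C': [np.nan, 1, np.nan, 1],
                   'D': [1, np.nan, 1, np.nan],
                   }).set_index(['axis1'])

df2 = pd.DataFrame({'axis1': ['Unix', 'Window', 'Apple', 'Linux', 'A'],
                    'A': [1, 1, np.nan, np.nan, np.nan],
                    'E': [1, np.nan, 1, 1, 1],
                    }).set_index(['axis1'])

# Align df to the union of labels, as in the question.
df = df.reindex(columns=df2.columns.union(df.columns),
                index=df2.index.union(df.index))

df3 = df.copy()   # snapshot taken before the update
df.update(df2)    # modifies df in place

n_updated = count_updated_cells(df3, df)
print(n_updated)
```

If only NaN-to-value transitions should be counted, rather than any change, the mask before.isna() & after.notna() could be summed instead.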