I want to find the difference between two int columns in a pandas DataFrame. I am using Python 2.7. The columns are as follows -
>>> df
INVOICED_QUANTITY QUANTITY_SHIPPED
0 15 NaN
1 20 NaN
2 7 NaN
3 7 NaN
4 7 NaN
Now I want to subtract QUANTITY_SHIPPED from INVOICED_QUANTITY. I do the following -
>>> df['Diff'] = df['INVOICED_QUANTITY'] - df['QUANTITY_SHIPPED']
>>> df
INVOICED_QUANTITY QUANTITY_SHIPPED Diff
0 15 NaN NaN
1 20 NaN NaN
2 7 NaN NaN
3 7 NaN NaN
4 7 NaN NaN
How do I handle the NaN values? I expect the following result, because I want NaN to be treated as 0 (zero) -
>>> df
INVOICED_QUANTITY QUANTITY_SHIPPED Diff
0 15 NaN 15
1 20 NaN 20
2 7 NaN 7
3 7 NaN 7
4 7 NaN 7
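I don't want to do df.fillna(0). As a workaround, I tried something like the following, and it works, but it gives the sum rather than the difference -
>>> df['Sum'] = df[['INVOICED_QUANTITY', 'QUANTITY_SHIPPED']].sum(axis=1)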
>>> df
INVOICED_QUANTITY QUANTITY_SHIPPED Diff Sum
0 15 NaN NaN 15
1 20 NaN NaN 20
2 7 NaN NaN 7
3 7 NaN NaN 7
4 7 NaN NaN 7
Answer 0: (score: 5)
You can use the sub method to perform the subtraction - this method lets NaN values be treated as a specified value:
df['Diff'] = df['INVOICED_QUANTITY'].sub(df['QUANTITY_SHIPPED'], fill_value=0)
which produces:
INVOICED_QUANTITY QUANTITY_SHIPPED Diff
0 15 NaN 15
1 20 NaN 20
2 7 NaN 7
3 7 NaN 7
4 7 NaN 7
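For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the DataFrame from the question by hand (the construction is my assumption, since the question only shows the printed frame) and also shows one caveat of fill_value: when both operands are NaN at the same position, the result stays NaN. The names a, b and both_missing are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (assumed construction).
df = pd.DataFrame({
    "INVOICED_QUANTITY": [15, 20, 7, 7, 7],
    "QUANTITY_SHIPPED": [np.nan] * 5,
})

# Missing values on either side are treated as 0 during the subtraction only;
# the QUANTITY_SHIPPED column itself is left untouched.
df["Diff"] = df["INVOICED_QUANTITY"].sub(df["QUANTITY_SHIPPED"], fill_value=0)
print(df)

# Caveat: fill_value is not applied when BOTH sides are missing.
a = pd.Series([1.0, np.nan])
b = pd.Series([np.nan, np.nan])
both_missing = a.sub(b, fill_value=0)
print(both_missing)  # second element remains NaN
```

Note that Diff comes back as a float column, because QUANTITY_SHIPPED holds NaN and is therefore float to begin with.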
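Another neat way to do this is what @JianxunLi suggests: fill the missing values in the column (which creates a copy of the column) and subtract as normal.
The two approaches are almost the same, although sub is more efficient because it does not need to produce a copy of the column in advance; it simply fills the missing values "on the fly":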
In [46]: %timeit df['INVOICED_QUANTITY'] - df['QUANTITY_SHIPPED'].fillna(0)
10000 loops, best of 3: 144 µs per loop
In [47]: %timeit df['INVOICED_QUANTITY'].sub(df['QUANTITY_SHIPPED'], fill_value=0)
10000 loops, best of 3: 81.7 µs per loop
Answer 1: (score: 2)
I think simply filling the NaN values with 0 can help you.
df['Diff'] = df['INVOICED_QUANTITY'] - df['QUANTITY_SHIPPED'].fillna(0)
Out[153]:
INVOICED_QUANTITY QUANTITY_SHIPPED Diff
0 15 NaN 15
1 20 NaN 20
2 7 NaN 7
3 7 NaN 7
4 7 NaN 7
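Since the question says df.fillna(0) is not wanted, it may help to see that calling fillna on the single column returns a new Series and leaves the DataFrame as it was. A small sketch, assuming a hand-built frame where the shipped values 5 and 2 are made up for illustration, and shipped_filled is just a name I picked:

```python
import numpy as np
import pandas as pd

# Hypothetical data: same invoiced quantities, with two shipped values invented
# so that the subtraction is visible.
df = pd.DataFrame({
    "INVOICED_QUANTITY": [15, 20, 7, 7, 7],
    "QUANTITY_SHIPPED": [np.nan, 5, np.nan, 2, np.nan],
})

# fillna on the column returns a filled copy; df["QUANTITY_SHIPPED"] keeps its NaN.
shipped_filled = df["QUANTITY_SHIPPED"].fillna(0)
df["Diff"] = df["INVOICED_QUANTITY"] - shipped_filled
print(df)
```

The original QUANTITY_SHIPPED column still shows NaN afterwards, so only Diff is affected.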