How can I make 0/0 evaluate to 0 when dividing in pandas?

Time: 2016-12-29 09:14:17

Tags: python pandas numpy

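I want to divide one DataFrame by another in pandas, ultimately to express a percentage change. Both DataFrames contain NaN and 0 values. When I divide one by the other, positions where both values are zero yield NaN. I understand why 0/0 is set to np.nan, but from a percentage-change point of view I need 0/0 to be 0.

What is the most concise way to achieve this?

To reproduce the problem: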

import pandas as pd
import numpy as np

data_with_zeros = pd.DataFrame({'a': [2, np.nan, 0, 3], 'b': [np.nan, 2, 0, 6]})

data_with_zeros['a'].div(data_with_zeros['b'], fill_value=0)

Result:

0         inf
1         0.0
2         NaN
3         0.5
dtype: float64

4 Answers:

Answer 0 (score: 2)

Here's an approach using the where method -

mask = (data_with_zeros[['a','b']].values == [0,0]).all(1)
data_with_zeros['a'].div(data_with_zeros['b'], fill_value=0).where(~mask,0)

Alternatively, the mask can be created in a more intuitive way, like so -

mask = (data_with_zeros.a == 0) & (data_with_zeros.b == 0)

Sample run -

Case #1:

In [66]: data_with_zeros
Out[66]: 
     a    b
0  2.0  NaN
1  NaN  2.0
2  0.0  0.0
3  3.0  6.0

In [67]: mask = (data_with_zeros.a == 0) & (data_with_zeros.b == 0)

In [68]: data_with_zeros['a'].div(data_with_zeros['b'], fill_value=0).where(~mask,0)
Out[68]: 
0         inf
1    0.000000
2    0.000000
3    0.500000
dtype: float64

Case #2:

In [70]: data_with_zeros
Out[70]: 
     a    b
0  2.0  0.0
1  NaN  2.0
2  0.0  0.0
3  3.0  6.0

In [71]: mask = (data_with_zeros.a == 0) & (data_with_zeros.b == 0)

In [72]: data_with_zeros['a'].div(data_with_zeros['b'], fill_value=0).where(~mask,0)
Out[72]: 
0         inf
1    0.000000
2    0.000000
3    0.500000
dtype: float64
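
The masking step above can be wrapped in a small helper. This is only a sketch: the name div_zero_as_zero is hypothetical, and it uses Series.mask, which replaces values where the condition is True and so mirrors .where(~mask, 0).

```python
import numpy as np
import pandas as pd


def div_zero_as_zero(num, den, fill_value=None):
    """Divide two Series, returning 0 wherever both operands are exactly 0."""
    # NaN compares unequal to 0, so missing values never enter the mask.
    both_zero = num.eq(0) & den.eq(0)
    # mask(cond, 0) overwrites the positions where cond is True.
    return num.div(den, fill_value=fill_value).mask(both_zero, 0)


data = pd.DataFrame({'a': [2, np.nan, 0, 3], 'b': [np.nan, 2, 0, 6]})
result = div_zero_as_zero(data['a'], data['b'], fill_value=0)
print(result)
```

Leaving fill_value at None keeps NaN wherever either input is missing, which may be preferable if missing data should not be read as zero.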

Answer 1 (score: 1)

Replace inf with 0 (as the output shows, this leaves the 0/0 row as NaN):

In [61]: data_with_zeros['a'].div(data_with_zeros['b'], fill_value=0).replace({np.inf: 0})
Out[61]:
0    0.0
1    0.0
2    NaN
3    0.5
dtype: float64

Answer 2 (score: 1)

In addition to the solution provided by Divakar, the following function applies the same approach to DataFrame/DataFrame division:

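def divide(a, other, fill_value=None):
    # a is one column of the numerator; divide it by the same-named column of other
    serie_name = a.name
    # True where both numerator and denominator are exactly 0
    mask = ((a == 0) & (other[serie_name] == 0))
    result_with_zeros = a.div(other[serie_name], fill_value=fill_value)
    # Overwrite the 0/0 positions (NaN after division) with 0
    result_filled = result_with_zeros.where(~mask, 0)

    return result_filled

# apply calls divide once per column; here the frame is divided by itself
data_with_zeros.apply(divide, args=(data_with_zeros,))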

Result (consistent with the Case #2 frame above, where b[0] is 0.0 rather than NaN):

     a    b
0  1.0  0.0
1  NaN  1.0
2  0.0  0.0
3  1.0  1.0
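
The same idea can also be applied to whole frames without apply. The sketch below is one possible variant, not the answerer's code: divide_frames and the sample frames old and new are made up for illustration, and the align call is only there so the mask and the quotient share labels.

```python
import numpy as np
import pandas as pd


def divide_frames(num, den, fill_value=None):
    """Element-wise num / den for two DataFrames, with 0/0 mapped to 0."""
    # Align rows and columns first so mask and quotient line up.
    num, den = num.align(den, join='outer')
    both_zero = num.eq(0) & den.eq(0)
    return num.div(den, fill_value=fill_value).mask(both_zero, 0)


# Hypothetical sample data covering 0/0, x/0, NaN and ordinary values.
old = pd.DataFrame({'a': [2.0, np.nan, 0.0, 3.0], 'b': [0.0, 2.0, 0.0, 6.0]})
new = pd.DataFrame({'a': [4.0, 1.0, 0.0, 3.0], 'b': [0.0, np.nan, 5.0, 3.0]})
ratio = divide_frames(new, old)
print(ratio)
```

Only positions where both frames hold an exact 0 are overwritten; a nonzero value divided by 0 still comes out as inf, and NaN inputs stay NaN unless fill_value is given.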

Answer 3 (score: 0)

If missing values can be treated as 0, you can do this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [2, np.nan, 0, 3], 'b': [np.nan, 2, 0, 6]})

def percent_change(s1, s2):
    # Treat missing values as 0
    s3 = s2.fillna(0) / s1.fillna(0) - 1
    # If both values are 0, the percentage change will be 0.
    mask = (s1.fillna(0) == 0) & (s2.fillna(0) == 0)
    s3[mask] = 0
    return s3


df['c'] = percent_change(df['a'], df['b'])
print(df)

Output:

     a    b         c
0  2.0  NaN -1.000000
1  NaN  2.0       inf
2  0.0  0.0  0.000000
3  3.0  6.0  1.000000
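
If missing values should stay missing instead of being read as 0, a NumPy-only variant might look like the following. It is a sketch under that assumption; percent_change_np is a hypothetical name, and np.errstate is used only to silence the divide-by-zero and invalid-value warnings that plain NumPy arrays would otherwise emit.

```python
import numpy as np


def percent_change_np(old, new):
    """Return new / old - 1, with 0 where both values are 0 and NaN kept as NaN."""
    old = np.asarray(old, dtype=float)
    new = np.asarray(new, dtype=float)
    # Suppress the warnings raised by x/0 and 0/0 on NumPy arrays.
    with np.errstate(divide='ignore', invalid='ignore'):
        change = new / old - 1
    both_zero = (old == 0) & (new == 0)
    # Only the 0 -> 0 case is overwritten; NaN inputs propagate unchanged.
    return np.where(both_zero, 0.0, change)


out = percent_change_np([2, np.nan, 0, 3], [np.nan, 2, 0, 6])
print(out)
```

Compared with the fillna(0) version, rows with a missing value on either side come out as NaN here instead of -1 or inf.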