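I want to divide one DataFrame by another in Pandas, with the result representing a percentage change. Both DataFrames contain NaN and 0 values. When I divide one DataFrame by the other, every position where both values are zero comes out as NaN. I understand why 0/0 is set to np.nan, but for a percentage change I need 0/0 to be 0.
What is the cleanest way to achieve this?
To reproduce the problem: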
import pandas as pd
import numpy as np
data_with_zeros = pd.DataFrame({'a': [2, np.nan, 0, 3], 'b': [np.nan, 2, 0, 6]})
data_with_zeros['a'].div(data_with_zeros['b'], fill_value=0)
Result:
0 inf
1 0.0
2 NaN
3 0.5
dtype: float64
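The values above come from two separate rules, and a small sketch makes each one visible. Plain float division in pandas follows IEEE 754 (a nonzero value divided by 0 gives ±inf, 0/0 gives NaN), while `fill_value` in `Series.div` only fills a NaN when the other operand is present; if both sides are NaN the result stays NaN. That is why row 0 (2 / filled 0) is inf, row 1 (filled 0 / 2) is 0.0, and row 2 (0/0) is NaN.

```python
import numpy as np
import pandas as pd

# Element-wise float division follows IEEE 754 rules; pandas does not raise:
# positive / 0 -> inf, 0 / 0 -> NaN, negative / 0 -> -inf.
num = pd.Series([2.0, 0.0, -2.0])
den = pd.Series([0.0, 0.0, 0.0])
plain = num / den
print(plain.tolist())  # [inf, nan, -inf]

# fill_value replaces a NaN only when the other operand is present;
# if both sides are NaN, the result stays NaN.
a = pd.Series([2.0, np.nan, np.nan])
b = pd.Series([np.nan, 2.0, np.nan])
filled = a.div(b, fill_value=0)
print(filled.tolist())  # [inf, 0.0, nan]
```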
Answer 0 (score: 2)
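# Boolean array: True in rows where both a and b are 0
mask = (data_with_zeros[['a','b']].values == [0,0]).all(1)
# Keep the quotient where mask is False, use 0 where it is True
data_with_zeros['a'].div(data_with_zeros['b'], fill_value=0).where(~mask,0)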
Alternatively, mask can be created in a more intuitive way, like this:
mask = (data_with_zeros.a == 0) & (data_with_zeros.b == 0)
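The same idea can be wrapped in a small reusable function. This is a sketch, not part of the answer: `div_zero_by_zero_as_zero` is a hypothetical name, and it only combines `Series.div` with `Series.where` exactly as shown above.

```python
import numpy as np
import pandas as pd


def div_zero_by_zero_as_zero(num: pd.Series, den: pd.Series, fill_value=None) -> pd.Series:
    """Divide num by den element-wise, but report 0/0 as 0 instead of NaN.

    Hypothetical helper; it wraps Series.div and Series.where.
    """
    result = num.div(den, fill_value=fill_value)
    # True only where both raw operands are exactly zero (NaN == 0 is False).
    both_zero = (num == 0) & (den == 0)
    # Keep the quotient where the mask is False, substitute 0 where it is True.
    return result.where(~both_zero, 0)


df = pd.DataFrame({'a': [2, np.nan, 0, 3], 'b': [np.nan, 2, 0, 6]})
out = div_zero_by_zero_as_zero(df['a'], df['b'], fill_value=0)
print(out.tolist())  # [inf, 0.0, 0.0, 0.5]
```

Because the mask is built from the raw values, a NaN paired with a 0 is not treated as 0/0 even when `fill_value=0`; that pair still yields NaN.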
Sample runs:
Case #1:
In [66]: data_with_zeros
Out[66]:
a b
0 2.0 NaN
1 NaN 2.0
2 0.0 0.0
3 3.0 6.0
In [67]: mask = (data_with_zeros.a == 0) & (data_with_zeros.b == 0)
In [68]: data_with_zeros['a'].div(data_with_zeros['b'], fill_value=0).where(~mask,0)
Out[68]:
0 inf
1 0.000000
2 0.000000
3 0.500000
dtype: float64
Case #2:
In [70]: data_with_zeros
Out[70]:
a b
0 2.0 0.0
1 NaN 2.0
2 0.0 0.0
3 3.0 6.0
In [71]: mask = (data_with_zeros.a == 0) & (data_with_zeros.b == 0)
In [72]: data_with_zeros['a'].div(data_with_zeros['b'], fill_value=0).where(~mask,0)
Out[72]:
0 inf
1 0.000000
2 0.000000
3 0.500000
dtype: float64
Answer 1 (score: 1)
Replace the inf values:
In [61]: data_with_zeros['a'].div(data_with_zeros['b'], fill_value=0).replace({np.inf: 0})
Out[61]:
0 0.0
1 0.0
2 NaN
3 0.5
dtype: float64
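Replacing inf on its own leaves the 0/0 row as NaN, as row 2 of the output above shows, and a negative value divided by 0 would produce -inf, which `{np.inf: 0}` does not cover. A sketch that handles both infinities and also masks 0/0, assuming that setting x/0 to 0 is acceptable for your percentage calculation:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [2, np.nan, 0, 3], 'b': [np.nan, 2, 0, 6]})
ratio = df['a'].div(df['b'], fill_value=0)
# Replace both +inf and -inf (nonzero / 0) with 0.
ratio = ratio.replace([np.inf, -np.inf], 0)
# 0/0 is still NaN at this point; Series.mask sets it to 0 where both are 0.
ratio = ratio.mask((df['a'] == 0) & (df['b'] == 0), 0)
print(ratio.tolist())  # [0.0, 0.0, 0.0, 0.5]
```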
Answer 2 (score: 1)
In addition to the solution provided by Divakar, the following function applies the same approach to DataFrame/DataFrame division:
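def divide(a, other, fill_value=None):
    # apply() passes each column as a Series; its name is the column label
    serie_name = a.name
    # True where both this column and the matching column of `other` are 0
    mask = ((a == 0) & (other[serie_name] == 0))
    result_with_zeros = a.div(other[serie_name], fill_value=fill_value)
    # Replace the 0/0 -> NaN entries with 0
    result_filled = result_with_zeros.where(~mask, 0)
    return result_filled
data_with_zeros.apply(divide, args=(data_with_zeros,))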
Result (this output matches the Case #2 data above, where b[0] is 0):
a b
0 1.0 0.0
1 NaN 1.0
2 0.0 0.0
3 1.0 1.0
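The column-by-column `apply` can also be written as one vectorized expression over whole DataFrames. This is a sketch assuming both frames share the same index and columns; `frame_div_zero_as_zero` is a hypothetical name, and it uses `DataFrame.div` plus `DataFrame.mask` (which replaces entries where the condition is True). Dividing the Case #2 data by itself reproduces the result above.

```python
import numpy as np
import pandas as pd


def frame_div_zero_as_zero(num: pd.DataFrame, den: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: element-wise num / den (assumes identical labels).
    result = num.div(den)
    # Boolean frame, True where both operands are exactly 0.
    both_zero = num.eq(0) & den.eq(0)
    # Turn the 0/0 -> NaN entries into 0; other NaNs are left alone.
    return result.mask(both_zero, 0)


df2 = pd.DataFrame({'a': [2.0, np.nan, 0.0, 3.0], 'b': [0.0, 2.0, 0.0, 6.0]})
out = frame_div_zero_as_zero(df2, df2)
print(out)
```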
Answer 3 (score: 0)
If missing values can be treated as 0, you can do this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [2, np.nan, 0, 3], 'b': [np.nan, 2, 0, 6]})
def percent_change(s1, s2):
    # Relative change from s1 to s2; treat missing values as 0
    s3 = s2.fillna(0) / s1.fillna(0) - 1
    # If both values are 0, the percentage change will be 0.
    mask = (s1.fillna(0) == 0) & (s2.fillna(0) == 0)
    s3[mask] = 0
    return s3
df['c'] = percent_change(df['a'], df['b'])
print(df)
Output:
a b c
0 2.0 NaN -1.000000
1 NaN 2.0 inf
2 0.0 0.0 0.000000
3 3.0 6.0 1.000000
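The same calculation extends to whole DataFrames and can report the change in percent rather than as a fraction. This is a sketch: `pct_change_frames` is a hypothetical name, it keeps this answer's assumption that missing values count as 0, and it assumes both frames share the same index and columns. The sample data mirrors columns a (old) and b (new) above, so the fractions -1, inf, 0 and 1 become -100, inf, 0 and 100.

```python
import numpy as np
import pandas as pd


def pct_change_frames(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    # Assumption (as in the answer above): missing values count as 0.
    old0 = old.fillna(0)
    new0 = new.fillna(0)
    # Change in percent units; x / 0 gives +/-inf and 0 / 0 gives NaN here.
    change = (new0 / old0 - 1) * 100
    # A value that stays at 0 has not changed: report 0 instead of NaN.
    return change.mask(old0.eq(0) & new0.eq(0), 0.0)


old = pd.DataFrame({'x': [2.0, np.nan, 0.0, 3.0]})
new = pd.DataFrame({'x': [np.nan, 2.0, 0.0, 6.0]})
changes = pct_change_frames(old, new)['x'].tolist()
print(changes)  # [-100.0, inf, 0.0, 100.0]
```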