How do you concisely add two Series together, adding only the positive values?

Time: 2018-09-21 10:42:32

Tags: python pandas addition

I have two Series:

energy_dict['QLD'] = 

Timestamp
2017-04-27 00:00:00    523.720765
2017-04-27 01:00:00    512.180608
2017-04-27 02:00:00    519.076642
2017-04-27 03:00:00    516.329201
2017-04-27 04:00:00    525.150158
   ...                 ...
Freq: H, Name: QLD Total Energy (MWh), Length: 8760, dtype: float64

Incoming_Flow = 

Timestamp
2017-04-27 00:00:00    -8.961111
2017-04-27 01:00:00     9.503472
2017-04-27 02:00:00   -10.776389
2017-04-27 03:00:00     1.451389
2017-04-27 04:00:00   -10.388195
        ...               ...

Freq: H, Name: METEREDMWFLOW N-Q-MNSP1, Length: 8760, dtype: float64

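I want to add them together, but only add a value from the second Series where it is greater than zero. What is the best way to do this?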

I know I could do something like this:

Incoming_Flow[Incoming_Flow < 0 ] = 0

but I would like to be able to do it all in one go, without modifying Incoming_Flow.

3 Answers:

Answer 0: (score: 3)

Use Series.add together with Series.mask:

s = energy_dict['QLD'].add(Incoming_Flow.mask(Incoming_Flow < 0, 0), fill_value=0)
print (s)
0    523.720765
1    521.684080
2    519.076642
3    517.780590
4    525.150158
dtype: float64

print (Incoming_Flow.mask(Incoming_Flow < 0, 0))
0    0.000000
1    9.503472
2    0.000000
3    1.451389
4    0.000000
Name: METEREDMWFLOW N-Q-MNSP1, dtype: float64

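Or filter the Series and use the parameter fill_value=0:

fill_value : None or float value, default None (NaN)

Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing, the result will be missing.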

s = energy_dict['QLD'].add(Incoming_Flow[Incoming_Flow > 0], fill_value=0)
print (s)
0    523.720765
1    521.684080
2    519.076642
3    517.780590
4    525.150158
dtype: float64

Details:

print (Incoming_Flow[Incoming_Flow > 0])
1    9.503472
3    1.451389
Name: METEREDMWFLOW N-Q-MNSP1, dtype: float64
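
To illustrate why fill_value=0 matters for the filtered version, here is a minimal sketch. The Series `a` and `b` are hypothetical toy data (not the data from the question); `b` stands in for a filtered Series that is missing some labels.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: `b` lacks the labels 0 and 2, as a filtered Series would.
a = pd.Series([10.0, 20.0, 30.0], index=[0, 1, 2])
b = pd.Series([5.0], index=[1])

# Without fill_value, labels missing from `b` become NaN after alignment.
plain = a.add(b)

# With fill_value=0, the missing labels are treated as 0 before adding.
filled = a.add(b, fill_value=0)

print(plain)   # NaN at labels 0 and 2, 25.0 at label 1
print(filled)  # 10.0, 25.0, 30.0
```

So the filtered rows are not dropped from the result; they simply contribute nothing to the sum.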

EDIT:

If performance is important, use numpy.where:

# general case: wrap the array in a Series so the index is kept for alignment
s = pd.Series(np.where(Incoming_Flow < 0, 0, Incoming_Flow), index=Incoming_Flow.index)
# if the DatetimeIndex values are the same in both Series, a plain array is enough
s = np.where(Incoming_Flow < 0, 0, Incoming_Flow)
energy_dict['QLD'].add(s, fill_value=0)
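
A small sketch of the point about the index, using the first five values from the question. The names `qld`, `flow`, `clipped` and `as_series` are made up for illustration. numpy.where returns a plain ndarray, so the timestamps are lost unless the result is wrapped in a Series again.

```python
import numpy as np
import pandas as pd

# First five rows from the question, rebuilt by hand for illustration.
idx = pd.to_datetime([
    "2017-04-27 00:00", "2017-04-27 01:00", "2017-04-27 02:00",
    "2017-04-27 03:00", "2017-04-27 04:00",
])
qld = pd.Series([523.720765, 512.180608, 519.076642, 516.329201, 525.150158], index=idx)
flow = pd.Series([-8.961111, 9.503472, -10.776389, 1.451389, -10.388195], index=idx)

# np.where gives back a bare ndarray: the values are right, the index is gone.
clipped = np.where(flow < 0, 0, flow)
print(type(clipped))

# Re-attaching the index makes the addition align on timestamps again.
as_series = pd.Series(clipped, index=flow.index)
total = qld.add(as_series, fill_value=0)
print(total)
```

If the two Series are known to share exactly the same index in the same order, passing the bare array is fine, since the values are then added by position.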

Answer 1: (score: 1)

You can also use Series.add with Series.where:

s = energy_dict['QLD'].add(Incoming_Flow.where(Incoming_Flow.gt(0), 0))

If performance is important, this is also about 18% faster than the mask solution in the timings below:

[Proof]

s1 = pd.Series(np.arange(50000))
s2 = pd.Series(np.random.randint(-4, 10,50000))

%timeit s1.add(s2.mask(s2 < 0, 0), fill_value=0)
1.17 ms ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit s1.add(s2[s2 > 0], fill_value=0)
4.68 ms ± 289 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit s1.add(s2.where(s2.gt(0), 0))
988 µs ± 50.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
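
The %timeit lines above only work in IPython/Jupyter. As a rough sketch, the same comparison can be run as a plain script with the standard timeit module. The fixed seed and the `candidates` dict are my own additions, and the numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed: an assumption, for repeatable data only
s1 = pd.Series(np.arange(50000))
s2 = pd.Series(rng.integers(-4, 10, 50000))

# The three pandas variants from the answers, wrapped so timeit can call them.
candidates = {
    "mask": lambda: s1.add(s2.mask(s2 < 0, 0), fill_value=0),
    "filter": lambda: s1.add(s2[s2 > 0], fill_value=0),
    "where": lambda: s1.add(s2.where(s2.gt(0), 0)),
}

timings = {}
for name, func in candidates.items():
    # Best of 5 repeats of 100 calls each, converted to microseconds per call.
    best = min(timeit.repeat(func, number=100, repeat=5))
    timings[name] = best / 100 * 1e6
    print(f"{name:>6}: {timings[name]:.0f} µs per loop")
```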

Answer 2: (score: 0)

Using numpy add and where is faster:

import numpy as np
import pandas as pd

qld = [523.720765, 512.180608, 519.076642, 516.329201, 525.150158]
flow = [ -8.961111,   9.503472, -10.776389,   1.451389, -10.388195]

df1 = pd.DataFrame(qld, columns=['QLD'])
df2 = pd.DataFrame(flow, columns=['Incoming_Flow'])

s = np.add(df1['QLD'], np.where(df2['Incoming_Flow'] > 0, df2['Incoming_Flow'], 0))

print(s)

0    523.720765
1    521.684080
2    519.076642
3    517.780590
4    525.150158

Timings:

s1 = pd.Series(np.arange(50000))
s2 = pd.Series(np.random.randint(-4, 10,50000))

%timeit s1.add(s2.where(s2.gt(0), 0))
890 µs ± 58.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.add(s1, np.where(s2 > 0, s2, 0))
367 µs ± 6.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
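
Before picking one on speed alone, it may be worth checking that the variants agree. A minimal sketch on the five sample values from the question (the variable names are mine):

```python
import numpy as np
import pandas as pd

qld = pd.Series([523.720765, 512.180608, 519.076642, 516.329201, 525.150158])
flow = pd.Series([-8.961111, 9.503472, -10.776389, 1.451389, -10.388195])

# The four approaches from the answers above.
via_mask = qld.add(flow.mask(flow < 0, 0), fill_value=0)
via_filter = qld.add(flow[flow > 0], fill_value=0)
via_where = qld.add(flow.where(flow.gt(0), 0))
via_numpy = np.add(qld, np.where(flow > 0, flow, 0))

# All of them should produce the same values.
for name, result in [("filter", via_filter), ("where", via_where), ("numpy", via_numpy)]:
    assert np.allclose(result, via_mask), name

print(via_numpy)
```

Note that the numpy version adds by position, so it relies on both Series having the same length and order, while the filtered version relies on index alignment plus fill_value=0.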