I have two Series:
energy_dict['QLD'] =
Timestamp
2017-04-27 00:00:00 523.720765
2017-04-27 01:00:00 512.180608
2017-04-27 02:00:00 519.076642
2017-04-27 03:00:00 516.329201
2017-04-27 04:00:00 525.150158
... ...
Freq: H, Name: QLD Total Energy (MWh), Length: 8760, dtype: float64
and
Incoming_Flow =
Timestamp
2017-04-27 00:00:00 -8.961111
2017-04-27 01:00:00 9.503472
2017-04-27 02:00:00 -10.776389
2017-04-27 03:00:00 1.451389
2017-04-27 04:00:00 -10.388195
... ...
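Freq: H, Name: METEREDMWFLOW N-Q-MNSP1, Length: 8760, dtype: float64
I want to add them together, but only include the values of the second Series where they are greater than zero. What is the best way to do this?
I know I can do something like this: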
Incoming_Flow[Incoming_Flow < 0 ] = 0
but I would like to be able to do it all in one step.
Answer 0: (score: 3)
Use Series.add together with Series.mask:
s = energy_dict['QLD'].add(Incoming_Flow.mask(Incoming_Flow < 0, 0), fill_value=0)
print (s)
0 523.720765
1 521.684080
2 519.076642
3 517.780590
4 525.150158
dtype: float64
print (Incoming_Flow.mask(Incoming_Flow < 0, 0))
0 0.000000
1 9.503472
2 0.000000
3 1.451389
4 0.000000
Name: METEREDMWFLOW N-Q-MNSP1, dtype: float64
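As a self-contained sketch of the mask approach, the snippet below rebuilds the first five rows from the question with a DatetimeIndex. The names qld, flow, positive_flow and total are illustrative stand-ins for energy_dict['QLD'] and Incoming_Flow, not names from the original code.

```python
import pandas as pd

# Hypothetical sample data: the first five rows shown in the question
idx = pd.to_datetime([
    "2017-04-27 00:00", "2017-04-27 01:00", "2017-04-27 02:00",
    "2017-04-27 03:00", "2017-04-27 04:00",
])
qld = pd.Series([523.720765, 512.180608, 519.076642, 516.329201, 525.150158], index=idx)
flow = pd.Series([-8.961111, 9.503472, -10.776389, 1.451389, -10.388195], index=idx)

# mask returns a new Series in which every negative value is replaced by 0
positive_flow = flow.mask(flow < 0, 0)

# add aligns both Series on their timestamps before summing
total = qld.add(positive_flow, fill_value=0)
print(total)
```

Unlike the in-place assignment in the question, this leaves flow itself unchanged, so the negative values remain available for later use.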
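Or filter the Series and use the parameter fill_value=0:
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result will be missing.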
s = energy_dict['QLD'].add(Incoming_Flow[Incoming_Flow > 0], fill_value=0)
print (s)
0 523.720765
1 521.684080
2 519.076642
3 517.780590
4 525.150158
dtype: float64
Details:
print (Incoming_Flow[Incoming_Flow > 0])
1 9.503472
3 1.451389
Name: METEREDMWFLOW N-Q-MNSP1, dtype: float64
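To see why fill_value=0 matters for the filtering variant, here is a small sketch that compares the sum with and without it. The variable names are illustrative and the data is the five sample rows from the question.

```python
import pandas as pd

qld = pd.Series([523.720765, 512.180608, 519.076642, 516.329201, 525.150158])
flow = pd.Series([-8.961111, 9.503472, -10.776389, 1.451389, -10.388195])

# Boolean filtering drops the non-positive rows, so only labels 1 and 3 remain
filtered = flow[flow > 0]

# Without fill_value, labels 0, 2 and 4 have no partner and become NaN
without_fill = qld.add(filtered)

# With fill_value=0, the missing partners are treated as 0
with_fill = qld.add(filtered, fill_value=0)

print(without_fill.isna().sum())  # 3
print(with_fill.isna().sum())     # 0
```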
EDIT:
If performance is important, use numpy.where:
s = pd.Series(np.where(Incoming_Flow < 0, 0, Incoming_Flow ), index=Incoming_Flow.index)
# if the DatetimeIndex values are the same in both Series, a plain array is enough
s = np.where(Incoming_Flow < 0, 0, Incoming_Flow )
energy_dict['QLD'].add(s, fill_value=0)
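A minimal sketch of the two numpy.where variants, assuming both Series share the same index (the sample values and variable names are illustrative). Because np.where returns a plain array without index labels, the array version is added by position, while the Series version is aligned by label.

```python
import numpy as np
import pandas as pd

qld = pd.Series([523.720765, 512.180608, 519.076642, 516.329201, 525.150158])
flow = pd.Series([-8.961111, 9.503472, -10.776389, 1.451389, -10.388195])

# np.where returns an ndarray, so the index labels are lost
clipped = np.where(flow < 0, 0, flow)

# Adding an ndarray is positional: it must have the same length and order as qld
total_positional = qld.add(clipped, fill_value=0)

# Wrapping the array in a Series restores label-based alignment
total_aligned = qld.add(pd.Series(clipped, index=flow.index), fill_value=0)

print(np.allclose(total_positional, total_aligned))  # True
```

If the two indexes could differ, the Series-wrapped form is the safer choice, since the positional form silently pairs rows by order.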
Answer 1: (score: 1)
You can also use Series.add with Series.where:
s = energy_dict['QLD'].add(Incoming_Flow.where(Incoming_Flow.gt(0), 0))
If performance matters, the where version is also about 18% faster than the mask solution:
[Proof]
s1 = pd.Series(np.arange(50000))
s2 = pd.Series(np.random.randint(-4, 10,50000))
%timeit s1.add(s2.mask(s2 < 0, 0), fill_value=0)
1.17 ms ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit s1.add(s2[s2 > 0], fill_value=0)
4.68 ms ± 289 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit s1.add(s2.where(s2.gt(0), 0))
988 µs ± 50.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
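One subtlety worth knowing when choosing between the two: mask(s < 0, 0) and where(s.gt(0), 0) treat NaN differently, because any comparison with NaN is False. The sketch below uses made-up values to show this. With mask the NaN survives and is only neutralised later by fill_value=0 in add, whereas where replaces it with 0 directly, which is why the where variant works without fill_value.

```python
import numpy as np
import pandas as pd

# Hypothetical flow values including a missing reading
flow = pd.Series([-8.961111, 9.503472, np.nan, 0.0])

# NaN < 0 is False, so mask leaves the NaN untouched
via_mask = flow.mask(flow < 0, 0)

# NaN > 0 is False, so where replaces the NaN with 0
via_where = flow.where(flow.gt(0), 0)

print(via_mask.tolist())   # [0.0, 9.503472, nan, 0.0]
print(via_where.tolist())  # [0.0, 9.503472, 0.0, 0.0]
```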
Answer 2: (score: 0)
Using numpy add and where is faster:
import numpy as np
import pandas as pd
qld = [523.720765, 512.180608, 519.076642, 516.329201, 525.150158]
flow = [ -8.961111, 9.503472, -10.776389, 1.451389, -10.388195]
df1 = pd.DataFrame(qld, columns=['QLD'])
df2 = pd.DataFrame(flow, columns=['Incoming_Flow'])
s = np.add(df1['QLD'], np.where(df2['Incoming_Flow'] > 0, df2['Incoming_Flow'], 0))
print(s)
0 523.720765
1 521.684080
2 519.076642
3 517.780590
4 525.150158
Timings:
s1 = pd.Series(np.arange(50000))
s2 = pd.Series(np.random.randint(-4, 10,50000))
%timeit s1.add(s2.where(s2.gt(0), 0))
890 µs ± 58.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.add(s1, np.where(s2 > 0, s2, 0))
367 µs ± 6.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
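The %timeit lines above only work in IPython or Jupyter. As a rough sketch for a plain Python script, the standard timeit module can compare the four approaches from this thread and confirm that they agree. The dictionary keys and the seeded random generator are illustrative choices, and the measured numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so repeated runs use the same data
s1 = pd.Series(np.arange(50000))
s2 = pd.Series(rng.integers(-4, 10, 50000))

# The four approaches discussed in the answers
candidates = {
    "mask": lambda: s1.add(s2.mask(s2 < 0, 0), fill_value=0),
    "filter": lambda: s1.add(s2[s2 > 0], fill_value=0),
    "where": lambda: s1.add(s2.where(s2.gt(0), 0)),
    "numpy": lambda: np.add(s1, np.where(s2 > 0, s2, 0)),
}

reference = np.asarray(candidates["mask"]())
for name, func in candidates.items():
    # Every approach should produce the same values before being timed
    assert np.allclose(np.asarray(func()), reference)
    seconds = timeit.timeit(func, number=20) / 20
    print(f"{name}: {seconds * 1e6:.0f} us per call")
```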