I have a pandas.Series of integer values that looks like this:
1959-09-22 191.0
1959-09-23 196.0
1959-09-24 222.0
1959-09-25 232.0
1959-09-28 232.0
1959-09-29 242.0
1959-09-30 241.0
1959-10-01 247.0
1959-10-02 251.0
1959-10-05 275.0
1959-10-06 294.0
1959-10-07 313.0
1959-10-08 332.0
1959-10-09 343.0
1959-10-12 346.0
1959-10-13 344.0
1959-10-14 351.0
1959-10-15 336.0
1959-10-16 330.0
1959-10-19 319.0
1959-10-20 329.0
1959-10-21 356.0
1959-10-22 374.0
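What I want to do is process this list so that the difference between consecutive values is never less than 10%. If it is, the last value should be kept until the difference exceeds 10% of the current value.
What is the most Pythonic way to do this?
As some background, this is a list of financial positions, and the goal is to reduce transaction costs by not trading on small differences in the overall position.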
Answer 0 (score: 1)
Here is an approach that also supports forward filling:
import numpy as np

def process_close_diffs(arr, percent_close, FORWARD_FILL=False):
    # Get the threshold for each element, from the first element
    # up to the second-to-last one
    thresh = (percent_close / 100.0) * (arr[:-1])
    # Get the absolute differences between consecutive elements
    diffs = np.abs(np.diff(arr))
    # See which differences are greater than or equal to the threshold and
    # select those elements with boolean indexing. The first element is
    # always kept.
    # For the optional FORWARD_FILL case, the cumulative sum of the mask
    # gives, for each position, the index of the most recent kept element,
    # so indexing the kept elements with it yields a forward-filled array.
    mask = np.append(True, diffs >= thresh)
    if FORWARD_FILL:
        return (arr[np.where(mask)[0]])[mask.cumsum() - 1]
    else:
        return arr[mask]
Sample run:
In [215]: data
Out[215]: array([25, 29, 27, 27, 17, 14, 7, 20, 21, 5])
In [216]: process_close_diffs(data,40) # Using 40% to see noticeable changes
Out[216]: array([25, 7, 20, 5])
In [217]: process_close_diffs(data,40,FORWARD_FILL=True)
Out[217]: array([25, 25, 25, 25, 25, 25, 7, 20, 20, 5])
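For the Series in the question, one option is to run the function on the underlying NumPy values and wrap the result back into a Series with the original dates. The following is only a sketch under assumptions: the six-row sample, the variable names `s` and `held`, and the `to_numpy()` round trip are illustrative choices, not part of the answer above.

```python
import numpy as np
import pandas as pd

def process_close_diffs(arr, percent_close, FORWARD_FILL=False):
    # Same logic as the function above, repeated so this block runs on its own.
    thresh = (percent_close / 100.0) * arr[:-1]
    diffs = np.abs(np.diff(arr))
    mask = np.append(True, diffs >= thresh)
    if FORWARD_FILL:
        return arr[np.where(mask)[0]][mask.cumsum() - 1]
    return arr[mask]

# First six rows of the series from the question.
s = pd.Series(
    [191.0, 196.0, 222.0, 232.0, 232.0, 242.0],
    index=pd.to_datetime(["1959-09-22", "1959-09-23", "1959-09-24",
                          "1959-09-25", "1959-09-28", "1959-09-29"]),
    name="val",
)

# Work on the raw values, then reattach the original date index.
held = pd.Series(
    process_close_diffs(s.to_numpy(), 10, FORWARD_FILL=True),
    index=s.index,
    name=s.name,
)
print(held.tolist())  # [191.0, 191.0, 222.0, 222.0, 222.0, 222.0]
```

Passing the plain array avoids any ambiguity between positional and label-based indexing on a date-indexed Series.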
Answer 1 (score: 0)
import numpy as np
s[np.abs(s.pct_change()) < 0.1] = np.nan
s.ffill()
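If overwriting `s` in place is not wanted, the same idea can probably be written without mutation using `Series.mask`. This is a sketch with a made-up four-row sample; the name `result` is an assumption for illustration.

```python
import pandas as pd

s = pd.Series([191.0, 196.0, 222.0, 232.0], name="val")

# Series.mask replaces entries where the condition is True with NaN
# and returns a new Series, so the original s is left unchanged.
result = s.mask(s.pct_change().abs() < 0.1).ffill()
print(result.tolist())  # [191.0, 191.0, 222.0, 222.0]
```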
Demo:
In [193]: s[np.abs(s.pct_change()) < 0.1] = np.nan
In [194]: s
Out[194]:
date
1959-09-22 191.0
1959-09-23 NaN
1959-09-24 222.0
1959-09-25 NaN
1959-09-28 NaN
1959-09-29 NaN
1959-09-30 NaN
1959-10-01 NaN
1959-10-02 NaN
1959-10-05 NaN
1959-10-06 NaN
1959-10-07 NaN
1959-10-08 NaN
1959-10-09 NaN
1959-10-12 NaN
1959-10-13 NaN
1959-10-14 NaN
1959-10-15 NaN
1959-10-16 NaN
1959-10-19 NaN
1959-10-20 NaN
1959-10-21 NaN
1959-10-22 NaN
Name: val, dtype: float64
In [195]: s.ffill()
Out[195]:
date
1959-09-22 191.0
1959-09-23 191.0
1959-09-24 222.0
1959-09-25 222.0
1959-09-28 222.0
1959-09-29 222.0
1959-09-30 222.0
1959-10-01 222.0
1959-10-02 222.0
1959-10-05 222.0
1959-10-06 222.0
1959-10-07 222.0
1959-10-08 222.0
1959-10-09 222.0
1959-10-12 222.0
1959-10-13 222.0
1959-10-14 222.0
1959-10-15 222.0
1959-10-16 222.0
1959-10-19 222.0
1959-10-20 222.0
1959-10-21 222.0
1959-10-22 222.0
Name: val, dtype: float64
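Note that both answers compare each value with the immediately preceding raw value, so a slow drift (222 up to 374 in the demo above) never triggers an update. If the intent is instead to compare against the value currently being held, a plain loop may be the simplest route, since each decision depends on the previous one. The sketch below assumes that reading of the question; the function name `hold_until_change`, the `threshold` parameter, and measuring the change relative to the held value are all assumptions, not something stated in the thread.

```python
import pandas as pd

def hold_until_change(s, threshold=0.10):
    """Keep the last accepted value until a new value differs from it
    by at least `threshold` (as a fraction of the held value)."""
    values = s.to_numpy(dtype=float)
    out = values.copy()
    if len(values) == 0:
        return pd.Series(out, index=s.index, name=s.name)
    held = values[0]  # the first value is always accepted
    for i in range(1, len(values)):
        # Assumption: the change is measured relative to the held value.
        if abs(values[i] - held) >= threshold * abs(held):
            held = values[i]
        out[i] = held
    return pd.Series(out, index=s.index, name=s.name)

# First ten values from the question.
s = pd.Series([191.0, 196.0, 222.0, 232.0, 232.0,
               242.0, 241.0, 247.0, 251.0, 275.0], name="val")
print(hold_until_change(s).tolist())
# [191.0, 191.0, 222.0, 222.0, 222.0, 222.0, 222.0, 247.0, 247.0, 275.0]
```

Here the position moves from 222 to 247 once the cumulative drift passes 10%, whereas the consecutive-difference approaches would stay at 222.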