Is NumPy's accumulate the right thing for this?

Time: 2016-06-22 16:54:10

Tags: python numpy pandas

I have a pandas.Series of integers that looks like this:

1959-09-22    191.0
1959-09-23    196.0
1959-09-24    222.0
1959-09-25    232.0
1959-09-28    232.0
1959-09-29    242.0
1959-09-30    241.0
1959-10-01    247.0
1959-10-02    251.0
1959-10-05    275.0
1959-10-06    294.0
1959-10-07    313.0
1959-10-08    332.0
1959-10-09    343.0
1959-10-12    346.0
1959-10-13    344.0
1959-10-14    351.0
1959-10-15    336.0
1959-10-16    330.0
1959-10-19    319.0
1959-10-20    329.0
1959-10-21    356.0
1959-10-22    374.0

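What I want to do is process this list so that the difference between consecutive integers is never less than 10%. If it is, the last value should be held until the difference exceeds 10% of the current value.

What is the most Pythonic way to do this?

As some background, this is a list of financial positions, and the aim is to reduce transaction costs by not trading on small differences in the overall position.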

2 Answers:

Answer 0: (score: 1)

Here is an approach that also supports forward filling -

import numpy as np

def process_close_diffs(arr, percent_close, FORWARD_FILL=False):
    # arr is expected to be a 1-D NumPy array.
    # Threshold for each element from the second to the last, taken as
    # a percentage of the element that precedes it
    thresh = (percent_close/100.0)*(arr[:-1])

    # Absolute differences between consecutive elements
    diffs = np.abs(np.diff(arr))

    # See which elements differ from their predecessor by at least the
    # threshold and select those with boolean indexing. The first element
    # is always kept.
    # For the optional FORWARD_FILL case, the cumulative sum of the mask
    # gives, for every position, the index of the most recent kept element;
    # indexing the kept elements with it yields the forward-filled array.
    mask = np.append(True, diffs >= thresh)
    if FORWARD_FILL:
        return (arr[np.where(mask)[0]])[mask.cumsum()-1]
    else:
        return arr[mask]

Sample run -

In [215]: data
Out[215]: array([25, 29, 27, 27, 17, 14,  7, 20, 21,  5])

In [216]: process_close_diffs(data,40) # Using 40% to see noticeable changes
Out[216]: array([25,  7, 20,  5])

In [217]: process_close_diffs(data,40,FORWARD_FILL=True)
Out[217]: array([25, 25, 25, 25, 25, 25,  7, 20, 20,  5])
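
To see why indexing with `mask.cumsum()-1` produces the forward-filled result, the steps can be traced on the sample data. This is only an illustrative sketch: the `mask` values are written out by hand for the 40% run above, and the names `kept`, `idx` and `filled` are made up for the example.

```python
import numpy as np

data = np.array([25, 29, 27, 27, 17, 14, 7, 20, 21, 5])

# Mask from the 40% run: True where an element is kept
mask = np.array([True, False, False, False, False,
                 False, True, True, False, True])

# Elements that survive the filter
kept = data[mask]            # [25, 7, 20, 5]

# Running count of kept elements, shifted to be zero-based:
# for each position, the index in `kept` of the most recent kept element
idx = np.cumsum(mask) - 1    # [0, 0, 0, 0, 0, 0, 1, 2, 2, 3]

# Indexing `kept` with `idx` repeats each kept value until the next one
filled = kept[idx]           # [25, 25, 25, 25, 25, 25, 7, 20, 20, 5]
print(filled)
```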

Answer 1: (score: 0)

A pandas approach:

s[np.abs(s.pct_change()) < 0.1] = np.nan
s.ffill()
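
If overwriting `s` in place is not desired, the same idea could probably be written without mutation using `Series.mask`. This is a sketch rather than part of the answer; the short sample Series and the name `held` are assumptions made for illustration.

```python
import pandas as pd

# First few values from the question, used as a small sample
s = pd.Series([191.0, 196.0, 222.0, 232.0, 232.0, 242.0])

# Blank out values whose absolute change from the previous value is
# below 10%, then forward-fill the gaps. The first value has a NaN
# percentage change, so the comparison is False and it is kept.
held = s.mask(s.pct_change().abs() < 0.1).ffill()

print(held.tolist())  # [191.0, 191.0, 222.0, 222.0, 222.0, 222.0]
```

Here `s` itself is left unchanged and the result is bound to a new name.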

Demo:

In [193]: s[np.abs(s.pct_change()) < 0.1] = np.nan

In [194]: s
Out[194]:
date
1959-09-22    191.0
1959-09-23      NaN
1959-09-24    222.0
1959-09-25      NaN
1959-09-28      NaN
1959-09-29      NaN
1959-09-30      NaN
1959-10-01      NaN
1959-10-02      NaN
1959-10-05      NaN
1959-10-06      NaN
1959-10-07      NaN
1959-10-08      NaN
1959-10-09      NaN
1959-10-12      NaN
1959-10-13      NaN
1959-10-14      NaN
1959-10-15      NaN
1959-10-16      NaN
1959-10-19      NaN
1959-10-20      NaN
1959-10-21      NaN
1959-10-22      NaN
Name: val, dtype: float64

In [195]: s.ffill()
Out[195]:
date
1959-09-22    191.0
1959-09-23    191.0
1959-09-24    222.0
1959-09-25    222.0
1959-09-28    222.0
1959-09-29    222.0
1959-09-30    222.0
1959-10-01    222.0
1959-10-02    222.0
1959-10-05    222.0
1959-10-06    222.0
1959-10-07    222.0
1959-10-08    222.0
1959-10-09    222.0
1959-10-12    222.0
1959-10-13    222.0
1959-10-14    222.0
1959-10-15    222.0
1959-10-16    222.0
1959-10-19    222.0
1959-10-20    222.0
1959-10-21    222.0
1959-10-22    222.0
Name: val, dtype: float64
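
Note that both answers compare each value with its immediate predecessor, so a slow drift (as in the demo, where the output stays at 222.0 while the series climbs to 374.0) never triggers an update. If the intent is instead to compare each new value with the last value that was actually held, one possible sketch is a plain loop. It assumes "10% of the current value" means 10% of the currently held value; the function name `hold_until_change` and the `pct` parameter are hypothetical.

```python
def hold_until_change(values, pct=0.10):
    """Return a list in which each entry is the last held value.

    The held value is replaced only when a new value differs from it
    by at least `pct` of the held value (an assumed interpretation).
    """
    held = None
    out = []
    for v in values:
        # The first value is always taken; later values replace the
        # held one only when the change is large enough.
        if held is None or abs(v - held) >= pct * abs(held):
            held = v
        out.append(held)
    return out


# First eight values from the question
sample = [191.0, 196.0, 222.0, 232.0, 232.0, 242.0, 241.0, 247.0]
print(hold_until_change(sample))
# [191.0, 191.0, 222.0, 222.0, 222.0, 222.0, 222.0, 247.0]
```

For a Series `s`, the result could be wrapped back up as `pd.Series(hold_until_change(s), index=s.index)`. Because each step depends on the previously held value, this is harder to express as a single vectorized operation, which is why the sketch uses an explicit loop.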