Question

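I need a rolling_product function or an expanding_product function.

There are various pandas rolling_XXXX and expanding_XXXX functions, but I was surprised to find that there is no expanding_product() function.

To keep things moving, I have been using this rather slow alternative: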

pd.expanding_apply(temp_col, lambda x : x.prod())

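My arrays typically have 32,000 elements, so this is proving to be a bottleneck. I'm tempted to try log(), cumsum() and exp(), but I thought I should ask first, since there may be a better solution.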

Answer 1

Early results suggest that this is a fast approximation of expanding_product.
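As an illustration of the log() / cumsum() / exp() idea raised in the question, here is a minimal sketch of an expanding product next to the direct np.cumprod form. The function names are my own, and this is an assumption about what such an approach looks like rather than a quote of any particular snippet. The log-based form only works for strictly positive inputs.

```python
import numpy as np

def expanding_prod_exact(xs):
    # Hypothetical helper: direct running product of all elements so far.
    return np.cumprod(xs)

def expanding_prod_logs(xs):
    # Hypothetical helper: log -> cumsum -> exp.
    # Only valid when every element of xs is strictly positive.
    return np.exp(np.cumsum(np.log(xs)))

xs = np.array([1., 2., 3., 4., 5.])
print(expanding_prod_exact(xs))
print(expanding_prod_logs(xs))
```

The log-based result is an approximation because of floating-point rounding in log and exp, so it is worth comparing against np.cumprod on your own data.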


rolling_product requires repeated division, which can lead to numerical instability (as @AmiTavory pointed out in a now-deleted answer).

Answer 2

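I have a faster mechanism, but you will need to run some tests to determine whether the accuracy is sufficient.

Here is the original exp / sum / log version: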

def rolling_prod1(xs, n):
    return np.exp(pd.rolling_sum(np.log(xs), n))
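The top-level pd.rolling_sum comes from older pandas releases and is not available in recent ones. Assuming a recent pandas, the same exp / sum / log idea can be sketched with the Series.rolling API; the name rolling_prod1_modern is mine, not part of pandas.

```python
import numpy as np
import pandas as pd

def rolling_prod1_modern(xs, n):
    # Hypothetical name; same exp(sum(log)) idea as rolling_prod1,
    # so xs must be strictly positive.
    log_sums = pd.Series(np.log(xs)).rolling(n).sum()
    # Convert back to a plain NumPy array to match the original return type.
    return np.exp(log_sums.to_numpy())

xs = np.arange(1.0, 10.0)
result = rolling_prod1_modern(xs, 3)
print(result)
```

The first n-1 entries are NaN because those windows are incomplete, which matches the behaviour of the original function.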

Here is a version that takes the cumulative product, shifts it (pre-filling with NaNs), and then divides it back out.

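def rolling_prod2(xs, n):
    cxs = np.cumprod(xs)    # running product up to each position
    nans = np.empty(n)
    nans[:] = np.nan        # the first n-1 windows are incomplete
    nans[n-1] = 1.          # the first full window is divided by 1
    # cumulative product shifted right by n positions
    a = np.concatenate((nans, cxs[:len(cxs)-n]))
    return cxs / a          # product of the last n elements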

Both functions return the same result for this example:

In [9]: xs
Out[9]: array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

In [10]: rolling_prod1(xs, 3)
Out[10]: array([  nan,   nan,    6.,   24.,   60.,  120.,  210.,  336.,  504.])

In [11]: rolling_prod2(xs, 3)
Out[11]: array([  nan,   nan,    6.,   24.,   60.,  120.,  210.,  336.,  504.])

But the second version is much faster:

In [12]: temp_col = np.random.rand(30000)

In [13]: %timeit rolling_prod1(temp_col, 3)
1000 loops, best of 3: 694 µs per loop

In [14]: %timeit rolling_prod2(temp_col, 3)
10000 loops, best of 3: 162 µs per loop
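Since the accuracy needs checking, here is one way such a test could look: compare the cumulative-product version against a slow window-by-window reference on data like the benchmark input. The reference function and variable names are my own. With 30,000 values drawn from [0, 1), the cumulative product underflows to zero long before the end of the array, so later windows become 0/0.

```python
import numpy as np

def rolling_prod2(xs, n):
    # Same cumulative-product approach as above, repeated so this runs alone.
    cxs = np.cumprod(xs)
    head = np.full(n, np.nan)
    head[n - 1] = 1.0
    return cxs / np.concatenate((head, cxs[:len(cxs) - n]))

def rolling_prod_ref(xs, n):
    # Hypothetical slow reference: multiply each window directly.
    out = np.full(len(xs), np.nan)
    for i in range(n - 1, len(xs)):
        out[i] = np.prod(xs[i - n + 1:i + 1])
    return out

rng = np.random.default_rng(0)
xs = rng.random(30000)  # values in [0, 1), like the benchmark data

# Silence the divide-by-zero and 0/0 warnings that the underflow triggers.
with np.errstate(divide="ignore", invalid="ignore"):
    fast = rolling_prod2(xs, 3)
ref = rolling_prod_ref(xs, 3)

# Positions where the fast version disagrees with the reference.
bad = ~np.isclose(fast, ref, equal_nan=True)
print("mismatching positions:", bad.sum())
print("last fast value:", fast[-1], "last reference value:", ref[-1])
```

If your data behaves like this, the exp / sum / log version or a direct windowed product is the safer choice; the cumulative-product trick fits best when the running product stays well inside floating-point range.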

Fast numpy rolling_product