Question

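I want to query the value of an exponentially weighted moving average at particular points. An inefficient way to do this is shown below. l is the list of times at which events occur, and queries holds the times at which I want the value of this average.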

a=0.01
l = [3,7,10,20,200]
y = [0]*1000
for item in l:
        y[int(item)]=1
s = [0]*1000
for i in xrange(1,1000):
    s[i] = a*y[i-1]+(1-a)*s[i-1]

queries = [23,68,103]

for q in queries:
        print s[q]

Output:

0.0355271185019
0.0226018371526
0.0158992102478

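In practice, l will be very large, and the range of values in l will also be huge. How can I find the values at the times in queries more efficiently, and in particular without explicitly computing the potentially huge lists y and s? I need it in pure Python so that I can use pypy.

Is it possible to solve the problem in time proportional to len(l) rather than max(l) (assuming len(queries) < len(l))?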

Answer 1

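I think you can do it in log(len(l)) time per query, if l is sorted. The basic idea is that the non-recursive form of the EMA is a sum of past inputs with geometrically decaying weights: s_i = a*y_(i-1) + a*(1-a)^1*y_(i-2) + a*(1-a)^2*y_(i-3) + ...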

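This means that for a query k, you find the largest number in l that is less than k and then sum backwards from there up to some estimation limit, using the following, where v is the index of that number in l and l[v] is the event time:

a*(1-a)^(k-1-l[v]) + a*(1-a)^(k-1-l[v-1]) + ...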

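You then spend lg(len(l)) time on the search, plus a constant multiple of the depth of your estimate. I'll provide a code sample a bit later (after work) if you need one; I just wanted to get the idea out there while I was thinking about it.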
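As a sanity check on the non-recursive form, here is a minimal sketch that compares it with the recursion from the question on the question's own data. The function names `ema_recursive` and `ema_closed_form` are illustrative, and the sketch assumes unit-valued events at integer times with s[0] = 0.

```python
a = 0.01
l = [3, 7, 10, 20, 200]  # event times from the question

def ema_recursive(q, n=1000):
    # Direct recursion from the question: s[i] = a*y[i-1] + (1-a)*s[i-1]
    y = [0] * n
    for t in l:
        y[t] = 1
    s = 0.0
    for i in range(1, q + 1):
        s = a * y[i - 1] + (1 - a) * s
    return s

def ema_closed_form(q):
    # Each event at time t < q contributes a * (1-a)^(q-1-t)
    return sum(a * (1 - a) ** (q - 1 - t) for t in l if t < q)

for q in [23, 68, 103]:
    print(q, ema_recursive(q), ema_closed_form(q))
```

The two columns should agree up to floating-point rounding, which is what makes it possible to skip the long lists y and s and sum over the events directly.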

Here is the code. v is a dictionary of the values at the given times; if the value is always just 1, replace it with 1.

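import math
from bisect import bisect_left

a = .01
limit = 1000  # events older than this many steps are ignored
l = [1, 5, 14, 29]  # ... sorted event times (rest of the list elided in the original)
v = dict((t, 1) for t in l)  # assumed: every event has value 1

def find_nearest_lt(l, time):
    # index of the rightmost element of l that is strictly less than time
    i = bisect_left(l, time)
    if i:
        return i - 1
    raise ValueError

def find_ema(l, time):
    # matches the question's recursion s[i] = a*y[i-1] + (1-a)*s[i-1]:
    # an event at time t < time contributes a * (1-a)^(time-1-t)
    i = find_nearest_lt(l, time)
    result = 0
    # stop at the start of the list (a negative index would wrap around)
    while i >= 0 and (time - l[i]) < limit:
        result += a * math.pow(1 - a, time - 1 - l[i]) * v[l[i]]
        i -= 1
    return result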

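If I'm thinking about this correctly, the nearest-element lookup is O(log n), and the while loop is guaranteed to run at most 1000 iterations (limit is 1000 and the event times are distinct integers), so it is technically a constant, albeit a fairly large one. find_nearest was lifted from the bisect documentation page: http://docs.python.org/2/library/bisect.html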
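To see what the cutoff costs in accuracy, here is a rough sketch of the truncation error under the question's recursion. It assumes at most one unit-valued event per integer time step, so the dropped terms form a geometric tail. `truncation_bound` and `limit_for_tolerance` are hypothetical helper names, not part of the answer's code.

```python
import math

def truncation_bound(a, limit):
    # Events at least `limit` steps old have weights a*(1-a)^j with j >= limit-1.
    # With at most one unit event per step, their total is at most
    # sum_{j >= limit-1} a*(1-a)^j = (1-a)^(limit-1).
    return (1 - a) ** (limit - 1)

def limit_for_tolerance(a, tol):
    # Smallest limit such that (1-a)^(limit-1) <= tol
    return int(math.ceil(math.log(tol) / math.log(1 - a))) + 1

print(truncation_bound(0.01, 1000))
print(limit_for_tolerance(0.01, 1e-6))
```

With a = 0.01 and limit = 1000 the bound is roughly 4e-5, so the cutoff trades a small, controllable error for a fixed amount of work per query.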

Answer 2

Here is my code for doing this:

def ewma(l, queries, a=0.01):
  def decay(t0, x, t1, a):
    from math import pow
    return pow((1-a), (t1-t0))*x

  assert l == sorted(l)
  assert queries == sorted(queries)

  samples = []
  try:
    t0, x0 = (0.0, 0.0)
    it = iter(queries)
    q = it.next()-1.0

    for t1 in l:
      # new value is decayed previous value, plus a
      x1 = decay(t0, x0, t1, a) + a
      # take care of all queries between t0 and t1
      while q < t1:
        samples.append(decay(t0, x0, q, a))
        q = it.next()-1.0
      # take care of all queries equal to t1
      while q == t1:
        samples.append(x1)
        q = it.next()-1.0
      # update t0, x0
      t0, x0 = t1, x1

    # take care of any remaining queries
    while True:
      samples.append(decay(t0, x0, q, a))
      q = it.next()-1.0
  except StopIteration:
    return samples

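I have also uploaded a more complete version of this code, including unit tests and some comments, to pastebin: http://pastebin.com/shhaz710

Edit: Note that this is essentially the same idea as what Chris Pak proposed in his answer, which he must have posted while I was typing this one. I haven't gone through his code in detail, but I believe mine is more general. This code supports non-integer values in l and queries. It also works with any kind of iterable, not just lists, because I don't do any indexing.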
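For readers who want to try the single-pass idea outside Python 2, here is a minimal sketch of the same merge over sorted events and sorted queries, written with `next(it, None)` instead of `it.next()`. The name `ewma_merge` is hypothetical, and the sketch assumes unit-valued events and the question's convention that s[q] depends on inputs up to time q-1.

```python
def ewma_merge(l, queries, a=0.01):
    """Single pass over sorted event times `l` and sorted `queries`."""
    samples = []
    t0, x0 = 0.0, 0.0            # time and value of the running average
    events = iter(l)
    t1 = next(events, None)      # next event not yet folded in
    for query in queries:
        q = query - 1.0          # s[query] depends on y[query-1]
        # Fold in every event at or before q
        while t1 is not None and t1 <= q:
            x0 = (1 - a) ** (t1 - t0) * x0 + a
            t0 = t1
            t1 = next(events, None)
        # Decay the running value from the last event up to q
        samples.append((1 - a) ** (q - t0) * x0)
    return samples

print(ewma_merge([3, 7, 10, 20, 200], [23, 68, 103]))
```

On the question's data this should reproduce the three values listed in the question's output, up to floating-point rounding. The total work is proportional to len(l) + len(queries), independent of max(l).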

Answer 3

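y appears to be binary-valued, 0 or 1, depending on the values in l. Why not use y = set(int(item) for item in l)? A set is an efficient way to store and look up a collection of numbers.

Your code has a bug on the first pass through this loop: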

s = [0]*1000
for i in xrange(1000):
    s[i] = a*y[i-1]+(1-a)*s[i-1]

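This is because i-1 is -1 when i = 0 (the first pass through the loop), so y[-1] and s[-1] are both the last element of the list, not the previous element. Perhaps you wanted xrange(1,1000)?

How about this code: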
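A tiny illustration of the wrap-around, using a made-up four-element list rather than the question's data:

```python
a = 0.01
y = [0, 0, 0, 1]   # only the last element is set
s = [0.0] * 4

# With i = 0, y[i-1] is y[-1]: the LAST element, not "the one before the first"
i = 0
print(y[i - 1])    # 1

# No exception is raised; s[0] is silently contaminated by the last element
s[i] = a * y[i - 1] + (1 - a) * s[i - 1]
print(s[0])        # 0.01
```

Because Python accepts negative indices, this kind of mistake does not raise an exception, which is why starting the loop at 1 matters.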

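a = 0.01
l = [3.0, 7.0, 10.0, 20.0, 200.0]
y = set(int(item) for item in l)
queries = [23, 68, 103]  # must be sorted; consumed by the loop below

ewma = []
x = 0  # s[0]
# run up to and including the last query time
for i in xrange(1, queries[-1] + 1):
    # s[i] = a*y[i-1] + (1-a)*s[i-1], as in the question
    x = (1-a)*x
    if (i-1) in y:
        x += a
    if queries and i == queries[0]:
        ewma.append(x)
        queries.pop(0)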

When it finishes, ewma should hold the moving average at each query point.

Edited to include SchighSchagh's improvements.
