Question

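I have arrays of time series, each containing about 1,000 values on average. I need to identify time-series segments within each array independently.

I can't find any standard information on how to do this. My current approach is to compute the mean gap between consecutive items and start a new segment whenever the elapsed time between two items exceeds that mean. I'm sure there is a more appropriate method.

Here is the code I'm currently using.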

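def time_cluster(input)
  # Sort a copy so the caller's array is not mutated.
  sorted = input.sort
  return [] if sorted.empty?

  # Gaps between consecutive values.
  differences = sorted.each_cons(2).map { |a, b| b - a }
  # Core Ruby's Array has no #mean, so compute it by hand.
  mean = differences.empty? ? 0 : differences.sum.to_f / differences.size

  clusters = []
  j = 0

  sorted.each_index do |i|
    # Start a new cluster whenever the gap exceeds the mean gap.
    j += 1 if i > 0 && differences[i - 1] > mean
    (clusters[j] ||= []) << sorted[i]
  end

  clusters
end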

A few sample results from this code

time_cluster([1, 2, 3, 4, 7, 9, 250, 254, 258, 270, 292, 340, 345, 349, 371, 375, 382, 405, 407, 409, 520, 527])

Output

1  2  3  4  7  9, sparsity 1.3
250  254  258  270  292,  sparsity 8.4
340  345  349  371  375  382  405  407  409, sparsity 7
520  527, sparsity 3

Another array

time_cluster([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000, 1020, 1040, 1060, 1080, 1200])

Output

1  2  3  4  5  6  7  8  9  10, sparsity 0.9
1000  1020  1040  1060  1080, sparsity 16
1200
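The question does not say how the "sparsity" figure is calculated. Most of the values shown appear consistent with the cluster's range divided by its size (for example, (292 - 250) / 5 = 8.4), with a few seemingly rounded down. The sketch below assumes that definition; the formula and the helper name `sparsity` are guesses, not something the question confirms.

```ruby
# Assumed definition: (max - min) / number of items.
# Returns nil for clusters with fewer than two items, where a spread
# is not meaningful (the lone "1200" above has no sparsity listed).
def sparsity(cluster)
  return nil if cluster.size < 2

  (cluster.max - cluster.min).to_f / cluster.size
end

puts sparsity([250, 254, 258, 270, 292])
puts sparsity([1000, 1020, 1040, 1060, 1080])
```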

Answer 1

Use K-Means. http://ai4r.rubyforge.org/machineLearning.html

gem install ai4r

Singular value decomposition may also be of interest to you. http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/

If you can't do this in Ruby, here is a good example in Python.

Unsupervised clustering with unknown number of clusters
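For one-dimensional data such as timestamps, clustering without a known cluster count can stay in Ruby. Single-linkage hierarchical clustering cut at a fixed distance is equivalent, in one dimension, to splitting the sorted values wherever the gap between neighbours exceeds that distance. The sketch below assumes this approach; `gap_clusters` and `threshold` are hypothetical names, and the cutoff of 30 is picked by eye for the sample data rather than derived from it.

```ruby
# Split sorted 1-D values wherever the gap between neighbours exceeds
# `threshold`. The number of clusters falls out of the data.
def gap_clusters(values, threshold)
  values.sort.slice_when { |a, b| b - a > threshold }.to_a
end

times = [1, 2, 3, 4, 7, 9, 250, 254, 258, 270, 292, 340, 345, 349,
         371, 375, 382, 405, 407, 409, 520, 527]
gap_clusters(times, 30).each { |cluster| puts cluster.join("  ") }
```

Unlike the mean-gap rule in the question, the cutoff here does not shift when a single very large gap inflates the average, but it does have to be chosen for the data at hand.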

Answer 2

You could try a clustering algorithm (such as k-means).

Some links:

Time series segmentation
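To make the k-means suggestion concrete, here is a minimal plain-Ruby sketch of Lloyd's algorithm for one-dimensional values. It does not use any gem. The name `kmeans_1d`, the `max_iterations` parameter, and the evenly spread starting centroids are all assumptions made for illustration. Note that k-means needs the number of clusters `k` up front and can settle on a poor split when `k` or the starting points are badly chosen.

```ruby
# Plain-Ruby 1-D k-means (Lloyd's algorithm); illustrative only.
def kmeans_1d(values, k, max_iterations: 100)
  sorted = values.sort
  # Deterministic start: k centroids spread evenly across the sorted data.
  centroids = Array.new(k) do |c|
    sorted[(c * (sorted.size - 1)) / [k - 1, 1].max].to_f
  end

  clusters = []
  max_iterations.times do
    # Assignment step: each value joins its nearest centroid.
    clusters = Array.new(k) { [] }
    sorted.each do |v|
      nearest = (0...k).min_by { |c| (v - centroids[c]).abs }
      clusters[nearest] << v
    end

    # Update step: move each centroid to the mean of its members.
    updated = clusters.each_with_index.map do |members, c|
      members.empty? ? centroids[c] : members.sum.to_f / members.size
    end
    break if updated == centroids # converged

    centroids = updated
  end

  clusters.reject(&:empty?)
end

times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000, 1020, 1040, 1060, 1080, 1200]
kmeans_1d(times, 2).each { |cluster| puts cluster.join("  ") }
```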
