Question

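I wrote a function that extracts timestamps from an XML document. The timestamps are coupled to events, and events are repeated elements of a series element.

Series elements have a variable number of events, so my function returns a data.frame only when the series all have the same length. In general it returns a more generic list, and I would like it to work with a matrix too. It has been pointed out (thanks, Eduardo) that 'list' is the generic type, but I still run into functions that handle the generic list yet not the more specific types such as data.frame or matrix.

The processing I need to do on the data right now is to find the most common distance between timestamps (I expect it to occur more than 50% of the time). I have written and rewritten a function that does this: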

R> mostCommonStep( list(a=cumsum(c(1,3,3,2,3,3,4,3,2,3,3)), b=cumsum(c(2,3,2,3))) )
[1] 3
R> mostCommonStep( data.frame(a=c(2,4,6,8,12,14,18), b=c(12,14,16,18,22,24,28)) )
[1] 2
R> mostCommonStep( matrix(c(2,4,6,8,12,14,18, 12,14,16,18,22,24,28), 7, 2) )
[1] 2

but I would like to see a more "R-like" version.

Answer 1

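A data frame is a list. Suppose the distances between timestamps are stored in a vector "x" inside the list / data.frame y. Then sort(-table(y[["x"]]))[1] gives you the mode: the name of the returned element is the most common value, and its value is the negated count.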
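A minimal sketch of the same idea, using only base R. The step values are an assumption taken from series `a` in the question, and the names `y_list`, `y_df`, `mode_value` and `mode_share` are made up for illustration. It also reports how often the mode occurs, so the "more than 50%" expectation can be checked.

```r
# Assumed data: the steps between the timestamps of series `a` in the question.
steps <- c(3, 3, 2, 3, 3, 4, 3, 2, 3, 3)
y_list <- list(x = steps)
y_df <- data.frame(x = steps)

# A data frame is a list, so [[ ]] extracts the vector from either object.
is.list(y_df)
identical(y_list[["x"]], y_df[["x"]])

# Tabulate the steps and pick the value with the highest count.
counts <- table(y_list[["x"]])
mode_value <- as.numeric(names(counts)[which.max(counts)])

# Share of all steps that equal the mode.
mode_share <- max(counts) / sum(counts)
```

Here `which.max()` returns the first maximum, so in case of a tie the smallest of the tied values is reported.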

Answer 2

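Probably the best way to approach this is to use an irregular time series object (see the Time Series task view on CRAN). You have several good options (e.g. timeSeries, its, fts, xts), but the most popular is the zoo package. You can create a time series like this: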

library(zoo)
x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 9, 14) - 1
x <- zoo(rnorm(5), x.Date)

Then, to see the time difference between consecutive events, you can use the diff function, which creates a difftime object:

> diff(index(x))
Time differences in days
[1] 2 4 2 5

You can analyze these time differences like any other variable, for example:

> summary(diff(index(x)))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.00    2.00    3.00    3.25    4.25    5.00

Likewise, to find the most common time difference, you can use any of the usual approaches, such as table():

> table(diff(index(x)))
2 4 5 
2 1 1
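If you want the most common difference as a plain number rather than reading it off the table, something along these lines might work. This is a sketch in base R only (no zoo), reusing the dates from the example above; the names `steps`, `counts` and `most_common` are illustrative.

```r
# Same dates as in the zoo example above.
x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 9, 14) - 1

# diff() on Date values yields a difftime; convert it to plain numbers,
# stating the unit explicitly so the result does not depend on defaults.
steps <- as.numeric(diff(x.Date), units = "days")

# Tabulate and take the value with the highest count.
counts <- table(steps)
most_common <- as.numeric(names(counts)[which.max(counts)])
```

With these dates the differences are 2, 4, 2 and 5 days, so `most_common` is 2. Note that 2 accounts for exactly half of the steps here, not more than half.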

Answer 3

I think I would solve it like this (provided the most common step really does occur more than 50% of the time):

mostCommonStep <- function(L) {
  ## returns the value of the most common difference between
  ## subsequent elements.

  ## takes into account only forward steps, all negative steps are
  ## discarded.  works with list, data.frame, matrix.
  L <- diff(unlist(sapply(as.list(L), as.numeric)))
  as.numeric(quantile(L[L>0], 0.5))
}
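The function above relies on two things: the median equals the mode only when one step accounts for more than half of all steps, and the jump from the end of one series to the start of the next is assumed to be negative so that it gets discarded. A possible variant that avoids both assumptions is sketched below. The name `mostCommonStepByTable` is hypothetical, and the variant swaps the median for a count with table(); it is not the function from the question.

```r
mostCommonStepByTable <- function(L) {
  # A matrix is split into its columns; lists and data frames are used as is,
  # since as.list() on a data frame returns its columns.
  series <- if (is.matrix(L)) {
    lapply(seq_len(ncol(L)), function(j) L[, j])
  } else {
    as.list(L)
  }

  # Differences are taken within each series, never across series boundaries.
  steps <- unlist(lapply(series, function(s) diff(as.numeric(s))),
                  use.names = FALSE)

  # Keep forward steps only, as in the original function.
  steps <- steps[steps > 0]

  # Count each distinct step and return the most frequent one
  # (on a tie, the smallest of the tied values).
  counts <- table(steps)
  as.numeric(names(counts)[which.max(counts)])
}

# The three calls from the question.
r_list <- mostCommonStepByTable(
  list(a = cumsum(c(1, 3, 3, 2, 3, 3, 4, 3, 2, 3, 3)), b = cumsum(c(2, 3, 2, 3))))
r_df <- mostCommonStepByTable(
  data.frame(a = c(2, 4, 6, 8, 12, 14, 18), b = c(12, 14, 16, 18, 22, 24, 28)))
r_mat <- mostCommonStepByTable(
  matrix(c(2, 4, 6, 8, 12, 14, 18, 12, 14, 16, 18, 22, 24, 28), 7, 2))
```

For the question's inputs this gives 3, 2 and 2, matching the output shown there.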

Working with sequences of different lengths
