Rolling daily correlation by matching names

Time: 2016-06-01 20:48:38

Tags: r dplyr window-functions

Let's say I have the following data frame. (My actual dataset is not necessarily this small.)

library(lubridate)
library(dplyr)  # needed for filter(), left_join(), transmute() and %>% used below

x <- data.frame(
  date = c(rep(ymd(20160601), 4), rep(ymd(20160602), 3), rep(ymd(20160603), 3)),
  name = c("a", "b", "c", "d", "a", "b", "c", "b", "c", "d"),
  observation = sample(1:10)  # random permutation; the values printed below are from one draw
)

#          date name observation
# 1  2016-06-01    a          10
# 2  2016-06-01    b           7
# 3  2016-06-01    c           3
# 4  2016-06-01    d           2
# 5  2016-06-02    a           8
# 6  2016-06-02    b           6
# 7  2016-06-02    c           4
# 8  2016-06-03    b           5
# 9  2016-06-03    c           1
# 10 2016-06-03    d           9

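I want to find the daily correlation of observations with matching names. For example, for the date 2016-06-02, I want the correlation between <8, 6, 4> and <10, 7, 3>, because only a, b and c are common to 2016-06-02 and 2016-06-01. I can do it like this (there is probably a better way):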

filter(x, date %in% ymd(20160601)) %>%
  left_join(filter(x, date %in% ymd(20160602)), by = "name") %>%
  transmute(
    date = ymd(20160602),
    correlation = cor(observation.x, observation.y, use = "complete.obs")) %>%
  `[`(1, )

#         date correlation
# 1 2016-06-02   0.9966159
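As a quick sanity check, the same number can be reproduced directly from the two vectors named above. This is only a sketch using the values from the printed sample; the variable names `day1` and `day2` are made up for illustration.

```r
# Observations for the names common to both days (a, b, c),
# copied from the printed sample data above
day1 <- c(10, 7, 3)  # 2016-06-01
day2 <- c(8, 6, 4)   # 2016-06-02

# Pearson correlation between the two days
correlation <- cor(day1, day2)
print(correlation)
```

The result agrees with the 0.9966159 shown in the output above.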

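But how can I do this for the whole data frame using window functions, so that I get a data frame containing every date and its correlation with the previous date? I would prefer a dplyr / RcppRoll solution!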

1 answer:

Answer 0 (score: 3)

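dplyr doesn't have rolling joins. Assuming you do need one (it isn't clear from the OP, since the sample data has no gaps in the dates), you can do this: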

library(data.table)
dt = as.data.table(x) # or setDT to convert in place

dt[, date := as.Date(date)] # not very clear from OP if you have dates or datetimes
                            # let's make sure it's dates

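# for each row, look up the same name on the previous date; roll = TRUE falls back
# to the most recent earlier date when the previous date is missing for that name
dt[.(name = name, old.date = date - 1, obs = observation),
     on = c(name = 'name', date = 'old.date'), roll = TRUE][
   , cor(obs, observation, use = 'pairwise.complete.obs'), by = date]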
#         date         V1
#1: 2016-06-01         NA
#2: 2016-06-02  0.9966159
#3: 2016-06-03 -0.5000000
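For comparison, here is a minimal dplyr sketch of the non-rolling case, under the assumption that only names present on both a date and the calendar day immediately before it should be matched. The sample values are hard-coded from the printed data, and the names `prev`, `prev_observation` and `result` are made up for illustration; this is not a replacement for the rolling join above.

```r
library(dplyr)

# Sample data with the values printed in the question (fixed, not random)
x <- data.frame(
  date = as.Date(c(rep("2016-06-01", 4), rep("2016-06-02", 3), rep("2016-06-03", 3))),
  name = c("a", "b", "c", "d", "a", "b", "c", "b", "c", "d"),
  observation = c(10, 7, 3, 2, 8, 6, 4, 5, 1, 9)
)

# Shift every row forward by one day so that it lines up with
# the following day's row for the same name
prev <- x %>%
  transmute(date = date + 1, name, prev_observation = observation)

# Exact (non-rolling) join on date and name, then one correlation per date
result <- x %>%
  inner_join(prev, by = c("date", "name")) %>%
  group_by(date) %>%
  summarise(correlation = cor(observation, prev_observation))

print(result)
```

With an exact join, 2016-06-03 is matched only on b and c, because d has no row on 2016-06-02. The rolling join instead carries d's value forward from 2016-06-01, which is why it reports -0.5 for that date. The first date has no previous day and drops out of the inner join.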