Let's say I have the following data frame. (The dataset I actually have is not necessarily this small.)
library(dplyr)
library(lubridate)
x <- data.frame(
  date = c(rep(ymd(20160601), 4), rep(ymd(20160602), 3), rep(ymd(20160603), 3)),
  name = c("a", "b", "c", "d", "a", "b", "c", "b", "c", "d"),
  observation = c(10, 7, 3, 2, 8, 6, 4, 5, 1, 9)  # the sample(1:10) draw printed below
)
# date name observation
# 1 2016-06-01 a 10
# 2 2016-06-01 b 7
# 3 2016-06-01 c 3
# 4 2016-06-01 d 2
# 5 2016-06-02 a 8
# 6 2016-06-02 b 6
# 7 2016-06-02 c 4
# 8 2016-06-03 b 5
# 9 2016-06-03 c 1
# 10 2016-06-03 d 9
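I want to find the daily correlation of the observations with matching names. For example, for the date 2016-06-02 I want the correlation between <8, 6, 4> and <10, 7, 3>, because only a, b and c are common to both 2016-06-02 and 2016-06-01. I can do it like this (there may be a better way):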
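# Pair the 2016-06-01 rows with the 2016-06-02 rows by name, then correlate
filter(x, date == ymd(20160601)) %>%
  left_join(filter(x, date == ymd(20160602)), by = "name") %>%
  transmute(
    date = ymd(20160602),
    correlation = cor(observation.x, observation.y, use = "complete.obs")) %>%
  slice(1)  # every row carries the same value; keep one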
# date correlation
# 1 2016-06-02 0.9966159
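As a hand check of that single-date number, here is a minimal sketch that writes out the Pearson formula for the two vectors named above (a, b, c on 2016-06-01 and 2016-06-02) and compares it with `cor()`. The variable names `prev_day`, `this_day`, `dx`, `dy` are illustrative only.

```r
# Observations of the names common to both dates (a, b, c)
prev_day <- c(10, 7, 3)  # 2016-06-01
this_day <- c(8, 6, 4)   # 2016-06-02

# Pearson correlation written out: sum of centred cross-products divided by
# the square root of the product of the centred sums of squares
dx <- prev_day - mean(prev_day)
dy <- this_day - mean(this_day)
r <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

round(r, 7)                            # 0.9966159
all.equal(r, cor(prev_day, this_day))  # TRUE
```

Here the cross-product sum is 14 and the sums of squares are 74/3 and 8, so r = 14 / sqrt(592/3).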
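But how can I do this for the whole data frame with a window function, so that I get a data frame containing every date and its correlation with the previous date? I'd prefer a dplyr / RcppRoll solution!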
Answer 0 (score: 3)
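dplyr doesn't have rolling joins (at least at the time of writing). Assuming you really do need one (that isn't clear from the OP, since the sample data has no gaps), you can do this. With `roll = TRUE`, a name that is missing on the previous date is matched to its most recent earlier observation: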
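library(data.table)
dt = as.data.table(x) # or setDT to convert in place
dt[, date := as.Date(date)] # not very clear from OP if you have dates or datetimes
                            # let's make sure it's dates
# For each row, look up the same name on the previous day; roll = TRUE falls back
# to that name's most recent earlier date. cur.date keeps the row's own date,
# because the join column `date` in the result may hold the lookup date
# (date - 1) rather than the row's own date, depending on the data.table version.
dt[.(name = name, old.date = date - 1, obs = observation, cur.date = date),
   on = c(name = 'name', date = 'old.date'), roll = TRUE][
   , cor(obs, observation, use = 'pairwise.complete.obs'), by = .(date = cur.date)]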
# date V1
#1: 2016-06-01 NA
#2: 2016-06-02 0.9966159
#3: 2016-06-03 -0.5000000
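Since the question asks for dplyr, here is a minimal sketch under the question's own definition (only names present on both dates count, no rolling). It is not an RcppRoll window: it relabels every row with the next available date and then does an exact join on (date, name). It assumes dplyr 1.0 or later (for the `.groups` argument of `summarise()`); the names `dates`, `prev` and `prev_observation` are illustrative.

```r
library(dplyr)

# Same data as in the question, with the observations fixed to the printed values
x <- data.frame(
  date = as.Date(c(rep("2016-06-01", 4), rep("2016-06-02", 3), rep("2016-06-03", 3))),
  name = c("a", "b", "c", "d", "a", "b", "c", "b", "c", "d"),
  observation = c(10, 7, 3, 2, 8, 6, 4, 5, 1, 9)
)

# Sorted distinct dates; "previous date" means the previous entry of this vector
dates <- sort(unique(x$date))

# Relabel every row with the *next* date, so that joining on (date, name)
# lines each observation up with the same name's value on the previous date
prev <- x %>%
  mutate(date = dates[match(date, dates) + 1]) %>%
  filter(!is.na(date)) %>%  # the last date has no successor
  rename(prev_observation = observation)

result <- x %>%
  inner_join(prev, by = c("date", "name")) %>%  # keeps only names present on both dates
  group_by(date) %>%
  summarise(correlation = cor(observation, prev_observation), .groups = "drop")

result
```

With exact matching, 2016-06-03 uses only b and c, so its correlation is 1 rather than the -0.5 from the rolling join (there, d on 2016-06-03 is paired with its 2016-06-01 value of 2 because d is missing on 2016-06-02). 2016-06-01 drops out of the result because it has no previous date.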