How can I iterate over columns in R in a tidy way?

Time: 2019-01-22 10:10:00

Tags: r tidyr

I have two data frames containing time series data. Each column represents a different time series.

set.seed(123)

wiki_1 <- data.frame(
  W = sample(1:1000, 10, replace = TRUE),
  X = sample(1:100, 10, replace = TRUE),
  Y = sample(1:10, 10, replace = TRUE),
  Z = sample(1:10, 10, replace = TRUE)
)

wiki_2 <- data.frame(
  A = sample(500:1000, 100, replace = TRUE),
  B = sample(90:100, 100, replace = TRUE),
  C = sample(1:10, 100, replace = TRUE),
  D = sample(1:10, 100, replace = TRUE)
)

I want to compare each column of wiki_1 with a sample from wiki_2, which I can do with the following function:

library(dplyr)
library(CausalImpact)
library(tidyverse)
library(zoo) # zoo() is called directly further down

time <- as.Date('2009-01-01') + 0:9

combineMedianComp <- function(data1, data2, col, n){
  if(nrow(data1) > nrow(data2)) stop("Rows in 'data2' need to be greater or equal to rows in 'data1'")

  medRef <- median(data1[[col]], na.rm = TRUE) # median of the desired column

  # absolute difference between medRef and the median of each column in data2 ('wiki_2')
  medComp <- sapply(data2, function(x){abs(medRef - median(x, na.rm = TRUE))})

  cols <- names(sort(medComp)[seq_len(n)]) # sort ascending, keep the n closest columns

  # select the columns of data2 whose medians are closest to 'medRef';
  # drop = FALSE keeps a data frame even when n = 1
  d2 <- data2[, cols, drop = FALSE]

  # sample rows so that d2 has as many rows as data1
  d2 <- d2[sample(seq_len(nrow(d2)), size = nrow(data1), replace = FALSE), , drop = FALSE]

  # merge data
  res <- cbind(data1[col], d2)

  return(res)
}


median_data <- combineMedianComp(data1 = wiki_1, data2 = wiki_2, col = "W", n = 2)
median_data

df <- zoo(cbind(median_data),time)
df

impact <- CausalImpact(df, time[c(1,3)], time[c(7,9)],model.args = list(nseasons = 7, season.duration = 1))

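However, I would like to automate this in a tidy way, so that I do not have to type the column name ("W" in this case) every time I run the line median_data <- combineMedianComp(data1 = wiki_1, data2 = wiki_2, col = "W", n = 2). In this example I would end up with four saved CausalImpact summaries, one each for W, X, Y and Z.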

I hope you can help.
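One way the per-column automation described above might be sketched is to wrap the three manual steps in a function and map it over the column names with purrr. This is only a sketch under stated assumptions: it reuses wiki_1, wiki_2, time and combineMedianComp exactly as defined in the question, the helper name run_impact_for is hypothetical, and the pre/post periods and model.args are copied from the question without checking that they suit such short series.

```r
library(purrr)
library(zoo)
library(CausalImpact)

# Hypothetical helper: runs the question's three manual steps for ONE column.
# Assumes combineMedianComp() from the question is already defined.
run_impact_for <- function(col, data1, data2, dates, n = 2) {
  median_data <- combineMedianComp(data1 = data1, data2 = data2, col = col, n = n)

  # The first column is the response series, the rest are the controls
  series <- zoo(as.matrix(median_data), dates)

  # Periods and model arguments copied unchanged from the question
  CausalImpact(series, dates[c(1, 3)], dates[c(7, 9)],
               model.args = list(nseasons = 7, season.duration = 1))
}

# A named character vector makes map() return a list named W, X, Y, Z
cols <- set_names(names(wiki_1))

impacts <- map(cols, run_impact_for,
               data1 = wiki_1, data2 = wiki_2, dates = time)

# One summary table per column; an element is NULL if inference was not possible
summaries <- map(impacts, function(imp) imp$summary)
```

Individual results can then be looked up by name, for example summary(impacts$W) or summaries$X. Because combineMedianComp samples rows of wiki_2 at random, calling set.seed() before the map() call keeps the results reproducible.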

0 answers:

No answers yet