Question

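I'm trying to find the best correlation (i.e. the highest r-squared value) between two lists of data over a specified range (i.e. find the range of 'x' values that correlates best with its corresponding 'y' values). Basically, I'm looking for the linear range in my data. This is what I have so far: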

    # Example data - actually have a much more complicated data set
    x <- c(1,2,3,4,5,6,7,8,9)
    y <- c(0.25,1.5,3,4,5,6,6.5,7,7.5)
    data.range <- 0 # will hold the label for each x range
    r.sq <- 0       # will hold the cor() value for each range
    for (i in 1:length(x)) {
      # windows that run past the end of x give NA and are dropped by na.omit() below
      r.sq[i] <- round(cor(x[i:(i+5)], y[i:(i+5)]), 4)
      data.range[i] <- paste(x[i], x[i+5], sep = " - ")
    }
    output <- data.frame(na.omit(cbind(data.range, r.sq)))
    # Example readout
    head(output)
      data.range    r.sq
      1 - 6         0.9963
      2 - 7         0.9906
      3 - 8         0.9885
      4 - 9         0.9839

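Here I have the output set up to give me a data frame containing the range of 'x' data that was correlated with the associated 'y', along with the cor() value for that 'x' range. Right now I'm using a fixed window (x[i] to x[i + 5]) to get the correlation between 'x' and 'y', but in the end I don't want to hard-code the "5", because the linear range might span 6 or 8 points. So I'd like to compute every possible correlation of 'x' and 'y' and get a list of the data ranges (data.range) with their corresponding cor() values (r.sq):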

data.range     r.sq        
1 - 4          0.9999
1 - 5          0.9808
1 - 6          0.9805
1 - 7          etc...
1 - 8
1 - 9
2 - 5
2 - 6
2 - 7
2 - 8
etc....

Any suggestions are welcome!

Answer 1

Not sure, but your i loop already runs from 1 to length(x), so you can nest a second loop inside it:

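    for (i in 1:length(x)) {
        for (j in desired_start:desired_finish) {
            # n = digits to round to; r.sq needs one slot per (i, j) pair
            # (e.g. a matrix), otherwise each j overwrites the previous one
            r.sq[i, j] <- round(cor(x[i:j], y[i:j]), n)
        }
    }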

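You get the rest. There are other ways to do this, but if you're new to this, it's a very good place to start, and you seem to have a good grasp of loops. This goes through i first and, for each i, covers every possible value of j.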
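
To make the nested loop above concrete, here is a minimal sketch using the example data from the question. It assumes a minimum window of 4 points (`min.pts` is a name made up for this example, not something from the question), and it squares the result of `cor()` to get r-squared, since `cor()` on its own returns r. Treat it as one possible way to fill in "the rest", not the only one.

```r
# Example data from the question
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
y <- c(0.25, 1.5, 3, 4, 5, 6, 6.5, 7, 7.5)

min.pts <- 4      # assumed minimum number of points per window (hypothetical name)
n.pts <- length(x)

data.range <- character(0)  # labels such as "1 - 4"
r.sq <- numeric(0)          # r-squared for each window

for (i in 1:(n.pts - min.pts + 1)) {     # start index of the window
  for (j in (i + min.pts - 1):n.pts) {   # end index of the window
    data.range <- c(data.range, paste(x[i], x[j], sep = " - "))
    # cor() returns r, so square it to get r-squared
    r.sq <- c(r.sq, round(cor(x[i:j], y[i:j])^2, 4))
  }
}

output <- data.frame(data.range, r.sq, stringsAsFactors = FALSE)
output <- output[order(-output$r.sq), ]  # best-correlated range first
head(output)
```

Growing the vectors with `c()` is fine for a handful of points; for a large data set you would probably want to preallocate instead. Also bear in mind that short windows can score a high r-squared simply because there are few points to fit, so you may want to raise `min.pts`.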

