Improving a for loop with other methods

Date: 2015-11-26 07:44:22

Tags: r optimization data.table dplyr

Question

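There is one main station (df) and three local stations (s) stacked in a single data.frame, with values for three days. The idea is to take each day from the main station, compute the relative anomaly of the three local stations, and smooth it using inverse distance weighting (IDW) from the phylin package. The result is then applied to the main station's value by multiplication.

Any suggestions for improving this code (e.g. data.table, dplyr, apply)? I still don't see how to solve this without a cumbersome for loop.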
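For reference, the per-day calculation described above can be sketched in base R. This is a simplified stand-in, not phylin's idw: idw_point is a hypothetical helper that assumes plain Euclidean distance on (lat, long) and a power of p = 2, and the station values are rounded from the dput output.

```r
# Hypothetical helper: inverse distance weighting at a single target point.
# Assumes Euclidean distance on (lat, long) and power p = 2; phylin::idw may differ.
idw_point <- function(values, coords, target, p = 2) {
  d <- sqrt((coords[, 1] - target[1])^2 + (coords[, 2] - target[2])^2)
  if (any(d == 0)) return(mean(values[d == 0]))  # target coincides with a station
  w <- 1 / d^p
  sum(w * values) / sum(w)
}

# Day 1: three local stations (rounded values from s) and the main station (from df)
local_value <- c(63.32, 86.05, 76.51)
local_xy <- cbind(lat  = c(33.59, 34.7392, 35.2833),
                  long = c(-92.8236, -90.7664, -93.1))
main_value <- 54.88
main_xy <- c(100, 50)

r <- local_value / main_value                 # relative anomaly per local station
smoothed <- idw_point(r, local_xy, main_xy)   # one smoothed anomaly for the day
adjusted <- main_value * smoothed             # apply the anomaly by multiplication
```

Because the smoothed anomaly is a weighted average of the three ratios, it always lies between the smallest and largest of them.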

dput

s <- structure(list(id = c("USC00031152", "USC00034638", "USC00036352", 
"USC00031152", "USC00034638", "USC00036352", "USC00031152", "USC00034638", 
"USC00036352"), lat = c(33.59, 34.7392, 35.2833, 33.59, 34.7392, 
35.2833, 33.59, 34.7392, 35.2833), long = c(-92.8236, -90.7664, 
-93.1, -92.8236, -90.7664, -93.1, -92.8236, -90.7664, -93.1), 
    year = c(1900, 1900, 1900, 1900, 1900, 1900, 1900, 1900, 
    1900), month = c(1, 1, 1, 1, 1, 1, 1, 1, 1), day = c(1L, 
    1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), value = c(63.3157576809045, 
    86.0490598902219, 76.506386949066, 71.3760752788486, 89.9119576975542, 
    76.3535163951321, 53.7259645981243, 61.7989638892985, 85.8911224149051
    )), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-9L), .Names = c("id", "lat", "long", "year", "month", "day", 
"value"))

df <- structure(list(id = c(12345, 12345, 12345), lat = c(100, 100, 
100), long = c(50, 50, 50), year = c(1900, 1900, 1900), month = c(1, 
1, 1), day = 1:3, value = c(54.8780020601509, 106.966029162171, 
98.3198828955801)), row.names = c(NA, -3L), class = "data.frame", .Names = c("id", 
"lat", "long", "year", "month", "day", "value"))

Code

library(phylin)

nearest <- function(i, loc){
  # Stack 3 local stations
  stack <- s[loc:(loc+2),]

  # Get 1 main station
  station <- df[i,]

  # Check for NA and build relative anomaly (r)
  stack <- stack[!is.na(stack$value),]
  stack$r <- stack$value/station$value

  # Use IDW and return v
  v <- as.numeric(ifelse(dim(stack)[1] == 1, 
                    stack$r, 
                    idw(stack$r, stack[,c(2,3,8)], station[,2:3])))
  return(v)
}  


ncdc <- 1

for (i in 1:nrow(df)){
  # Get relative anomaly from function
  r <- nearest(i, ncdc)

  # Get value from main station and apply anomaly
  p <- df[i,7]              
  df[i,7] <- p*r   

  # Iterate to next 3 local stations 
  ncdc <- ncdc + 3
}

1 answer:

Answer 0: (score: 1)

Assuming you keep the nearest function unchanged, you can get the new value column of df with:

newvalue <- sapply(1:NROW(df), function (i) df[i,7] * nearest(i, 3*(i-1)+1))
df$value <- newvalue
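
As a quick check on the indexing, the offset 3*(i-1)+1 yields the same starting rows (1, 4, 7) that the loop produces by incrementing ncdc by 3. The sketch below verifies this and shows a vapply variant, which additionally checks that each result is a single number. nearest_stub is a hypothetical stand-in so the snippet runs without phylin; it is not the nearest function from the question, and the values are toy numbers.

```r
n <- 3  # number of rows in df

# Offsets as produced by the original loop
loop_offsets <- numeric(n)
ncdc <- 1
for (i in seq_len(n)) {
  loop_offsets[i] <- ncdc
  ncdc <- ncdc + 3
}

# Offsets as computed inside the sapply call
closed_form <- 3 * (seq_len(n) - 1) + 1
stopifnot(all(loop_offsets == closed_form))

# Hypothetical stand-in for nearest(): returns a made-up factor, for illustration only
nearest_stub <- function(i, loc) 1 + loc / 100

value <- c(10, 20, 30)  # toy values, not the data from the question
newvalue <- vapply(seq_len(n),
                   function(i) value[i] * nearest_stub(i, 3 * (i - 1) + 1),
                   numeric(1))
```

With the real nearest function, the same pattern would read df[i, 7] * nearest(i, 3 * (i - 1) + 1) inside the anonymous function.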