How do I apply a function containing a for loop to a data frame with multiple groups?

Time: 2016-07-20 19:11:18

Tags: r purrr

Data

Hi, this is a follow-up question to this one. Basically, I have the following data frame:

> dput(foo)
structure(list(Local.Y = c(50.71994, 60.37412, 69.99005, 78.60745
), Un = c(9.48762, 9.93521, 8.9674, 8.33772), PrecVehLocalY = c(70.19624, 
78.50749, 86.49717, 93.4731), Ln = c(3.9019, 3.9019, 3.9019, 
3.9019), sn_minus_Ln = c(15.5744, 14.23147, 12.60522, 10.96375
), Vehicle.ID2 = c("1-2", "1-2", "1-2", "1-2")), .Names = c("Local.Y", 
"Un", "PrecVehLocalY", "Ln", "sn_minus_Ln", "Vehicle.ID2"), row.names = c(NA, 
4L), class = "data.frame")

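This is sample data. In the original data frame, the variable Vehicle.ID2 has hundreds of unique values. It represents a vehicle pair, i.e. "subject vehicle #-front vehicle #".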

What I want to do

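I want to estimate 5 new variables from the existing ones. Because the new variables depend both on the existing variables and on their own earlier values, I use a for loop. The sample data (foo) contains only one unique Vehicle.ID2. If the following for loop is applied directly to foo, it gives the expected results: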

# for loop
for( i in ( seq_len( nrow(foo)-1 ) + 1 ) ) {
  if( i <= 2L ) {
    foo$Un_dt_1[i] <- foo$Un[i-1] * 3.6 + 
      3.6 * ( 1.765 + ( 1.765 - 1.04 ) * 
                foo$Un[1] * 3.6 / 80 ) * 1
    foo$Un_dt_2[i] <- 3.6 * ( foo$sn_minus_Ln[i-1] - 4.4 ) / 1
  } else {
    foo$Un_dt_1[i] <- foo$Un_dt[i-1] + 
      3.6 * ( 1.765 + ( 1.765 - 1.04 ) * 
                foo$Un_dt[i-1] / 80 ) * 1
    foo$Un_dt_2[i] <- 3.6 * ( foo$pred_sn_minus_Ln[i-1] - 4.4 ) / 1
  }
  foo$Un_dt[i] <- pmin( foo$Un_dt_1[i], foo$Un_dt_2[i] )
  if( i <= 2 ) {
    foo$pred_Local.Y[i] <- foo$Local.Y[i-1] + 
      0.5 * ( ( foo$Un_dt[i] + foo$Un[i-1] ) / 3.6 ) * 1
  } else {
    foo$pred_Local.Y[i] <- foo$pred_Local.Y[i-1] + 
      0.5 * ( ( foo$Un_dt[i] + foo$Un_dt[i-1] ) / 3.6 ) * 1
  }

  foo$pred_sn_minus_Ln[i] <- foo$PrecVehLocalY[i] - foo$pred_Local.Y[i] - foo$Ln[i]
}

# results
structure(list(Local.Y = c(50.71994, 60.37412, 69.99005, 78.60745
), Un = c(9.48762, 9.93521, 8.9674, 8.33772), PrecVehLocalY = c(70.19624, 
78.50749, 86.49717, 93.4731), Ln = c(3.9019, 3.9019, 3.9019, 
3.9019), sn_minus_Ln = c(15.5744, 14.23147, 12.60522, 10.96375
), Vehicle.ID2 = c("1-2", "1-2", "1-2", "1-2"), Un_dt_1 = c(NA, 
41.623752969, 47.89427328, 53.12221615125), Un_dt_2 = c(NA, 40.22784, 
45.29061, 31.294233), Un_dt = c(NA, 40.22784, 45.29061, 31.294233
), pred_Local.Y = c(NA, 57.624865, 69.5024275, 80.13921125), 
    pred_sn_minus_Ln = c(NA, 16.980725, 13.0928425, 9.43198875
    )), .Names = c("Local.Y", "Un", "PrecVehLocalY", "Ln", "sn_minus_Ln", 
"Vehicle.ID2", "Un_dt_1", "Un_dt_2", "Un_dt", "pred_Local.Y", 
"pred_sn_minus_Ln"), row.names = c(NA, 4L), class = "data.frame")

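Because the original data frame has many Vehicle.ID2 values, I want to put this for loop inside a function and then apply that function to every group of the data, split by Vehicle.ID2.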

What I tried and the problem:

I tried using the purrr package.

Function

f1 <- function(df){
for( i in ( seq_len( nrow(df)-1 ) + 1 ) ) {
  if( i <= 2L ) {
    df$Un_dt_1[i] <- df$Un[i-1] * 3.6 + 
      3.6 * ( 1.765 + ( 1.765 - 1.04 ) * 
                df$Un[1] * 3.6 / 80 ) * 1
    df$Un_dt_2[i] <- 3.6 * ( df$sn_minus_Ln[i-1] - 4.4 ) / 1
  } else {
    df$Un_dt_1[i] <- df$Un_dt[i-1] + 
      3.6 * ( 1.765 + ( 1.765 - 1.04 ) * 
                df$Un_dt[i-1] / 80 ) * 1
    df$Un_dt_2[i] <- 3.6 * ( df$pred_sn_minus_Ln[i-1] - 4.4 ) / 1
  }
  df$Un_dt[i] <- pmin( df$Un_dt_1[i], df$Un_dt_2[i] )
  if( i <= 2 ) {
    df$pred_Local.Y[i] <- df$Local.Y[i-1] + 
      0.5 * ( ( df$Un_dt[i] + df$Un[i-1] ) / 3.6 ) * 1
  } else {
    df$pred_Local.Y[i] <- df$pred_Local.Y[i-1] + 
      0.5 * ( ( df$Un_dt[i] + df$Un_dt[i-1] ) / 3.6 ) * 1
  }

  df$pred_sn_minus_Ln[i] <- df$PrecVehLocalY[i] - df$pred_Local.Y[i] - df$Ln[i]
}
}

Applying the function to the data:

library(purrr)
foor <- split(foo, foo$Vehicle.ID2)
> map(foor, f1)
$`1-2`
NULL

Why does this return NULL? The function contains exactly the same for loop.
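A likely cause, offered as a sketch rather than a definitive answer: in R a function returns the value of its last evaluated expression, and a `for` loop evaluates to `NULL`. The last expression in `f1` is the loop, so the modified `df` is discarded and `map()` collects `NULL` for each group. The version below ends with `df` so that the data frame is returned. The name `f1_fixed`, the preallocation of the new columns, the second vehicle pair `"1-3"`, and the use of base `lapply()` and `rbind()` are assumptions added for illustration; none of them appear in the original question.

```r
# Hypothetical fixed version of f1: same arithmetic, but it returns df.
f1_fixed <- function(df) {
  # Assumption (not in the original f1): create the new columns up front.
  new_cols <- c("Un_dt_1", "Un_dt_2", "Un_dt", "pred_Local.Y", "pred_sn_minus_Ln")
  df[new_cols] <- NA_real_

  # Rows 2..n; the sequence is empty for a one-row group.
  for (i in seq_len(nrow(df) - 1) + 1) {
    if (i <= 2L) {
      df$Un_dt_1[i] <- df$Un[i - 1] * 3.6 +
        3.6 * (1.765 + (1.765 - 1.04) * df$Un[1] * 3.6 / 80) * 1
      df$Un_dt_2[i] <- 3.6 * (df$sn_minus_Ln[i - 1] - 4.4) / 1
    } else {
      df$Un_dt_1[i] <- df$Un_dt[i - 1] +
        3.6 * (1.765 + (1.765 - 1.04) * df$Un_dt[i - 1] / 80) * 1
      df$Un_dt_2[i] <- 3.6 * (df$pred_sn_minus_Ln[i - 1] - 4.4) / 1
    }
    df$Un_dt[i] <- pmin(df$Un_dt_1[i], df$Un_dt_2[i])
    if (i <= 2L) {
      df$pred_Local.Y[i] <- df$Local.Y[i - 1] +
        0.5 * ((df$Un_dt[i] + df$Un[i - 1]) / 3.6) * 1
    } else {
      df$pred_Local.Y[i] <- df$pred_Local.Y[i - 1] +
        0.5 * ((df$Un_dt[i] + df$Un_dt[i - 1]) / 3.6) * 1
    }
    df$pred_sn_minus_Ln[i] <- df$PrecVehLocalY[i] - df$pred_Local.Y[i] - df$Ln[i]
  }

  df  # last expression, so this is what the function returns
}

# The question's sample data.
foo <- data.frame(
  Local.Y = c(50.71994, 60.37412, 69.99005, 78.60745),
  Un = c(9.48762, 9.93521, 8.9674, 8.33772),
  PrecVehLocalY = c(70.19624, 78.50749, 86.49717, 93.4731),
  Ln = c(3.9019, 3.9019, 3.9019, 3.9019),
  sn_minus_Ln = c(15.5744, 14.23147, 12.60522, 10.96375),
  Vehicle.ID2 = c("1-2", "1-2", "1-2", "1-2"),
  stringsAsFactors = FALSE
)

# Hypothetical second pair "1-3": a copy of foo, only to show several groups.
foo_b <- foo
foo_b$Vehicle.ID2 <- "1-3"
foo_all <- rbind(foo, foo_b)

foor <- split(foo_all, foo_all$Vehicle.ID2)
res <- lapply(foor, f1_fixed)     # purrr::map(foor, f1_fixed) gives the same list
combined <- do.call(rbind, res)   # stack the per-group results into one data frame
```

Preallocating the columns is a design choice rather than part of the fix. Without it, `df$Un_dt_1[i] <- ...` depends on R recycling a shorter vector to the number of rows, which works for the 4-row sample but can fail for other group sizes.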

0 answers:

No answers yet