R programming: subsetting a data frame multiple times for each value of a vector and a data frame column

Time: 2015-09-10 20:32:48

Tags: r

I have a vector with values 1:6, a data frame of 15-minute bins, and a data frame of scan data. The data frames are shown below.

bins

   idMin5Bin            BinStart              BinEnd
22        22 2015-08-13 10:15:00 2015-08-13 10:19:59
23        23 2015-08-13 10:20:00 2015-08-13 10:24:59
24        24 2015-08-13 10:25:00 2015-08-13 10:29:59
25        25 2015-08-13 10:30:00 2015-08-13 10:34:59
26        26 2015-08-13 10:35:00 2015-08-13 10:39:59
27        27 2015-08-13 10:40:00 2015-08-13 10:44:59

cars

   idTrip Link_IDLink StartCluster_id   Speed           firstScan
10     10           5              19  47.961 2015-08-13 10:11:49
11     11           5              14 118.800 2015-08-13 10:12:33
12     11           5              14 118.800 2015-08-13 10:13:16
13     12           5              22  47.793 2015-08-13 10:11:21
15     14           5              28  56.321 2015-08-13 10:13:09
24     22           5              52  45.692 2015-08-13 10:14:50

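For each value in the vector, I want to look up the cars table and find all cars whose Link_IDLink value matches that vector value.

Then I want to group all the matches by comparing each car's firstScan with BinStart and BinEnd in the bins table.

Finally, I want to plot the values in each subset.

The only strategy I can think of is nested loops (which I know are frowned upon). Even with nested loops, the sample code below gives me the following error.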

for (i in 1:length(vector)){
  tempcars<-cars[cars[,2]==i,]
  for (k in 1:nrow(bins)){
    tempcars1<-subset(tempcars, firstScan<bins[k,3] & firstScan>bins[k,2])
    hist(tempcars1[,5], breaks =200)
  }
}

Error in hist.default(unclass(x), unclass(breaks), plot = FALSE, warn.unused = FALSE,  :
  character(0)
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf

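I would certainly like to get away from loops, but any help with the loops is appreciated as well.

1 Answer:

Answer 0: (score: 0)

Here is a start on an answer... hope it helps...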

# Generate the data
theVec <- 1:6
someTimes <- seq(Sys.time(), by = "sec", length.out = 300)
# 20 consecutive bins of 15 seconds each
bins <- data.frame(idMin5Bin = 1:20,
                   BinStart = someTimes[1 + (15 * (0:19))],
                   BinEnd = someTimes[15 * (1:20)])
cars <- data.frame(Link_IDLink = rep(theVec, each = 20),
                   firstScan = sample(someTimes, 120, replace = TRUE),
                   Speed = runif(120, 30, 100))

# First split by Link_IDLink
subCars <- subset(cars, Link_IDLink %in% theVec)
carList <- split(subCars, subCars$Link_IDLink)

# Now "cut" the times for each element of the list
outList <- lapply(carList, function(df, binData) {
  # Break points: every BinStart plus the last BinEnd
  theBins <- c(binData$BinStart, binData$BinEnd[nrow(binData)])
  # include.lowest = TRUE keeps a scan that falls exactly on the last BinEnd
  df$idMin5Bin <- cut(df$firstScan, theBins, labels = binData$idMin5Bin,
                      include.lowest = TRUE)
  df
}, binData = bins)

You end up with this...

> head(outList[[1]])
  Link_IDLink           firstScan    Speed idMin5Bin
1           1 2015-09-10 22:42:33 33.85446        17
2           1 2015-09-10 22:41:06 81.43807        11
3           1 2015-09-10 22:40:53 90.59927        10
4           1 2015-09-10 22:39:38 56.89429         5
5           1 2015-09-10 22:40:20 70.44760         8
6           1 2015-09-10 22:42:08 88.93505        15
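
Because the bin edges are the same for every link, the binning can also be done once on the whole data frame, followed by a split on both keys. The following is only a minimal sketch of that variant, not part of the original answer. It rebuilds synthetic data with the same column names so that it runs on its own; the fixed start time, the seed, and the names `breaks` and `groups` are assumptions made for illustration.

```r
# Synthetic data with the same shape as in the answer (assumed, for illustration)
set.seed(42)
theVec <- 1:6
someTimes <- seq(as.POSIXct("2015-08-13 10:15:00", tz = "UTC"),
                 by = "sec", length.out = 300)
bins <- data.frame(idMin5Bin = 1:20,
                   BinStart = someTimes[1 + 15 * (0:19)],
                   BinEnd = someTimes[15 * (1:20)])
cars <- data.frame(Link_IDLink = rep(theVec, each = 20),
                   firstScan = sample(someTimes, 120, replace = TRUE),
                   Speed = runif(120, 30, 100))

# Bin edges: every BinStart plus the final BinEnd
breaks <- c(bins$BinStart, bins$BinEnd[nrow(bins)])

# Tag every scan with its bin in one call; include.lowest = TRUE keeps
# a scan that falls exactly on the final edge inside the last bin
cars$idMin5Bin <- cut(cars$firstScan, breaks, labels = bins$idMin5Bin,
                      include.lowest = TRUE)

# One data frame per (link, bin) pair; drop = TRUE discards empty pairs
groups <- split(cars[cars$Link_IDLink %in% theVec, ],
                list(cars$Link_IDLink, cars$idMin5Bin), drop = TRUE)
```

Each element of `groups` is named after its link and bin, separated by a dot (for example `groups[["3.7"]]`, if that pair has any scans), and pairs with no scans are simply absent.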

You can plot this in a number of ways; let me know if you need help with that.
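
One possible way to plot, sketched here under assumptions and not taken from the original answer: draw one histogram of Speed per link and bin pair, skipping pairs with no scans. The warnings in the question ("no non-missing arguments to min") suggest that `hist()` was called on an empty subset, and column 5 of the question's cars table is firstScan rather than Speed. The synthetic data, the seed, and the names `speedGroups` and `outFile` are hypothetical.

```r
# Synthetic data mirroring the answer (assumed, for illustration only)
set.seed(1)
someTimes <- seq(as.POSIXct("2015-08-13 10:15:00", tz = "UTC"),
                 by = "sec", length.out = 300)
breaks <- someTimes[c(1 + 15 * (0:19), 300)]  # 20 bin starts plus the final end
cars <- data.frame(Link_IDLink = rep(1:6, each = 20),
                   firstScan = sample(someTimes, 120, replace = TRUE),
                   Speed = runif(120, 30, 100))
cars$idMin5Bin <- cut(cars$firstScan, breaks, labels = 1:20,
                      include.lowest = TRUE)

# One Speed vector per link/bin pair; drop = TRUE removes empty pairs,
# so hist() is never called on zero observations
speedGroups <- split(cars$Speed, list(cars$Link_IDLink, cars$idMin5Bin),
                     drop = TRUE)

# Write one histogram per page to a temporary PDF (hypothetical file name)
outFile <- file.path(tempdir(), "speed_hists.pdf")
pdf(outFile)
for (nm in names(speedGroups)) {
  hist(speedGroups[[nm]], main = paste("Link.Bin", nm), xlab = "Speed")
}
invisible(dev.off())
```

With only 120 synthetic scans spread over 120 possible pairs, most histograms here hold one or two values; with real scan data each group would normally be larger.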