I have several vehicle datasets, each with a unique ID, Vehicle.ID2. Below is a sample of the data for just one vehicle:
df <- structure(list(Vehicle.ID2 = c("4-2", "4-2", "4-2", "4-2", "4-2",
"4-2", "4-2", "4-2", "4-2", "4-2", "4-2", "4-2", "4-2", "4-2",
"4-2", "4-2", "4-2", "4-2", "4-2", "4-2"), Time = c(3, 3.2, 3.4,
3.6, 3.8, 4, 4.2, 4.4, 4.6, 4.8, 5, 5.2, 5.4, 5.6, 5.8, 6, 6.2,
6.4, 6.6, 6.8), yposition = c(3.451, 7.357, 11.264, 15.171, 19.077,
22.984, 26.89, 30.797, 34.704, 38.61, 42.517, 46.423, 50.33,
54.236, 58.143, 62.05, 65.956, 69.863, 73.769, 77.676), LeadVehyposition2 = c(55.043,
NA, 64.098, 68.626, 73.153, 77.681, 82.209, 86.736, 91.264, 95.791,
100.319, 104.847, 109.374, 113.902, 118.429, 122.957, 127.485,
132.012, 136.54, 141.067)), .Names = c("Vehicle.ID2", "Time",
"yposition", "LeadVehyposition2"), class = c("tbl_df", "data.frame"
), row.names = c(NA, -20L))
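I want to compare LeadVehyposition2 with yposition in df and output the Time at which yposition becomes greater than or equal to LeadVehyposition2. For one vehicle, I can do this for the first value in LeadVehyposition2 with the following code:
df$Time[head(which(df$yposition>=55.043),1)]
[1] 5.8
Here, the first value in LeadVehyposition2 is 55.043, and I compared it with all the values in yposition. I want to do the same for every value in LeadVehyposition2. The following code does not work for the whole dataset (multiple vehicle IDs):
library(dplyr)
mydata %>%
group_by(Vehicle.ID2) %>%
mutate(Time.PET = Time[head(which(yposition>=LeadVehyposition2),1)]) %>%
ungroup()
The problem is that this second piece of code only compares yposition and LeadVehyposition2 row by row. The goal, however, is to hold each LeadVehyposition2 value fixed and compare it with the entire yposition column. How can I solve this?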
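To make the row-wise behaviour concrete, here is a minimal sketch on made-up toy vectors (pos, target and time are hypothetical names, not columns from the question). It contrasts the element-wise comparison with an all-pairs comparison built with outer(), which is one possible way to express "hold each target fixed and scan the whole position column".

```r
# Toy data (hypothetical, not from the question)
pos    <- c(0, 10, 20, 30)   # follower positions, ascending in time
target <- c(15, 5, 30, 40)   # lead-vehicle positions to be reached
time   <- c(1, 2, 3, 4)

# Element-wise: pos[i] is compared with target[i] only
rowwise <- pos >= target

# All pairs: reached[i, j] is TRUE when pos[i] >= target[j]
reached <- outer(pos, target, ">=")

# For each target (column), index of the first row that reaches it;
# which(...)[1] yields NA when no row does
first_idx <- apply(reached, 2, function(col) which(col)[1])
first_time <- time[first_idx]
```

The outer() matrix has length(pos) * length(target) cells, so this is only meant as an illustration; it may be too memory-hungry for long trajectories.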
Answer 0 (score: 3)
Here is a way to do it in base R:
df$Time[sapply(df$LeadVehyposition2, function(p) min(which(df$yposition >= p)))]
[1] 5.8 NA 6.2 6.4 6.6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Or:
with(df, Time[sapply(LeadVehyposition2, function(p) min(which(yposition >= p)))])
[1] 5.8 NA 6.2 6.4 6.6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
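To handle the grouping by vehicle:
df <- df[order(df$Vehicle.ID2, df$Time), ]
# lapply always returns a list, so this also works when every vehicle has the same number of rows
unlist(lapply(split(df, df$Vehicle.ID2), function(df)
with(df, Time[sapply(LeadVehyposition2, function(p) min(which(yposition >= p)))])), use.names = FALSE)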
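The base R lookup above can also be wrapped in a small helper and applied per vehicle so that the result lands in a new column. The following is only a sketch under assumptions: first_time_reaching is a hypothetical helper name, the two-vehicle data frame is made up for illustration, and which(...)[1] is swapped in for min(which(...)) so that unreachable or NA targets give NA without the "no non-missing arguments to min" warning.

```r
# Hypothetical helper: for each target position, return the first time at
# which `pos` reaches it (NA if it is never reached or the target is NA).
first_time_reaching <- function(time, pos, target) {
  idx <- vapply(target, function(p) which(pos >= p)[1], integer(1))
  time[idx]
}

# Made-up data with two vehicles (not the data from the question)
df <- data.frame(
  Vehicle.ID2       = rep(c("A", "B"), each = 4),
  Time              = rep(c(1, 2, 3, 4), times = 2),
  yposition         = c(0, 10, 20, 30, 0, 5, 10, 15),
  LeadVehyposition2 = c(15, NA, 30, 40, 5, 10, 20, 12)
)

# Sort, split by vehicle, add the column within each piece, then recombine
df <- df[order(df$Vehicle.ID2, df$Time), ]
pieces <- lapply(split(df, df$Vehicle.ID2), function(d) {
  d$Time.PET <- first_time_reaching(d$Time, d$yposition, d$LeadVehyposition2)
  d
})
result <- do.call(rbind, pieces)
```

Because each piece keeps its own rows, the new Time.PET column stays aligned with the original observations even when vehicles have different numbers of rows.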
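Answer 1 (score: 3)
A data.table approach could be to join df to itself, then take the minimum Time among the rows where the difference between yposition and LeadVehyposition2 is positive.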
library(data.table)
setDT(df)
res <- df[ df[, .(Vehicle.ID2, Time, yposition)], on = c("Vehicle.ID2"), allow.cartesian=T][i.yposition - LeadVehyposition2 > 0, .(min(i.Time)), by = .(Vehicle.ID2, Time, LeadVehyposition2)]
res
# Vehicle.ID2 Time LeadVehyposition2 V1
# 1: 4-2 3.0 55.043 5.8
# 2: 4-2 3.4 64.098 6.2
# 3: 4-2 3.6 68.626 6.4
# 4: 4-2 3.8 73.153 6.6
Joining this back onto df adds the extra column to the original data:
res[df, on = c("Vehicle.ID2","Time","LeadVehyposition2")]
# Vehicle.ID2 Time LeadVehyposition2 V1 yposition
# 1: 4-2 3.0 55.043 5.8 3.451
# 2: 4-2 3.2 NA NA 7.357
# 3: 4-2 3.4 64.098 6.2 11.264
# 4: 4-2 3.6 68.626 6.4 15.171
# 5: 4-2 3.8 73.153 6.6 19.077
# 6: 4-2 4.0 77.681 NA 22.984
# ...
# 17: 4-2 6.2 127.485 NA 65.956
# 18: 4-2 6.4 132.012 NA 69.863
# 19: 4-2 6.6 136.540 NA 73.769
# 20: 4-2 6.8 141.067 NA 77.676
Answer 2 (score: 2)
You can use a rolling join:
library(data.table)
setDT(df)
# create an index to be used for matching
df[, idx := 1:.N, by = Vehicle.ID2]
# find the matching index using rolling joins
df[, idx.m := .SD[.SD, on = c('Vehicle.ID2', yposition = 'LeadVehyposition2'), roll = T,
idx + 1]][1:5]
# Vehicle.ID2 Time yposition LeadVehyposition2 idx idx.m
#1: 4-2 3.0 3.451 55.043 1 15
#2: 4-2 3.2 7.357 NA 2 NA
#3: 4-2 3.4 11.264 64.098 3 17
#4: 4-2 3.6 15.171 68.626 4 18
#5: 4-2 3.8 19.077 73.153 5 19
# get the time for each match
df[, Time.PET := Time[idx.m], by = Vehicle.ID2][1:5]
# Vehicle.ID2 Time yposition LeadVehyposition2 idx idx.m Time.PET
#1: 4-2 3.0 3.451 55.043 1 15 5.8
#2: 4-2 3.2 7.357 NA 2 NA NA
#3: 4-2 3.4 11.264 64.098 3 17 6.2
#4: 4-2 3.6 15.171 68.626 4 18 6.4
#5: 4-2 3.8 19.077 73.153 5 19 6.6
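If yposition and LeadVehyposition2 can be exactly equal, I would suggest adding a very small (positive) jitter to yposition so that the approach above works correctly.
With the latest development version of data.table, which adds non-equi joins, another option could be: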
library(data.table)
setDT(df)
df[df, on = .(Vehicle.ID2, yposition >= LeadVehyposition2), Time[1], by = .EACHI][1:5]
# Vehicle.ID2 yposition V1
#1: 4-2 55.043 5.8
#2: 4-2 NA NA
#3: 4-2 64.098 6.2
#4: 4-2 68.626 6.4
#5: 4-2 73.153 6.6
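What this does is self-join on the columns where Vehicle.ID2 is the same and yposition is greater than or equal to LeadVehyposition2, then take the first Time for each "i" (that is, the first argument of [.data.table).
You can of course assign that as a column:
df[, Time.PET := .SD[.SD, on = .(Vehicle.ID2, yposition >= LeadVehyposition2),
Time[1], by = .EACHI]$V1]
Note: both approaches assume that yposition is sorted in ascending order.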
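When yposition really is sorted in ascending order within a vehicle, the same lookup can be sketched in base R with findInterval(), which uses a binary search instead of scanning the column for every target. This is not part of the original answers; first_time_sorted is a hypothetical helper name and the vectors are made up. With left.open = TRUE, findInterval() counts the positions strictly below each target, so adding 1 gives the first index whose position is greater than or equal to the target, and exact ties need no jitter.

```r
# Hypothetical helper; assumes `pos` is sorted in ascending order
first_time_sorted <- function(time, pos, target) {
  stopifnot(!is.unsorted(pos))
  # Number of positions strictly below each target, plus one
  idx <- findInterval(target, pos, left.open = TRUE) + 1L
  # Targets beyond the last position are never reached
  idx[which(idx > length(pos))] <- NA_integer_
  time[idx]
}

# Made-up single-vehicle example
time   <- c(0, 1, 2, 3)
pos    <- c(0, 10, 20, 30)
target <- c(10, 15, NA, 31)   # exact tie, in between, missing, out of range
pet    <- first_time_sorted(time, pos, target)
```

For several vehicles, the helper would be applied per group, for example on the pieces returned by split(df, df$Vehicle.ID2).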