I currently have a data frame (df) similar to:
> j
policyNumber driverName vehicleName
1 1 jason blue
2 1 josh red
3 1 jason green
4 2 jason orange
5 2 kyle orange
6 3 chris pink
7 3 ally purple
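I want to add separate driver and vehicle IDs that depend on policyNumber together with driverName or vehicleName, respectively. My biggest problem is that I can't come up with a function that recognizes when the policy number has changed (so the sequence resets to 1) and that also handles identical entries that are not consecutive (for example, jason in policy number 1).
I would like to end up with a data frame like this: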
> j
policyNumber driverName vehicleName driverNumber vehicleNumber
1 1 jason blue 1 1
2 1 josh red 2 2
3 1 jason green 1 3
4 2 jason orange 1 1
5 2 kyle orange 2 1
6 3 chris pink 1 1
7 3 ally purple 2 2
Answer 0: (score: 4)
In base R:
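# dt is the data frame from the question (called j there)
# as.character() guards against factor columns; as.integer() converts ave()'s character result back to numbers
dt$driverNumber = as.integer(ave(as.character(dt$driverName), dt$policyNumber, FUN = function(x) as.numeric(as.factor(x))))
dt$vehicleNumber = as.integer(ave(as.character(dt$vehicleName), dt$policyNumber, FUN = function(x) as.numeric(as.factor(x))))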
dt
policyNumber driverName vehicleName driverNumber vehicleNumber
1 1 jason blue 1 1
2 1 josh red 2 3
3 1 jason green 1 2
4 2 jason orange 1 1
5 2 kyle orange 2 1
6 3 chris pink 2 1
7 3 ally purple 1 2
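The numbering above differs from the requested output (for example, red gets 3 and green gets 2) because as.factor() orders its levels alphabetically rather than by first appearance. A minimal illustration, using a made-up vector x rather than the original data:

```r
# Hypothetical vector mirroring the vehicles of policy 1
x <- c("blue", "red", "green")

# Default levels are sorted: blue, green, red
alphabetical <- as.numeric(as.factor(x))

# Levels supplied in order of first appearance: blue, red, green
by_appearance <- as.numeric(factor(x, levels = unique(x)))

print(alphabetical)   # 1 3 2
print(by_appearance)  # 1 2 3
```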
Answer 1: (score: 1)
Same as Wen's answer, but with dplyr. I also specify levels to keep the order of appearance rather than alphabetical order.
library(dplyr)
j %>% group_by(policyNumber) %>%
mutate(driverNumber = as.numeric(factor(driverName, levels = unique(driverName))),
vehicleNumber = as.numeric(factor(vehicleName, levels = unique(vehicleName))))
# # A tibble: 7 x 5
# # Groups: policyNumber [3]
# policyNumber driverName vehicleName driverNumber vehicleNumber
# <int> <fctr> <fctr> <dbl> <dbl>
# 1 1 jason blue 1 1
# 2 1 josh red 2 2
# 3 1 jason green 1 3
# 4 2 jason orange 1 1
# 5 2 kyle orange 2 1
# 6 3 chris pink 1 1
# 7 3 ally purple 2 2
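For completeness, the same order-of-appearance numbering can probably be had in base R without dplyr. The sketch below is only one possible approach: the helper name first_seen_id is made up for illustration, and the data frame is rebuilt by hand on the assumption that the name columns are plain character vectors.

```r
# Rebuild the example data (assumed column types: integer and character)
j <- data.frame(
  policyNumber = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
  driverName   = c("jason", "josh", "jason", "jason", "kyle", "chris", "ally"),
  vehicleName  = c("blue", "red", "green", "orange", "orange", "pink", "purple"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: within each group g, number the values of x
# by the order in which they first appear
first_seen_id <- function(x, g) {
  # ave() is run over row indices so the result stays integer
  ave(seq_along(x), g, FUN = function(i) match(x[i], unique(x[i])))
}

j$driverNumber  <- first_seen_id(j$driverName,  j$policyNumber)
j$vehicleNumber <- first_seen_id(j$vehicleName, j$policyNumber)
print(j)
```

Because match() returns the position of the first occurrence, a repeated name such as jason in policy 1 gets the same ID even when its rows are not adjacent, and the numbering restarts for each policyNumber.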