R: adding distinct ID columns based on multiple columns

Date: 2017-10-27 18:40:01

Tags: r dataframe

I currently have a data frame (df) that looks like this:

> j
  policyNumber driverName vehicleName
1            1      jason        blue
2            1       josh         red
3            1      jason       green
4            2      jason      orange
5            2       kyle      orange
6            3      chris        pink
7            3       ally      purple
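
For anyone who wants to reproduce the example, the data frame printed above could be rebuilt roughly as follows. This is a sketch: the column types are an assumption (integer policy numbers and plain character names), and the original data may well have stored the names as factors.

```r
# Rebuild the example data frame from the printed rows.
# Assumption: names are stored as character, not factor.
j <- data.frame(
  policyNumber = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
  driverName   = c("jason", "josh", "jason", "jason", "kyle", "chris", "ally"),
  vehicleName  = c("blue", "red", "green", "orange", "orange", "pink", "purple"),
  stringsAsFactors = FALSE
)
j
```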

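I would like to add separate driver and vehicle IDs: the driver ID should depend on policyNumber and driverName, and the vehicle ID on policyNumber and vehicleName. My biggest problem is that I cannot find a function that recognizes when the policy number has changed (so that the sequence resets to 1) and that also copes with identical entries that are not consecutive (for example, jason in policy number 1).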

I would like to end up with a data frame like this:

> j
  policyNumber driverName vehicleName driverNumber vehicleNumber
1            1      jason        blue            1             1
2            1       josh         red            2             2
3            1      jason       green            1             3
4            2      jason      orange            1             1
5            2       kyle      orange            2             1
6            3      chris        pink            1             1
7            3       ally      purple            2             2

2 answers:

Answer 0: (score: 4)

In base R:

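# dt is the data frame from the question (shown there as j).
# Factor levels are sorted, so ids follow alphabetical order within each policy.
# ave() writes the ids back into a character vector here; as.integer() makes them numeric.
dt$driverNumber = as.integer(ave(as.character(dt$driverName), dt$policyNumber, FUN = function(x) as.numeric(as.factor(x))))
dt$vehicleNumber = as.integer(ave(as.character(dt$vehicleName), dt$policyNumber, FUN = function(x) as.numeric(as.factor(x))))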
dt
  policyNumber driverName vehicleName driverNumber vehicleNumber
1            1      jason        blue            1             1
2            1       josh         red            2             3
3            1      jason       green            1             2
4            2      jason      orange            1             1
5            2       kyle      orange            2             1
6            3      chris        pink            2             1
7            3       ally      purple            1             2
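
The output above numbers the names alphabetically within each policy (in policy 3, ally gets 1 and chris gets 2), whereas the question asks for ids in order of first appearance. A base R sketch of one way to get that ordering is shown below. The helper name `id_by_first_seen` is hypothetical, and the data frame is rebuilt here only so the snippet runs on its own, assuming character name columns.

```r
# Example data, rebuilt from the question (column types are an assumption).
j <- data.frame(
  policyNumber = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
  driverName   = c("jason", "josh", "jason", "jason", "kyle", "chris", "ally"),
  vehicleName  = c("blue", "red", "green", "orange", "orange", "pink", "purple"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number the values of x by first appearance within each group.
# ave() is applied to row indices so the result stays integer; match() against
# unique() gives repeated values the same id even when they are not consecutive.
id_by_first_seen <- function(x, group) {
  x <- as.character(x)
  ave(seq_along(x), group, FUN = function(i) match(x[i], unique(x[i])))
}

j$driverNumber  <- id_by_first_seen(j$driverName,  j$policyNumber)
j$vehicleNumber <- id_by_first_seen(j$vehicleName, j$policyNumber)
j
```

Because the numbering restarts inside each policyNumber group and is based on values rather than on runs, jason keeps id 1 on both of his rows in policy 1.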

Answer 1: (score: 1)

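The same approach as Wen's answer, but with dplyr. I also specified levels so that the ids follow the order of appearance rather than alphabetical order.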

library(dplyr)
j %>% group_by(policyNumber) %>%
  mutate(driverNumber = as.numeric(factor(driverName, levels = unique(driverName))), 
         vehicleNumber = as.numeric(factor(vehicleName, levels = unique(vehicleName))))

# # A tibble: 7 x 5
# # Groups:   policyNumber [3]
#   policyNumber driverName vehicleName driverNumber vehicleNumber
#          <int>     <fctr>      <fctr>        <dbl>         <dbl>
# 1            1      jason        blue            1             1
# 2            1       josh         red            2             2
# 3            1      jason       green            1             3
# 4            2      jason      orange            1             1
# 5            2       kyle      orange            2             1
# 6            3      chris        pink            1             1
# 7            3       ally      purple            2             2