How can I change the levels of a factor variable based on another variable in R?

Asked: 2014-09-12 23:27:44

Tags: r

Data

I have a large data set with more than a million rows. Here is a sample of the original data set:

> dput(dg)
structure(list(Vehicle.ID = c(1324L, 1582L, 401L, 811L, 2429L, 
2523L, 1033L, 2039L, 1662L, 1288L, 1742L, 74L, 2607L, 304L, 2476L, 
127L, 2484L, 395L, 2793L, 1618L, 2395L, 270L, 2192L, 354L, 2234L, 
766L, 447L, 2132L, 1848L, 532L, 2113L, 2905L, 1166L, 1452L, 2701L, 
2144L, 2202L, 955L, 1500L, 2572L, 1234L, 2113L, 576L, 997L, 891L, 
335L, 1156L, 2480L, 1980L, 2798L), Link.number = c(2L, 10000L, 
2L, 3L, 1L, 3L, 2L, 2L, 1L, 1L, 2L, 3L, 1L, 2L, 2L, 2L, 3L, 1L, 
1L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 3L, 1L, 3L, 1L, 1L, 1L, 2L, 3L, 
1L, 3L, 3L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 3L, 1L, 2L, 3L, 1L, 1L
), Lane = structure(c(4L, 1L, 1L, 2L, 5L, 1L, 4L, 2L, 4L, 1L, 
3L, 4L, 1L, 5L, 2L, 2L, 5L, 2L, 3L, 4L, 2L, 4L, 1L, 2L, 1L, 2L, 
2L, 5L, 5L, 5L, 1L, 4L, 6L, 3L, 1L, 3L, 2L, 3L, 5L, 1L, 1L, 2L, 
4L, 2L, 2L, 4L, 4L, 2L, 3L, 3L), .Label = c("1", "2", "3", "4", 
"5", "6"), class = "factor")), .Names = c("Vehicle.ID", "Link.number", 
"Lane"), class = "data.frame", row.names = c(590084L, 734964L, 
198436L, 375206L, 1124080L, 1235905L, 454260L, 1000231L, 736019L, 
558048L, 831494L, 38723L, 1191084L, 121339L, 1169687L, 61487L, 
1176256L, 150989L, 1292775L, 749442L, 1148838L, 124741L, 1037887L, 
156697L, 1056299L, 325099L, 243937L, 1000026L, 881043L, 231402L, 
991321L, 1349675L, 510814L, 691971L, 1255936L, 1038333L, 1055881L, 
411274L, 671564L, 1225808L, 580627L, 1012699L, 238513L, 432055L, 
413181L, 118829L, 514324L, 1212860L, 910530L, 1299547L))

Goal

The 'Lane' variable in dg started out as an integer variable. It has the following unique values for each value of 'Link.number':

> ddply(dg, .(Link.number), summarize, unique.lanes = list(unique(Lane)))
  Link.number     unique.lanes
1           1    5, 4, 1, 2, 3
2           2 4, 1, 2, 3, 5, 6
3           3    2, 1, 4, 5, 3
4       10000                1

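Note that the 'unique.lanes' differ from one 'Link.number' to another. For example, lane 1 of link number 1 is not the same lane as lane 1 of link number 10000. My goal is to change these unique values. To do that, I converted the Lane variable to a factor, so 'dg' now contains Lane as a factor. I want to change these levels to new values by reversing their order: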

    Link.number     old Lane level   new Lane level
           1            1               5
           1            2               4
           1            3               3
           1            4               2
           1            5               1

           2            1               6
           2            2               5
           2            3               4
           2            4               3
           2            5               2
           2            6               1

           3            1               5
           3            2               4
           3            3               3
           3            4               2
           3            5               1   

       10000            1               7  
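
The mapping in the table above can be sketched in base R as follows. This is only a minimal illustration under stated assumptions, not a tested solution for the full data set: `toy`, `max.lane`, and `new.Lane` are hypothetical names, the toy rows are made up, and the rule "new = max + 1 - old" is read off the table for links 1, 2 and 3. Link 10000 does not follow that rule (its lane 1 becomes 7), so it is handled as a special case.

```r
# Toy data shaped like dg (values made up for illustration);
# Lane is a factor with levels "1".."6", as in dg
toy <- data.frame(
  Link.number = c(1L, 1L, 2L, 2L, 3L, 10000L),
  Lane        = factor(c(1, 5, 1, 6, 2, 1), levels = 1:6)
)

# Hypothetical lookup: highest lane number per link, taken from the table
max.lane <- c("1" = 5L, "2" = 6L, "3" = 5L)

# Recover the lane numbers from the factor labels (not the internal codes)
old <- as.integer(as.character(toy$Lane))

# Reverse within each link: new = max + 1 - old
# (links missing from max.lane, such as 10000, yield NA here)
key <- as.character(toy$Link.number)
toy$new.Lane <- unname(max.lane[key]) + 1L - old

# Special case from the table: lane 1 of link 10000 becomes 7
toy$new.Lane[toy$Link.number == 10000L] <- 7L

toy
```

Because the computation is vectorized over rows, it avoids relabelling the factor levels themselves, which are shared by the whole column and cannot differ between links.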

Problem

After converting Lane to a factor, if I look at its levels, I get:

> ddply(dg, .(Link.number), summarize, lev = list(levels(Lane)))
  Link.number              lev
1           1 1, 2, 3, 4, 5, 6
2           2 1, 2, 3, 4, 5, 6
3           3 1, 2, 3, 4, 5, 6
4       10000 1, 2, 3, 4, 5, 6

Link.number 10000 has only one unique value, 1, yet its levels are reported as 1 to 6. How can I change the levels conditional on Link.number?
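
A small sketch of why every link reports the same six levels: the levels are an attribute of the whole factor, so any subset of it keeps all of them unless unused levels are dropped explicitly. The names `toy`, `pieces`, `all.levels`, and `used.levels` are hypothetical and the data are made up for illustration.

```r
# Toy data shaped like dg (values made up for illustration)
toy <- data.frame(
  Link.number = c(1L, 1L, 2L, 10000L),
  Lane        = factor(c(1, 5, 6, 1), levels = 1:6)
)

# Split Lane by link; each piece is still a factor with all six levels
pieces <- split(toy$Lane, toy$Link.number)
all.levels <- lapply(pieces, levels)

# droplevels() keeps only the levels that actually occur in each piece
used.levels <- lapply(pieces, function(f) levels(droplevels(f)))

all.levels[["10000"]]   # all six levels, although only lane 1 occurs
used.levels[["10000"]]  # only the level that is actually used
```

This only shows which levels are in use per link. A single factor column cannot carry a different level set for each link, so per-link relabelling has to go through the values rather than the levels.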

0 answers:

No answers