I have a large dataset with more than a million rows. Here is a sample of the original dataset:
> dput(dg)
structure(list(Vehicle.ID = c(1324L, 1582L, 401L, 811L, 2429L,
2523L, 1033L, 2039L, 1662L, 1288L, 1742L, 74L, 2607L, 304L, 2476L,
127L, 2484L, 395L, 2793L, 1618L, 2395L, 270L, 2192L, 354L, 2234L,
766L, 447L, 2132L, 1848L, 532L, 2113L, 2905L, 1166L, 1452L, 2701L,
2144L, 2202L, 955L, 1500L, 2572L, 1234L, 2113L, 576L, 997L, 891L,
335L, 1156L, 2480L, 1980L, 2798L), Link.number = c(2L, 10000L,
2L, 3L, 1L, 3L, 2L, 2L, 1L, 1L, 2L, 3L, 1L, 2L, 2L, 2L, 3L, 1L,
1L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 3L, 1L, 3L, 1L, 1L, 1L, 2L, 3L,
1L, 3L, 3L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 3L, 1L, 2L, 3L, 1L, 1L
), Lane = structure(c(4L, 1L, 1L, 2L, 5L, 1L, 4L, 2L, 4L, 1L,
3L, 4L, 1L, 5L, 2L, 2L, 5L, 2L, 3L, 4L, 2L, 4L, 1L, 2L, 1L, 2L,
2L, 5L, 5L, 5L, 1L, 4L, 6L, 3L, 1L, 3L, 2L, 3L, 5L, 1L, 1L, 2L,
4L, 2L, 2L, 4L, 4L, 2L, 3L, 3L), .Label = c("1", "2", "3", "4",
"5", "6"), class = "factor")), .Names = c("Vehicle.ID", "Link.number",
"Lane"), class = "data.frame", row.names = c(590084L, 734964L,
198436L, 375206L, 1124080L, 1235905L, 454260L, 1000231L, 736019L,
558048L, 831494L, 38723L, 1191084L, 121339L, 1169687L, 61487L,
1176256L, 150989L, 1292775L, 749442L, 1148838L, 124741L, 1037887L,
156697L, 1056299L, 325099L, 243937L, 1000026L, 881043L, 231402L,
991321L, 1349675L, 510814L, 691971L, 1255936L, 1038333L, 1055881L,
411274L, 671564L, 1225808L, 580627L, 1012699L, 238513L, 432055L,
413181L, 118829L, 514324L, 1212860L, 910530L, 1299547L))
The 'Lane' variable in dg was originally an integer variable. Within each 'Link.number', it takes the following unique values:
> ddply(dg, .(Link.number), summarize, unique.lanes = list(unique(Lane)))
  Link.number     unique.lanes
1           1    5, 4, 1, 2, 3
2           2 4, 1, 2, 3, 5, 6
3           3    2, 1, 4, 5, 3
4       10000                1
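Note that the 'unique.lanes' of each 'Link.number' are distinct from those of the other links. For example, lane 1 of link number 1 is not the same lane as lane 1 of link number 10000.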
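One way to make this distinction explicit is to combine Link.number and Lane into a single key with base R's interaction(), so that each (link, lane) pair becomes its own factor level. This is a minimal sketch on a hypothetical toy data frame `dg_toy` (made-up values, not the question's data); the `link_lane` column name is also just for illustration.

```r
# Hypothetical toy data with the same columns as dg (not the real data)
dg_toy <- data.frame(
  Link.number = c(1L, 1L, 2L, 10000L),
  Lane        = factor(c(1L, 5L, 1L, 1L))
)

# interaction() builds one factor whose levels are Link.number/Lane pairs,
# so lane 1 on link 1 ("1_1") and lane 1 on link 10000 ("10000_1") are
# different levels; drop = TRUE discards pairs that never occur
dg_toy$link_lane <- interaction(dg_toy$Link.number, dg_toy$Lane,
                                sep = "_", drop = TRUE)

levels(dg_toy$link_lane)
```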
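My goal is to change these unique values. So I converted the Lane variable to a factor, and 'dg' now contains Lane as a factor. I want to change these levels to new values by reversing their order within each link (lane 1 of link 10000 becomes 7):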
Link.number   old Lane level   new Lane level
1             1                5
1             2                4
1             3                3
1             4                2
1             5                1
2             1                6
2             2                5
2             3                4
2             4                3
2             5                2
2             6                1
3             1                5
3             2                4
3             3                3
3             4                2
3             5                1
10000         1                7
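Because lane 1 of link 10000 maps to 7, a pure reversal formula (number of lanes in the link + 1 - old lane) reproduces links 1 to 3 but not link 10000. A sketch that sidesteps this is to transcribe the mapping above into a lookup table and match each row on its (Link.number, old lane) pair with base R's match(). The toy data frame `dg_toy` and the names `lane_map`, `old_lane`, `new_lane` are hypothetical, chosen only for illustration.

```r
# Hypothetical toy data with the same columns as dg (not the real data)
dg_toy <- data.frame(
  Link.number = c(1L, 1L, 2L, 2L, 3L, 10000L),
  Lane        = factor(c(1L, 5L, 1L, 6L, 4L, 1L))
)

# Lookup table transcribed from the mapping table in the question
lane_map <- data.frame(
  Link.number = c(rep(1L, 5), rep(2L, 6), rep(3L, 5), 10000L),
  old_lane    = c(1:5, 1:6, 1:5, 1L),
  new_lane    = c(5:1, 6:1, 5:1, 7L)
)

# Convert the factor back to its lane number via its labels (not its internal
# integer codes), then build a "link lane" key on both sides and match them
key_data <- paste(dg_toy$Link.number, as.integer(as.character(dg_toy$Lane)))
key_map  <- paste(lane_map$Link.number, lane_map$old_lane)
dg_toy$new_lane <- lane_map$new_lane[match(key_data, key_map)]

dg_toy
```

Rows whose pair is missing from the lookup table get NA, which makes gaps in the mapping easy to spot. If Lane should stay a factor, wrapping the result in factor() would turn it back into one.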
After converting Lane to a factor, this is what I get when I look at its levels:
> ddply(dg, .(Link.number), summarize, lev = list(levels(Lane)))
  Link.number              lev
1           1 1, 2, 3, 4, 5, 6
2           2 1, 2, 3, 4, 5, 6
3           3 1, 2, 3, 4, 5, 6
4       10000 1, 2, 3, 4, 5, 6
Link.number 10000 has only one unique value, 1, yet its levels are reported as 1:6. How can I change the levels conditionally on Link.number?
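The 1:6 in the output comes from how factors work in R: a factor stores one set of levels for the whole column, and subsetting it keeps all of those levels. The sketch below, on a hypothetical toy data frame `dg_toy` (made-up values), shows this and then uses base R's droplevels() inside split()/lapply() to list only the levels that actually occur for each Link.number.

```r
# Hypothetical toy data with the same columns as dg (not the real data)
dg_toy <- data.frame(
  Link.number = c(1L, 1L, 2L, 10000L),
  Lane        = factor(c(1L, 5L, 6L, 1L))
)

# Subsetting keeps the column-wide levels: this returns "1" "5" "6"
# even though link 10000 only uses lane 1
levels(dg_toy$Lane[dg_toy$Link.number == 10000])

# droplevels() removes unused levels, so each group reports only its own lanes
lev_by_link <- lapply(split(dg_toy$Lane, dg_toy$Link.number),
                      function(x) levels(droplevels(x)))

lev_by_link
```

The same idea should carry over to the ddply call in the question by wrapping Lane in droplevels() inside summarize, although this sketch only uses base R.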