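Handling missing information when converting a list to a data frame or data table

Time: 2015-07-08 00:45:01

Tags: r data.table

Related to a previous question: is there any way to convert a list of named elements into a data table so that the NA values actually appear in the data table in the order in which they occur in the list?

For example, the list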

testlist <- list("Blue", "405", "Truck", "400", "Car", "White", "500", "Truck")
testnames <- c("Color", "HP", "Type", "HP", "Type", "Color", "HP", "Type")
names(testlist) <- testnames

$Color
[1] "Blue"

$HP
[1] "405"

$Type
[1] "Truck"

$HP
[1] "400"

$Type
[1] "Car"

$Color
[1] "White"

$HP
[1] "500"

$Type
[1] "Truck"

can be converted to a data table using:

dcast(setDT(melt(testlist))[, N:=1:.N, L1], N~L1, value.var='value')

But the output looks like this:

  N Color  HP  Type
1 1  Blue 405 Truck
2 2 White 400   Car
3 3  <NA> 500 Truck

when what I want is:

  N Color  HP  Type
1 1  Blue 405 Truck
2 2  <NA> 400   Car
3 3 White 500 Truck

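Does anyone have a suggestion for how to solve this? I'd appreciate the help.

3 answers:

Answer 0 (score: 9)

One approach is to preallocate a table with the correct number of rows and the correct number, names, and types of columns, and then assign by index the cells that are covered by the original list.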

cns <- c('Color','HP','Type');
lcis <- match(names(testlist),cns);
lris <- c(1L,cumsum(diff(lcis)<=0L)+1L);
df <- as.data.frame(testlist[match(1:length(cns),lcis)],stringsAsFactors=F)[0,];
df[max(lris),] <- NA;
df;
##   Color   HP Type
## 1  <NA> <NA> <NA>
## 2  <NA> <NA> <NA>
## 3  <NA> <NA> <NA>
for (ci in 1:length(cns)) { m <- lcis==ci; df[lris[m],ci] <- do.call(c,testlist[m]); };
df;
##   Color  HP  Type
## 1  Blue 405 Truck
## 2  <NA> 400   Car
## 3 White 500 Truck

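In my solution I was careful to process each column separately. The potential benefit is that if different columns in the output table (corresponding to different subsets of components in the input list) have different data types, those types are preserved in the final table. That is why I chose a for loop for the index assignment. It is of course not necessary for your exact input list, which has only character types, but I think it is a worthwhile goal nonetheless.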
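
As a rough illustration of the type-preservation point, here is a minimal sketch that reruns the same preallocate-and-fill steps on a hypothetical list, `mixed`, in which the HP values are numeric rather than character. The list and its values are assumptions made up for this example; only the steps come from the code above.

```r
# Hypothetical input: same shape as testlist, but HP is numeric.
mixed <- list(Color = "Blue", HP = 405, Type = "Truck",
              HP = 400, Type = "Car",
              Color = "White", HP = 500, Type = "Truck")

cns  <- c("Color", "HP", "Type")
lcis <- match(names(mixed), cns)              # column index per component
lris <- c(1L, cumsum(diff(lcis) <= 0L) + 1L)  # row index per component

# Zero-row template built from the first occurrence of each column,
# so each column starts out with the type of its own values.
df <- as.data.frame(mixed[match(seq_along(cns), lcis)],
                    stringsAsFactors = FALSE)[0, ]
df[max(lris), ] <- NA

# Fill one column at a time so types are never mixed across columns.
for (ci in seq_along(cns)) {
  m <- lcis == ci
  df[lris[m], ci] <- do.call(c, mixed[m])
}

sapply(df, class)  # HP should stay numeric; Color and Type stay character
```

Filling the whole table in a single vectorised assignment would instead have to coerce all values to one common type, which is the trade-off the loop avoids.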

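Explanation of the intermediate variables

  • cns: the column names of the output table.
  • lcis: the column index that each input list component will have in the output table. It is computed simply by matching the names of the input list components against cns.
  • lris: the row index that each input list component will have in the output table. The computation of this variable is the interesting part and the core of the solution. Column representation in the input list is incomplete (in other words, the input list can have "missing columns"), but you assert that the input list components are ordered by row, as they would be in the output table. So we can't use regular indexing (such as taking every three components as one row), and we can't use any single column name as the marker for each row, because any column may be missing from any row. As I see it, the only correct approach is to detect when a lower-index (or, in fact, equal-index) column occurs immediately after a higher-index (or equal-index) column in the input list, and to treat that as a row break. Thus we can use diff(lcis)<=0L to get a logical vector marking the row breaks, apply cumsum() and add 1 to get the row indexes, and manually prepend 1 to complete the vector.
  • ci: the column index in the output table. Used in the for loop to iterate over each output column.
  • m: computed in each iteration of the for loop over ci. A logical vector indicating which input list components belong to the current column ci. Used to index both lris (to extract the row indexes to assign to) and the input list itself (to extract the actual values to assign).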
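
The lris computation described above can be traced step by step on the small example. This is only a sketch that splits the one-line expression into named intermediate steps; `breaks` is a name introduced here for illustration and does not appear in the original code.

```r
testnames <- c("Color", "HP", "Type", "HP", "Type", "Color", "HP", "Type")
cns <- c("Color", "HP", "Type")

lcis <- match(testnames, cns)
# lcis:   1 2 3 2 3 1 2 3

# A row break occurs wherever the column index fails to increase.
breaks <- diff(lcis) <= 0L
# breaks: FALSE FALSE TRUE FALSE TRUE FALSE FALSE

# Count the breaks seen so far, shift to 1-based rows,
# and prepend the row of the very first component.
lris <- c(1L, cumsum(breaks) + 1L)
# lris:   1 1 1 2 2 3 3 3
```

The second Color value sits at position 6, so it lands in row 3 rather than row 2, which is exactly the placement the question asks for.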

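Real data

I grabbed your real data from Dropbox and stored it as testlist. Here are my findings.

First, I examined the unique component names in order of appearance, treating them as cns: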

## first reasonable assumption about cns
cns <- unique(names(testlist));
cns;
##  [1] "Status"              "Make"                "Model"
##  [4] "Kilometres"          "Stock Number"        "Engine"
##  [7] "Number of Hours"     "Front axle"          "Rear axle"
## [10] "Suspension"          "Wheelbase"           "Transmission"
## [13] "Price"               "Style/Trim"          "Brakes"
## [16] "Mfg Exterior Colour" "Tires"               "Engine (HP)"
## [19] "Exterior Colour"

From that we can compute a new, tentative lcis:

## examine lcis for ordering
lcis <- match(names(testlist),cns);
lcis;
##   [1]  1  2  3  4  5  6  7  8  9 10 11 12 13  1  2  3  4  5  6  7  8  9 10 11 12
##  [26] 13  1  2  3  4  5  6  7  8  9 10 11 12 13  1  2  3  4  5  6  7  8  9 10 11
##  [51] 12 13  1  2  3  4  5  6  7  8  9 10 11 12 13  1  2  3  4  5  6  7  8  9 10
##  [76] 11 12 13  1  2  3  4  5  6  7  8  9 10 11 12 13  1  2  3  4  5  6  7  8  9
## [101] 10 11 12 13  1  2  3  4  5  6  7  8  9 10 11 12 13  1  2  3  4  5  6  7  8
## [126]  9 10 11 12 13  1  2  3  4  5  6  7  8  9 10 11 12 13  1  2  3  4  5  6  7
## [151]  8  9 10 11 12 13  1  2  3  4 14 13  1  2  3  4  5  6  7  8  9 10 11 12 13
## [176]  1  2  3  4  5 15 16  6  8  9 10 17 11 18 12 19 13  1  2  3  4  5 15 16  6
## [201]  8  9 10 17 11 18 12 19 13

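Looking closely at the vector above, we can see that it begins with many regular repetitions of 1:13. Only toward the end of the vector does it become irregular: we see 14 followed by 13, 16 followed by 6, 10-11-12 interleaved with 17-18-19, and so on.

One important observation, though, is that the vector appears to consist of groups delimited by 1 and 13. In other words, every stretch that shows some regularity (even if it also contains irregularities) begins with 1 and ends with 13. This is consistent with your comment that the vehicle data is unordered in the middle. Let's call this the 1/13 assumption.

We can get a clearer view of the groups by splitting on this 1/13 boundary: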

## recognizing 1/13 consistency, split on it to see how each (possible) row looks under this assumption
split(lcis,cumsum(lcis==1L));
## $`1`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`2`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`3`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`4`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`5`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`6`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`7`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`8`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`9`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`10`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`11`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`12`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`13`
## [1]  1  2  3  4 14 13
##
## $`14`
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
##
## $`15`
##  [1]  1  2  3  4  5 15 16  6  8  9 10 17 11 18 12 19 13
##
## $`16`
##  [1]  1  2  3  4  5 15 16  6  8  9 10 17 11 18 12 19 13
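
Rather than inspecting the groups by eye, the 1/13 assumption could also be checked programmatically. The following is a minimal sketch on a made-up index vector, `lcis_toy`, since the real Dropbox data is not reproduced here; the helper names are likewise hypothetical.

```r
# Hypothetical column indexes for three rows; 13 plays the role of the
# last regular column, as in the real data.
lcis_toy <- c(1L, 2L, 3L, 13L,
              1L, 2L, 14L, 13L,
              1L, 15L, 6L, 13L)

# Split into candidate rows at every occurrence of column 1.
groups <- split(lcis_toy, cumsum(lcis_toy == 1L))

# Each candidate row should start with 1 and end with 13.
starts_with_1 <- vapply(groups, function(g) g[1] == 1L, logical(1))
ends_with_13  <- vapply(groups, function(g) g[length(g)] == 13L, logical(1))

hypothesis_holds <- all(starts_with_1 & ends_with_13)
hypothesis_holds
```

If `hypothesis_holds` came out FALSE on real data, the groups for which `ends_with_13` is FALSE would be the ones to inspect first.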

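Now, if you look very closely at the groups above, you can see that cns can be reordered so that every group is in ascending order. The indexes will not be consecutive, but the solution I devised for the original problem does not require them to be; ascending order is all that is needed.

For example, we need to order column 14 before 13, and columns 15 and 16 before 6, 8, 9, and so on: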

## recognizing the possibility of reordering to achieve perfect within-row ascending order, reorder cns to cns2
cns2 <- cns[c(1,2,3,4,14,5,15,16,6,7,8,9,10,17,11,18,12,19,13)];
cns2;
##  [1] "Status"              "Make"                "Model"
##  [4] "Kilometres"          "Style/Trim"          "Stock Number"
##  [7] "Brakes"              "Mfg Exterior Colour" "Engine"
## [10] "Number of Hours"     "Front axle"          "Rear axle"
## [13] "Suspension"          "Tires"               "Wheelbase"
## [16] "Engine (HP)"         "Transmission"        "Exterior Colour"
## [19] "Price"

Now we can recompute lcis, which I'll call lcis2, and show the new ordering within each group:

## calculate lcis2 from cns2, and prove that we've successfully ordered each individual row under the 1/13 (now 1/19) break assumption
lcis2 <- match(names(testlist),cns2);
split(lcis2,cumsum(lcis2==1L));
## $`1`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`2`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`3`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`4`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`5`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`6`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`7`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`8`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`9`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`10`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`11`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`12`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`13`
## [1]  1  2  3  4  5 19
##
## $`14`
##  [1]  1  2  3  4  6  9 10 11 12 13 15 17 19
##
## $`15`
##  [1]  1  2  3  4  6  7  8  9 11 12 13 14 15 16 17 18 19
##
## $`16`
##  [1]  1  2  3  4  6  7  8  9 11 12 13 14 15 16 17 18 19
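
The "every group is ascending" property can also be verified with a small helper instead of reading the split output. This is a sketch on a toy set of names; `rows_ascending` and the toy data are assumptions introduced for illustration only.

```r
# TRUE if, after splitting at each occurrence of column 1,
# every group is strictly increasing.
rows_ascending <- function(idx) {
  groups <- split(idx, cumsum(idx == 1L))
  all(vapply(groups, function(g) all(diff(g) > 0), logical(1)))
}

# Toy names: D is first seen before C, but C precedes D within a row.
toy_names <- c("A", "B", "D",
               "A", "B", "C", "D")

cns_toy   <- unique(toy_names)            # "A" "B" "D" "C"
lcis_toy  <- match(toy_names, cns_toy)    # second row is 1 2 4 3
before    <- rows_ascending(lcis_toy)

cns_toy2  <- cns_toy[c(1, 2, 4, 3)]       # reorder to "A" "B" "C" "D"
lcis_toy2 <- match(toy_names, cns_toy2)   # second row is now 1 2 3 4
after     <- rows_ascending(lcis_toy2)

c(before = before, after = after)
```

Here `before` is FALSE and `after` is TRUE, mirroring what the reordering from cns to cns2 achieves on the real data.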

Finally, we can run the whole solution, taking care to use the variable names with the 2 suffix:

## now we can apply the preallocate/fill-in solution using cns2 and lcis2
## will use lris2 and df2 just to be consistent
lris2 <- c(1L,cumsum(diff(lcis2)<=0L)+1L);
df2 <- as.data.frame(testlist[match(1:length(cns2),lcis2)],stringsAsFactors=F)[0,];
df2[max(lris2),] <- NA;
df2;
##    Status Make Model Kilometres Style.Trim Stock.Number Brakes Mfg.Exterior.Colour Engine Number.of.Hours Front.axle Rear.axle Suspension Tires Wheelbase Engine..HP. Transmission Exterior.Colour Price
## 1    <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 2    <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 3    <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 4    <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 5    <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 6    <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 7    <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 8    <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 9    <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 10   <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 11   <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 12   <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 13   <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 14   <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 15   <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
## 16   <NA> <NA>  <NA>       <NA>       <NA>         <NA>   <NA>                <NA>   <NA>            <NA>       <NA>      <NA>       <NA>  <NA>      <NA>        <NA>         <NA>            <NA>  <NA>
for (ci in 1:length(cns2)) { m <- lcis2==ci; df2[lris2[m],ci] <- do.call(c,testlist[m]); };
df2;
##    Status          Make                                          Model Kilometres    Style.Trim Stock.Number Brakes Mfg.Exterior.Colour                  Engine Number.of.Hours                     Front.axle                      Rear.axle                     Suspension    Tires Wheelbase Engine..HP.                   Transmission Exterior.Colour    Price
## 1     New     Peterbilt                 367 Tri-Drive c/w 58'' Sleeper   3,360 km          <NA>        12949   <NA>                <NA> Cummins ISX15  (550 hp)              44  Dana Spicer D2000  (20,000lb) Dana T69-170    (wide track) t Peterbilt Air-Trak  (66,000lb)     <NA>     267''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $217,770
## 2     New      Kenworth                               T800 T/A Tractor  82,230 km          <NA>        10720   <NA>                <NA>   Cummins ISX15 (550hp)           2,712 Dana Spicer D2000  (20,000 lb) Dana D46-170HPW (46,000 lb) ta Neway ADZ252    (52,000lb) Air     <NA>     244''        <NA> Fuller 18 spd main AT1202 2 sp            <NA> $199,500
## 3     New      Kenworth            T800 Tandem Tractor w/ 38'' Sleeper  98,521 km          <NA>        10722   <NA>                <NA>   Cummins ISX15 (550hp)           2,790 Dana Spicer D2000  (20,000 lb) Dana D46-170HPW (46,000 lb) ta Neway ADZ252    (52,000lb) Air     <NA>     244''        <NA> Fuller 18 spd main AT1202 2 sp            <NA> $199,500
## 4    Used      Kenworth           W900 Tri-Drive Sleeper Truck Tractor 170,422 km          <NA>        13227   <NA>                <NA> Cummins ISX15  (600 hp)           4,925 Meritor FL941      (20,000 lb)  Meritor RZ-166    (69,000 lb)  Kenworth AG690 (69,000lb) Air     <NA>     259''        <NA> 18 speed main &     4 speed au            <NA> $197,750
## 5     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,367 km          <NA>        12180   <NA>                <NA>  Cummins ISX15  (550hp)              38 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     <NA>     244''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $193,300
## 6     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,421 km          <NA>        12179   <NA>                <NA>  Cummins ISX15  (550hp)              46 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     <NA>     244''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $193,300
## 7     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   2,157 km          <NA>        12181   <NA>                <NA>  Cummins ISX15  (550hp)              64 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     <NA>     244''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $189,880
## 8     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,444 km          <NA>        12954   <NA>                <NA>  Cummins ISX15  (550hp)              45 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     <NA>     244''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $189,880
## 9     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,427 km          <NA>        12955   <NA>                <NA>  Cummins ISX15  (550hp)              43 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     <NA>     244''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $189,880
## 10    New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,982 km          <NA>        12182   <NA>                <NA>  Cummins ISX15  (550hp)              78 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     <NA>     244''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $189,880
## 11    New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper  23,293 km          <NA>        12953   <NA>                <NA>  Cummins ISX15  (550hp)             394 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     <NA>     244''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $189,880
## 12    New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper  27,215 km          <NA>        12509   <NA>                <NA>  Cummins ISX15  (550hp)             458 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     <NA>     244''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $186,600
## 13   Used         Volvo                                 VNL64T 780-730  72,000 km VNL64T780-730         <NA>   <NA>                <NA>                    <NA>            <NA>                           <NA>                           <NA>                           <NA>     <NA>      <NA>        <NA>                           <NA>            <NA> $185,000
## 14    New     Peterbilt 367 T/A Wet Kit Tractor c/w       58'' Sleeper  60,657 km          <NA>        10838   <NA>                <NA>  Cummins ISX15  (550hp)           1,822 Dana Spicer E14621  (14,600 lb Dana D46-170HP (46,000lb) tand Peterbilt Air-Trak  (46,000lb)     <NA>     244''        <NA>  RTLO18918B  Fuller (18 speed)            <NA> $171,800
## 15   Used International                                   ProStar +122  36,236 km          <NA>       463555    Air               White             Cummins ISX            <NA>         Arvin Meritor 13200 lb         Arvin Meritor 40000 lb                     Int'l IROS  11R22.5    228 in         450      Eaton Fuller D/O (18 spd)           White $168,750
## 16   Used International                                   ProStar +122  33,000 km          <NA>       463543    Air               White             Cummins ISX            <NA>         Arvin Meritor 13200 lb         Arvin Meritor 46000 lb                     Int'l IROS 11R/22.5    236 in         475      Eaton Fuller D/O (18 spd)           White $165,900

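Now, I realize it would probably be best to move away from the "ascending-order assumption" (let's call it that) entirely and rely on the 1/13 assumption instead, which we can do by changing the lris calculation. That frees us from having to reorder cns from the order we get back from the unique() call.

Below I demonstrate this, going back to the unsuffixed variable names, which will prove useful shortly: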

## change lris calculation to depend directly on 1/13 assumption; don't bother reordering
cns <- unique(names(testlist));
lcis <- match(names(testlist),cns);
lris <- c(1L,cumsum(lcis[-1]==1L)+1L);
df <- as.data.frame(testlist[match(1:length(cns),lcis)],stringsAsFactors=F)[0,];
df[max(lris),] <- NA;
for (ci in 1:length(cns)) { m <- lcis==ci; df[lris[m],ci] <- do.call(c,testlist[m]); };
df;
##    Status          Make                                          Model Kilometres Stock.Number                  Engine Number.of.Hours                     Front.axle                      Rear.axle                     Suspension Wheelbase                   Transmission    Price    Style.Trim Brakes Mfg.Exterior.Colour    Tires Engine..HP. Exterior.Colour
## 1     New     Peterbilt                 367 Tri-Drive c/w 58'' Sleeper   3,360 km        12949 Cummins ISX15  (550 hp)              44  Dana Spicer D2000  (20,000lb) Dana T69-170    (wide track) t Peterbilt Air-Trak  (66,000lb)     267''  RTLO18918B  Fuller (18 speed) $217,770          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 2     New      Kenworth                               T800 T/A Tractor  82,230 km        10720   Cummins ISX15 (550hp)           2,712 Dana Spicer D2000  (20,000 lb) Dana D46-170HPW (46,000 lb) ta Neway ADZ252    (52,000lb) Air     244'' Fuller 18 spd main AT1202 2 sp $199,500          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 3     New      Kenworth            T800 Tandem Tractor w/ 38'' Sleeper  98,521 km        10722   Cummins ISX15 (550hp)           2,790 Dana Spicer D2000  (20,000 lb) Dana D46-170HPW (46,000 lb) ta Neway ADZ252    (52,000lb) Air     244'' Fuller 18 spd main AT1202 2 sp $199,500          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 4    Used      Kenworth           W900 Tri-Drive Sleeper Truck Tractor 170,422 km        13227 Cummins ISX15  (600 hp)           4,925 Meritor FL941      (20,000 lb)  Meritor RZ-166    (69,000 lb)  Kenworth AG690 (69,000lb) Air     259'' 18 speed main &     4 speed au $197,750          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 5     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,367 km        12180  Cummins ISX15  (550hp)              38 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     244''  RTLO18918B  Fuller (18 speed) $193,300          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 6     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,421 km        12179  Cummins ISX15  (550hp)              46 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     244''  RTLO18918B  Fuller (18 speed) $193,300          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 7     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   2,157 km        12181  Cummins ISX15  (550hp)              64 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     244''  RTLO18918B  Fuller (18 speed) $189,880          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 8     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,444 km        12954  Cummins ISX15  (550hp)              45 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     244''  RTLO18918B  Fuller (18 speed) $189,880          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 9     New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,427 km        12955  Cummins ISX15  (550hp)              43 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     244''  RTLO18918B  Fuller (18 speed) $189,880          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 10    New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper   3,982 km        12182  Cummins ISX15  (550hp)              78 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     244''  RTLO18918B  Fuller (18 speed) $189,880          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 11    New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper  23,293 km        12953  Cummins ISX15  (550hp)             394 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     244''  RTLO18918B  Fuller (18 speed) $189,880          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 12    New     Peterbilt       367 T/A Wet-Kit Tractor c/w 58'' Sleeper  27,215 km        12509  Cummins ISX15  (550hp)             458 Dana Spicer E14621  (14,600 lb Dana D46-170     (46,000lb) ta Peterbilt Air-Trak  (46,000lb)     244''  RTLO18918B  Fuller (18 speed) $186,600          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 13   Used         Volvo                                 VNL64T 780-730  72,000 km         <NA>                    <NA>            <NA>                           <NA>                           <NA>                           <NA>      <NA>                           <NA> $185,000 VNL64T780-730   <NA>                <NA>     <NA>        <NA>            <NA>
## 14    New     Peterbilt 367 T/A Wet Kit Tractor c/w       58'' Sleeper  60,657 km        10838  Cummins ISX15  (550hp)           1,822 Dana Spicer E14621  (14,600 lb Dana D46-170HP (46,000lb) tand Peterbilt Air-Trak  (46,000lb)     244''  RTLO18918B  Fuller (18 speed) $171,800          <NA>   <NA>                <NA>     <NA>        <NA>            <NA>
## 15   Used International                                   ProStar +122  36,236 km       463555             Cummins ISX            <NA>         Arvin Meritor 13200 lb         Arvin Meritor 40000 lb                     Int'l IROS    228 in      Eaton Fuller D/O (18 spd) $168,750          <NA>    Air               White  11R22.5         450           White
## 16   Used International                                   ProStar +122  33,000 km       463543             Cummins ISX            <NA>         Arvin Meritor 13200 lb         Arvin Meritor 46000 lb                     Int'l IROS    236 in      Eaton Fuller D/O (18 spd) $165,900          <NA>    Air               White 11R/22.5         475           White

As you can see, the column order of df differs from that of df2, but we can prove that the data is identical with the following:

## prove df2 and df are identical, ignoring the column order difference
identical(df,df2[names(df)]);
## [1] TRUE
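
To see why the two row-break rules can disagree when columns are out of order within a row, here is a small sketch on made-up names. The toy data and variable names are hypothetical; the two expressions are the ones used for lris earlier in this answer.

```r
# Two rows; in the first row, D appears before C.
toy_names <- c("A", "B", "D", "C",
               "A", "B", "C")
cns_toy  <- c("A", "B", "C", "D")
lcis_toy <- match(toy_names, cns_toy)   # 1 2 4 3 1 2 3

# Ascending-order rule: break whenever the column index fails to increase.
lris_ascending <- c(1L, cumsum(diff(lcis_toy) <= 0L) + 1L)

# First-column rule: break only when the first column shows up again.
lris_first <- c(1L, cumsum(lcis_toy[-1] == 1L) + 1L)

max(lris_ascending)  # 3: the out-of-order C is split off as its own row
max(lris_first)      # 2: the intended number of rows
```

The first-column rule depends on that column being present in every row, which is what the 1/13 assumption asserts for the real data.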

Answer 1 (score: 5)

The best solution I could come up with:

library(data.table)
listnames <- names(testlist) 
# "Color" "HP"    "Type"  "HP"    "Type"  "Color" "HP"    "Type" 

unames <- unique(listnames)
# "Color" "HP"    "Type"

a <- setNames(1:length(unames), unames)
# Color    HP  Type 
# 1     2     3 

d <- unname(a[listnames])
# [1] 1 2 3 2 3 1 2 3

splitted_list <- split(testlist, cumsum(shift(d, fill=0)>d))
# results in testlist splitted by increasing sequences in d
# (1,2,3), (2,3), (1, 2, 3)
# You can impose a different splitting condition here, for instance, 
# if each entry begins with 1, then cumsum(d==1) is adequate 

# and the last step is pretty much self explanatory
rbindlist(lapply(splitted_list, data.frame), fill=TRUE) 
#    Color  HP  Type
# 1:  Blue 405 Truck
# 2:    NA 400   Car
# 3: White 500 Truck
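
For readers who want to follow the splitting condition without data.table loaded, here is a minimal base R sketch of the same grouping. Replacing `shift(d, fill=0)` with `c(0L, head(d, -1))` is a substitution made for this illustration, not part of the answer's code.

```r
d <- c(1L, 2L, 3L, 2L, 3L, 1L, 2L, 3L)

# Previous element of d, with 0 standing in before the first one
# (a base R stand-in for data.table's shift(d, fill = 0)).
prev <- c(0L, head(d, -1))
# prev:  0 1 2 3 2 3 1 2

# A new group starts wherever the index drops below its predecessor.
grp <- cumsum(prev > d)
# grp:   0 0 0 1 1 2 2 2

split(d, grp)
```

Note that this condition uses a strict `>`, so two consecutive components with the same name stay in one group, whereas the `diff(lcis)<=0L` rule in Answer 0 would start a new row there.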

Hope this solves your problem.

When applied to your test data from Dropbox with the splitting condition cumsum(d==1), the result is

structure(list(Status = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L), .Label = c("New", "Used"
), class = "factor"), Make = structure(c(1L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 4L, 4L), .Label = c("Peterbilt", 
"Kenworth", "Volvo", "International"), class = "factor"), Model = structure(c(1L, 
2L, 3L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 7L, 8L, 8L), .Label = c("367 Tri-Drive c/w 58'' Sleeper", 
"T800 T/A Tractor", "T800 Tandem Tractor w/ 38'' Sleeper", "W900 Tri-Drive Sleeper Truck Tractor", 
"367 T/A Wet-Kit Tractor c/w 58'' Sleeper", "VNL64T 780-730", 
"367 T/A Wet Kit Tractor c/w       58'' Sleeper", "ProStar +122"
), class = "factor"), Kilometres = structure(1:16, .Label = c("3,360 km", 
"82,230 km", "98,521 km", "170,422 km", "3,367 km", "3,421 km", 
"2,157 km", "3,444 km", "3,427 km", "3,982 km", "23,293 km", 
"27,215 km", "72,000 km", "60,657 km", "36,236 km", "33,000 km"
), class = "factor"), Stock.Number = structure(c(1L, 2L, 3L, 
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, NA, 13L, 14L, 15L), .Label = c("12949", 
"10720", "10722", "13227", "12180", "12179", "12181", "12954", 
"12955", "12182", "12953", "12509", "10838", "463555", "463543"
), class = "factor"), Engine = structure(c(1L, 2L, 2L, 3L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, NA, 4L, 5L, 5L), .Label = c("Cummins ISX15  (550 hp)", 
"Cummins ISX15 (550hp)", "Cummins ISX15  (600 hp)", "Cummins ISX15  (550hp)", 
"Cummins ISX"), class = "factor"), Number.of.Hours = structure(c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, NA, 13L, NA, NA
), .Label = c("44", "2,712", "2,790", "4,925", "38", "46", "64", 
"45", "43", "78", "394", "458", "1,822"), class = "factor"), 
    Front.axle = structure(c(1L, 2L, 2L, 3L, 4L, 4L, 4L, 4L, 
    4L, 4L, 4L, 4L, NA, 4L, 5L, 5L), .Label = c("Dana Spicer D2000  (20,000lb)", 
    "Dana Spicer D2000  (20,000 lb)", "Meritor FL941      (20,000 lb)", 
    "Dana Spicer E14621  (14,600 lb", "Arvin Meritor 13200 lb"
    ), class = "factor"), Rear.axle = structure(c(1L, 2L, 2L, 
    3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, NA, 5L, 6L, 7L), .Label = c("Dana T69-170    (wide track) t", 
    "Dana D46-170HPW (46,000 lb) ta", "Meritor RZ-166    (69,000 lb)", 
    "Dana D46-170     (46,000lb) ta", "Dana D46-170HP (46,000lb) tand", 
    "Arvin Meritor 40000 lb", "Arvin Meritor 46000 lb"), class = "factor"), 
    Suspension = structure(c(1L, 2L, 2L, 3L, 4L, 4L, 4L, 4L, 
    4L, 4L, 4L, 4L, NA, 4L, 5L, 5L), .Label = c("Peterbilt Air-Trak  (66,000lb)", 
    "Neway ADZ252    (52,000lb) Air", "Kenworth AG690 (69,000lb) Air", 
    "Peterbilt Air-Trak  (46,000lb)", "Int'l IROS"), class = "factor"), 
    Wheelbase = structure(c(1L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, NA, 2L, 4L, 5L), .Label = c("267''", "244''", 
    "259''", "228 in", "236 in"), class = "factor"), Transmission = structure(c(1L, 
    2L, 2L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 4L, 4L
    ), .Label = c("RTLO18918B  Fuller (18 speed)", "Fuller 18 spd main AT1202 2 sp", 
    "18 speed main &     4 speed au", "Eaton Fuller D/O (18 spd)"
    ), class = "factor"), Price = structure(c(1L, 2L, 2L, 3L, 
    4L, 4L, 5L, 5L, 5L, 5L, 5L, 6L, 7L, 8L, 9L, 10L), .Label = c("$217,770", 
    "$199,500", "$197,750", "$193,300", "$189,880", "$186,600", 
    "$185,000", "$171,800", "$168,750", "$165,900"), class = "factor"), 
    Style.Trim = structure(c(NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, 1L, NA, NA, NA), .Label = "VNL64T780-730", class = "factor"), 
    Brakes = structure(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, 1L, 1L), .Label = "Air", class = "factor"), 
    Mfg.Exterior.Colour = structure(c(NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, 1L, 1L), .Label = "White", class = "factor"), 
    Tires = structure(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, 1L, 2L), .Label = c("11R22.5", "11R/22.5"
    ), class = "factor"), Engine..HP. = structure(c(NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 2L), .Label = c("450", 
    "475"), class = "factor"), Exterior.Colour = structure(c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 1L
    ), .Label = "White", class = "factor")), .Names = c("Status", 
"Make", "Model", "Kilometres", "Stock.Number", "Engine", "Number.of.Hours", 
"Front.axle", "Rear.axle", "Suspension", "Wheelbase", "Transmission", 
"Price", "Style.Trim", "Brakes", "Mfg.Exterior.Colour", "Tires", 
"Engine..HP.", "Exterior.Colour"), row.names = c(NA, -16L), class = "data.frame")

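Answer 2 (score: 3)

This may not be the best solution, since it uses a while loop. It uses tidyr for the final step, but any other reshaping package you prefer would do.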

testlist <- c("Blue", "405", "Truck", "400", "Car", "White", "500", "Truck")
testnames <- c("Color", "HP", "Type", "HP", "Type", "Color", "HP", "Type")

df <- data.frame(names = testnames, attributes = testlist, stringsAsFactors = FALSE)



# need to count number of vehicles inside data frame

# initialise while loop counters
df_index = 1
vehicle_index = vector(mode = "integer", length = nrow(df))
vehicle_count = 1

# now loop through the data frame to find attributes 
# which belong to vehicle 1, 2, 3, etc...
while(df_index <= nrow(df)){
    if (sum(c("Color", "HP", "Type") == df$names[df_index:(df_index+2)]) == 3) {
        vehicle_index[df_index:(df_index+2)] <- vehicle_count
        df_index = df_index + 3
        vehicle_count = vehicle_count + 1
    } else if (sum(c("Color", "HP", "Type") %in% df$names[df_index:(df_index+1)]) == 2) {
        vehicle_index[df_index:(df_index+1)] <- vehicle_count
        df_index = df_index + 2
        vehicle_count = vehicle_count + 1
    } else {
        vehicle_index[df_index:(df_index)] <- vehicle_count
        df_index = df_index + 1
        vehicle_count = vehicle_count + 1
    }

}

# finally, label the vehicle attributes with the vehicle number,
# and spread the data.
df_final <- data.frame(df, vehicle_index = vehicle_index)

tidyr::spread(df_final, key = "names", value = "attributes")
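
Since the answer allows for other reshaping tools, the final spreading step could also be sketched with base R's `reshape()`. This is an assumption-laden alternative, not the answer's method: the vehicle_index values below are written out by hand to match what the loop above assigns for this input.

```r
testlist  <- c("Blue", "405", "Truck", "400", "Car", "White", "500", "Truck")
testnames <- c("Color", "HP", "Type", "HP", "Type", "Color", "HP", "Type")

# vehicle_index as produced by the while loop for this particular input.
df_final <- data.frame(names = testnames,
                       attributes = testlist,
                       vehicle_index = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L),
                       stringsAsFactors = FALSE)

# Long-to-wide: one row per vehicle, one column per attribute name.
wide <- reshape(df_final,
                idvar = "vehicle_index",
                timevar = "names",
                direction = "wide")
wide
```

The resulting columns are named attributes.Color, attributes.HP and attributes.Type, and the missing Color for vehicle 2 is filled with NA.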