Gathering multiple columns with tidyr

Asked: 2016-04-21 14:52:34

Tags: r dplyr reshape tidyr

I have shopping cart data that looks like the sample data frame below:

sample_df<-data.frame(
   clientid=1:10,
   ProductA=c("chair","table","plate","plate","table","chair","table","plate","chair","chair"),
   QuantityA=c(1,2,1,1,1,1,2,3,1,2),
   ProductB=c("table","doll","shoes","","door","","computer","computer","","plate"),
   QuantityB=c(3,1,2,"",2,"",1,1,"",1)
)
#sample data frame
   clientid ProductA QuantityA ProductB QuantityB
1  1        chair    1         table   3
2  2        table    2         doll    1   
3  3        plate    1         shoes   2             
4  4        plate    1             
...
10 10       chair    2         plate   1

I would like to convert it to a different format, like this:

#ideal data frame
   clientid ProductNumber Product Quantity
1  1        A             chair   1
2  1        B             table   3
3  2        A             table   2
4  2        B             doll    1
...
11 6        A             chair   1
...
17 10       A             chair   2
18 10       B             plate   1 

I tried:

library(dplyr)  # select(), filter() and %>% come from dplyr
library(tidyr)
sample_df_gather <- sample_df %>% 
  select(clientid, ProductA, ProductB) %>% 
  gather(ProductNumber, value, -clientid) %>% 
  filter(!is.na(value))

#this gives me
    clientid ProductNumber value
1   1        ProductA      chair
2   2        ProductB      table
3   3        ProductA      plate
4   4        ProductB      plate
...

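However, I don't know how to add the quantities to this data frame. In addition, the real data frame has more columns of the same kind, such as title and price, that I would also like to carry over into the ideal data frame. Is there a way to convert the data into the ideal format?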

1 answer:

Answer 0 (score: 6)

Using data.table:

library(data.table)
res = melt(setDT(sample_df), 
  measure.vars = patterns("^Product", "^Quantity"), 
  variable.name = "ProductNumber")
res[, ProductNumber := factor(ProductNumber, labels = c("A","B"))]

which gives:

    clientid ProductNumber   value1 value2
 1:        1             A    chair      1
 2:        2             A    table      2
 3:        3             A    plate      1
 4:        4             A    plate      1
 5:        5             A    table      1
 6:        6             A    chair      1
 7:        7             A    table      2
 8:        8             A    plate      3
 9:        9             A    chair      1
10:       10             A    chair      2
11:        1             B    table      3
12:        2             B     doll      1
13:        3             B    shoes      2
14:        4             B       NA     NA
15:        5             B     door      2
16:        6             B       NA     NA
17:        7             B computer      1
18:        8             B computer      1
19:        9             B       NA     NA
20:       10             B    plate      1
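
The output above still differs from the ideal layout in the question in three ways: the value columns are named value1 and value2, the empty B slots remain as NA rows, and the rows are grouped by product slot rather than by client. A minimal sketch of one way to close that gap follows. It assumes the NA-coded data listed at the end of this answer (rebuilt here in compact form so the block runs on its own) and uses the `value.name` argument of `melt()` to name both value columns in one call. The real data's extra columns (for example a hypothetical TitleA/TitleB pair) would presumably need one more pattern in `patterns()` and one more name in `value.name`.

```r
library(data.table)

# Compact rebuild of the answer's data, with missing values coded as NA
sample_df <- data.frame(
  clientid  = 1:10,
  ProductA  = c("chair", "table", "plate", "plate", "table",
                "chair", "table", "plate", "chair", "chair"),
  QuantityA = c(1L, 2L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 2L),
  ProductB  = c("table", "doll", "shoes", NA, "door",
                NA, "computer", "computer", NA, "plate"),
  QuantityB = c(3L, 1L, 2L, NA, 2L, NA, 1L, 1L, NA, 1L),
  stringsAsFactors = FALSE
)

# Same melt as above, but naming the two value columns directly
res <- melt(as.data.table(sample_df),
  measure.vars  = patterns("^Product", "^Quantity"),
  variable.name = "ProductNumber",
  value.name    = c("Product", "Quantity"))
res[, ProductNumber := factor(ProductNumber, labels = c("A", "B"))]

# Drop the empty B slots, then order by client and product slot
res <- res[!is.na(Product)]
setorder(res, clientid, ProductNumber)
```

`as.data.table()` is used here instead of `setDT()` so that the original `sample_df` is left unmodified; either should work for the melt itself.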

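Data (the OP's original data is malformed: the "" entries turn QuantityB into a character column, so NA is used here instead):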

sample_df <- structure(list(clientid = 1:10, ProductA = structure(c(1L, 3L, 
2L, 2L, 3L, 1L, 3L, 2L, 1L, 1L), .Label = c("chair", "plate", 
"table"), class = "factor"), QuantityA = c(1L, 2L, 1L, 1L, 1L, 
1L, 2L, 3L, 1L, 2L), ProductB = structure(c(6L, 2L, 5L, NA, 3L, 
NA, 1L, 1L, NA, 4L), .Label = c("computer", "doll", "door", "plate", 
"shoes", "table"), class = "factor"), QuantityB = c(3L, 1L, 2L, 
NA, 2L, NA, 1L, 1L, NA, 1L)), .Names = c("clientid", "ProductA", 
"QuantityA", "ProductB", "QuantityB"), row.names = c(NA, -10L
), class = "data.frame")
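
Since the question asks about tidyr specifically, here is a hedged sketch of a tidyr-only route. It assumes tidyr 1.0.0 or later, which provides `pivot_longer()` (newer than this question, so it is not what the OP had available), and it rebuilds the data above in compact form so the block is self-contained. The special `".value"` entry in `names_to` tells `pivot_longer()` to use the first captured group of each column name (Product or Quantity) as the output column name, while the second group (A or B) goes into ProductNumber.

```r
library(tidyr)

# Compact rebuild of the data above, with missing values coded as NA
sample_df <- data.frame(
  clientid  = 1:10,
  ProductA  = c("chair", "table", "plate", "plate", "table",
                "chair", "table", "plate", "chair", "chair"),
  QuantityA = c(1L, 2L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 2L),
  ProductB  = c("table", "doll", "shoes", NA, "door",
                NA, "computer", "computer", NA, "plate"),
  QuantityB = c(3L, 1L, 2L, NA, 2L, NA, 1L, 1L, NA, 1L),
  stringsAsFactors = FALSE
)

long <- pivot_longer(
  sample_df,
  cols           = -clientid,
  # ".value" keeps Product/Quantity as separate output columns
  names_to       = c(".value", "ProductNumber"),
  # group 1: measure name, group 2: product slot suffix
  names_pattern  = "^(Product|Quantity)(.*)$",
  # drop slots where both Product and Quantity are missing
  values_drop_na = TRUE
)
```

Additional paired columns in the real data (hypothetical names such as TitleA or PriceA) could likely be handled by extending the alternation in `names_pattern`, for example `"^(Product|Quantity|Title|Price)(.*)$"`.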