我有一个购物车数据,看起来像下面的示例数据框:
sample_df<-data.frame(
clientid=1:10,
ProductA=c("chair","table","plate","plate","table","chair","table","plate","chair","chair"),
QuantityA=c(1,2,1,1,1,1,2,3,1,2),
ProductB=c("table","doll","shoes","","door","","computer","computer","","plate"),
QuantityB=c(3,1,2,"",2,"",1,1,"",1)
)
#sample data frame
clientid ProductA QuantityA ProductB QuantityB
1 1 chair 1 table 3
2 2 table 2 doll 1
3 3 plate 1 shoes 2
4 4 plate 1
...
10 10 chair 2 plate 1
我想将其转换为不同的格式,如:
#ideal data frame
clientid ProductNumber Product Quantity
1 1 A chair 1
2 1 B table 3
3 2 A table 2
4 2 B doll 1
...
11 6 A chair 1
...
17 10 A chair 2
18 10 B plate 1
我试过了
library(tidyr)
sample_df_gather<- sample_df %>% select(clientid, ProductA, ProductB)
%>% gather(ProductNumber, value, -clientid) %>% filter(!is.na(value))
#this gives me
clientid ProductNumber value
1 1 ProductA chair
2 2 ProductB table
3 3 ProductA plate
4 4 ProductB plate
...
但是,我不知道如何向数据框添加数量。此外,在实际数据框中,还有更多列,例如标题,价格,我也希望将其转换为理想数据框。有没有办法将数据转换为理想格式?
答案 0 :(得分:6)
使用data.table:
library(data.table)
res = melt(setDT(sample_df),
measure.vars = patterns("^Product", "^Quantity"),
variable.name = "ProductNumber")
res[, ProductNumber := factor(ProductNumber, labels = c("A","B"))]
给出了
clientid ProductNumber value1 value2
1: 1 A chair 1
2: 2 A table 2
3: 3 A plate 1
4: 4 A plate 1
5: 5 A table 1
6: 6 A chair 1
7: 7 A table 2
8: 8 A plate 3
9: 9 A chair 1
10: 10 A chair 2
11: 1 B table 3
12: 2 B doll 1
13: 3 B shoes 2
14: 4 B NA NA
15: 5 B door 2
16: 6 B NA NA
17: 7 B computer 1
18: 8 B computer 1
19: 9 B NA NA
20: 10 B plate 1
数据(因为OP的原始数据被禁止):
structure(list(clientid = 1:10, ProductA = structure(c(1L, 3L,
2L, 2L, 3L, 1L, 3L, 2L, 1L, 1L), .Label = c("chair", "plate",
"table"), class = "factor"), QuantityA = c(1L, 2L, 1L, 1L, 1L,
1L, 2L, 3L, 1L, 2L), ProductB = structure(c(6L, 2L, 5L, NA, 3L,
NA, 1L, 1L, NA, 4L), .Label = c("computer", "doll", "door", "plate",
"shoes", "table"), class = "factor"), QuantityB = c(3L, 1L, 2L,
NA, 2L, NA, 1L, 1L, NA, 1L)), .Names = c("clientid", "ProductA",
"QuantityA", "ProductB", "QuantityB"), row.names = c(NA, -10L
), class = "data.frame")