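I want to run an association analysis, but first I need to get my data frame into the right format, one that shows only the transactions. 1) How can I repeat the "Sub-Category" column as many times as the number in the "Quantity" column?
2) How can I group the transactions by Order ID?
I have this df: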
`Order ID` `Sub-Category` `Quantity`
<chr> <chr> <dbl>
1 CA-2017-152156 Bookcases 2
2 CA-2017-152156 Chairs 3
3 CA-2017-138688 Labels 2
1) I want this:
`Order ID` `Sub-Category` `Sub-Category2` `Sub-Category3`
<chr> <chr> <chr> <chr>
1 CA-2017-152156 Bookcases Bookcases NULL
2 CA-2017-152156 Chairs Chairs Chairs
3 CA-2017-138688 Labels Labels NULL
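(After that, I want to merge rows with the same Order ID, for example rows 1 and 2. Do you have any tips for this?) Thanks!
Answer 0 (score: 1)
The following answers point 1.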
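# Widest row needed: the largest quantity
Max <- max(df1$Quantity)
# For each row, repeat the sub-category Quantity times and pad with NA up to Max
res <- lapply(seq_len(nrow(df1)), function(i){
  c(rep(as.character(df1[i, 2]), df1[i, 3]), rep(NA, Max - df1[i, 3]))
})
# Bind the padded vectors row-wise and attach the Order ID column
res <- cbind(df1[1], do.call(rbind, res))
# Prefix the generated column names (1, 2, 3) with "Sub-Category"
names(res)[-1] <- paste0(names(df1)[2], names(res)[-1])
res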
# Order ID Sub-Category1 Sub-Category2 Sub-Category3
#1 CA-2017-152156 Bookcases Bookcases <NA>
#2 CA-2017-152156 Chairs Chairs Chairs
#3 CA-2017-138688 Labels Labels <NA>
Data in dput format.
df1 <-
structure(list(`Order ID` = structure(c(2L, 2L, 1L),
.Label = c("CA-2017-138688", "CA-2017-152156"),
class = "factor"), `Sub-Category` = structure(1:3,
.Label = c("Bookcases", "Chairs", "Labels"), class =
"factor"), Quantity = c(2L, 3L, 2L)), class = "data.frame",
row.names = c("1", "2", "3"))
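The answer above covers point 1 only. For point 2, here is a minimal base R sketch, assuming the same three sample rows (rebuilt here with character columns so the block runs on its own). It repeats each row index Quantity times to get one row per unit, then uses split to collect the items of each order. The names idx, long and baskets are hypothetical and not part of the original answer.

```r
# Sample data from the question (character columns assumed for simplicity)
df1 <- data.frame(
  `Order ID` = c("CA-2017-152156", "CA-2017-152156", "CA-2017-138688"),
  `Sub-Category` = c("Bookcases", "Chairs", "Labels"),
  Quantity = c(2L, 3L, 2L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Repeat each row index as many times as its Quantity
idx <- rep(seq_len(nrow(df1)), df1$Quantity)

# Long format: one row per purchased unit
long <- df1[idx, c("Order ID", "Sub-Category")]
rownames(long) <- NULL

# Group the items by order: a named list with one character vector per Order ID
baskets <- split(long$`Sub-Category`, long$`Order ID`)
baskets
```

The long format avoids padding with NA, and the list returned by split already holds one entry per order, which may be closer to what an association analysis expects than the wide table.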
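Answer 1 (score: 1)
To answer question 1) with the tidyverse, one way is to rep each Sub-Category value Quantity times, store the result in a new column as a single comma-separated string, and then separate that column into n columns.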
library(tidyverse)
n <- max(df$Quantity)
df1 <- df %>%
mutate(new = map2_chr(`Sub-Category`, Quantity, ~paste(rep(.x, .y), collapse = ","))) %>%
separate(new, paste("Sub-Category", seq_len(n)), sep = ",", fill = "right") %>%
select(-`Sub-Category`)
df1
# Order ID Quantity Sub-Category 1 Sub-Category 2 Sub-Category 3
#1 CA-2017-152156 2 Bookcases Bookcases <NA>
#2 CA-2017-152156 3 Chairs Chairs Chairs
#3 CA-2017-138688 2 Labels Labels <NA>
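To make the intermediate step more concrete, the sketch below reproduces in base R what the new column holds before it is separated. It is an illustration only, assuming the three sample rows from the question; the vector names sub_category and quantity are hypothetical, and mapply stands in for map2_chr.

```r
# Sample values from the question
sub_category <- c("Bookcases", "Chairs", "Labels")
quantity <- c(2L, 3L, 2L)

# Repeat each sub-category by its quantity and collapse into one string
new <- mapply(
  function(x, y) paste(rep(x, y), collapse = ","),
  sub_category, quantity,
  USE.NAMES = FALSE
)
new

# Splitting on the comma recovers the individual items again
pieces <- strsplit(new, ",", fixed = TRUE)
lengths(pieces)
```

Each string holds exactly Quantity copies, so splitting on the comma yields between 1 and n pieces per row, and the shorter rows are the ones that end up padded with NA.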
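As for question 2), I am not 100% sure what you are looking for (since no expected output is given), but I think you want to group_by Order ID and collapse the categories of each group into one row?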
df1 %>%
group_by(`Order ID`) %>%
summarise_at(vars(starts_with("Sub")), list(~paste(na.omit(.), collapse = ",")))
# A tibble: 2 x 4
# `Order ID` `Sub-Category 1` `Sub-Category 2` `Sub-Category 3`
# <fct> <chr> <chr> <chr>
#1 CA-2017-138688 Labels Labels ""
#2 CA-2017-152156 Bookcases,Chairs Bookcases,Chairs Chairs
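If the end goal is the association analysis itself, one possible next step is sketched below. It assumes the arules package is the tool being used, which the question does not say, and it assumes that repeated items inside one order can be dropped. The block rebuilds the sample data so it runs on its own, and the conversion only runs when arules is installed.

```r
# Sample data from the question (character columns assumed for simplicity)
df <- data.frame(
  `Order ID` = c("CA-2017-152156", "CA-2017-152156", "CA-2017-138688"),
  `Sub-Category` = c("Bookcases", "Chairs", "Labels"),
  Quantity = c(2L, 3L, 2L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# One basket per order; unique() drops repeats within an order
baskets <- lapply(split(df$`Sub-Category`, df$`Order ID`), unique)
baskets

# Assumption: arules is the package used for the analysis
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(baskets, "transactions")  # list-to-transactions coercion
  inspect(trans)
}
```

With this route the Quantity column plays no role, since each basket only records which sub-categories appear in the order.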