Is there an R function that adds columns by multiplying a column by the value of another column?

Date: 2019-05-11 10:21:20

Tags: r dataframe market-basket-analysis

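I want to run an association analysis, but first I need to get my data frame into the right format, one that only shows the transactions. 1) How can I "multiply" the `Sub-Category` column by the value in the `Quantity` column, i.e. repeat each sub-category as many times as its quantity?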

2) How can I group the transactions by Order ID?

I have this df:

  `Order ID`       `Sub-Category` `Quantity`
  <chr>            <chr>               <dbl>
1 CA-2017-152156   Bookcases               2
2 CA-2017-152156   Chairs                  3
3 CA-2017-138688   Labels                  2

1) I want this:

  `Order ID`     `Sub-Category` `Sub-Category2` `Sub-Category3`
  <chr>          <chr>          <chr>           <chr>
1 CA-2017-152156 Bookcases      Bookcases       NULL
2 CA-2017-152156 Chairs         Chairs          Chairs
3 CA-2017-138688 Labels         Labels          NULL

(After that, I want to merge rows that share the same Order ID, e.g. rows 1 and 2. Do you have any tips for this?) Thanks!

2 Answers:

Answer 0 (score: 1)

The following answers point 1.

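# The largest quantity is the number of Sub-Category columns needed
Max <- max(df1$Quantity)
# For each row, repeat the sub-category Quantity times and pad with NA up to Max
res <- lapply(seq_len(nrow(df1)), function(i){
  c(rep(as.character(df1[i, 2]), df1[i, 3]), rep(NA, Max - df1[i, 3]))
})
# Stack the rows into a matrix and put the Order ID column in front
res <- cbind(df1[1], do.call(rbind, res))
# Name the new columns Sub-Category1, Sub-Category2, ...
names(res)[-1] <- paste0(names(df1)[2], seq_len(Max))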

res
#        Order ID Sub-Category1 Sub-Category2 Sub-Category3
#1 CA-2017-152156     Bookcases     Bookcases          <NA>
#2 CA-2017-152156        Chairs        Chairs        Chairs
#3 CA-2017-138688        Labels        Labels          <NA>

Data in dput format:

df1 <-
structure(list(`Order ID` = structure(c(2L, 2L, 1L), 
.Label = c("CA-2017-138688", "CA-2017-152156"), 
class = "factor"), `Sub-Category` = structure(1:3, 
.Label = c("Bookcases", "Chairs", "Labels"), class = 
"factor"), Quantity = c(2L, 3L, 2L)), class = "data.frame", 
row.names = c("1", "2", "3"))
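For comparison, here is a loop-free base R sketch of the same reshape (not part of the original answer). It assumes `Quantity` holds non-negative whole numbers and rebuilds the example data by hand with the column names from the question. `rep()` lists each row index `Quantity` times and `sequence()` numbers those copies 1..Quantity, so together they give the (row, column) cells to fill in an all-NA character matrix.

```r
# Example data rebuilt by hand from the question (column names as shown there)
df <- data.frame(
  `Order ID` = c("CA-2017-152156", "CA-2017-152156", "CA-2017-138688"),
  `Sub-Category` = c("Bookcases", "Chairs", "Labels"),
  Quantity = c(2, 3, 2),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

n_max <- max(df$Quantity)

# All-NA matrix: one row per order line, one column per possible copy
m <- matrix(NA_character_, nrow = nrow(df), ncol = n_max)

# Row i appears Quantity[i] times; sequence() numbers those copies 1..Quantity[i]
rows <- rep(seq_len(nrow(df)), df$Quantity)
cols <- sequence(df$Quantity)

# Fill each (row, copy) cell with that row's sub-category
m[cbind(rows, cols)] <- rep(df$`Sub-Category`, df$Quantity)
colnames(m) <- paste0("Sub-Category", seq_len(n_max))

# Put Order ID in front; check.names = FALSE keeps the hyphenated names as-is
wide <- data.frame(df["Order ID"], m, check.names = FALSE, stringsAsFactors = FALSE)
print(wide)
```

Cells beyond a row's quantity stay NA, which matches the `<NA>` entries in the loop version's output.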

Answer 1 (score: 1)

To answer 1) with the tidyverse, one way is to rep Sub-Category Quantity times in a new column, store the copies as a single comma-separated string, and then separate that string into n columns.

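library(tidyverse)

# The largest quantity is the number of Sub-Category columns to create
n <- max(df$Quantity)

df1 <- df %>%
         # Repeat each sub-category Quantity times, stored as one comma-separated string
         mutate(new = map2_chr(`Sub-Category`, Quantity, ~paste(rep(.x, .y), collapse = ","))) %>%
         # Split on commas only; rows with fewer pieces are padded with NA on the right
         separate(new, paste("Sub-Category", seq_len(n)), sep = ",", fill = "right") %>%
         select(-`Sub-Category`)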

df1

#       Order ID  Quantity Sub-Category 1 Sub-Category 2 Sub-Category 3
#1 CA-2017-152156        2      Bookcases      Bookcases           <NA>
#2 CA-2017-152156        3         Chairs         Chairs         Chairs
#3 CA-2017-138688        2         Labels         Labels           <NA>

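Regarding 2), I'm not 100% sure what you are looking for (since there is no expected output), but I think you want to group_by Order ID and collapse the categories of each group into a single row?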

df1 %>%
  group_by(`Order ID`) %>%
  # Collapse each Sub-Category column into one comma-separated string per order, dropping NAs
  summarise_at(vars(starts_with("Sub")), list(~paste(na.omit(.), collapse = ",")))

# A tibble: 2 x 4
#  `Order ID`   `Sub-Category 1` `Sub-Category 2` `Sub-Category 3`
#  <fct>          <chr>            <chr>            <chr>           
#1 CA-2017-138688 Labels           Labels           ""              
#2 CA-2017-152156 Bookcases,Chairs Bookcases,Chairs Chairs
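If the end goal is association analysis, the repeated columns may not be needed at all: transaction data for association rules usually records whether an item occurs in a basket, not how many units were bought. A minimal base R sketch, assuming that is what the asker needs (it is not part of either answer), groups the sub-categories by `Order ID` with `split()`, giving one character vector per order. As far as I know, the `arules` package can coerce such a named list with `as(baskets, "transactions")`; check its documentation before relying on that.

```r
# Example data rebuilt by hand from the question (column names as shown there)
df <- data.frame(
  `Order ID` = c("CA-2017-152156", "CA-2017-152156", "CA-2017-138688"),
  `Sub-Category` = c("Bookcases", "Chairs", "Labels"),
  Quantity = c(2, 3, 2),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# One list element per Order ID holding that order's sub-categories
baskets <- split(df$`Sub-Category`, df$`Order ID`)

# Keep each item once per basket (presence, not quantity)
baskets <- lapply(baskets, unique)

# The same grouping collapsed to one comma-separated string per order
collapsed <- vapply(baskets, paste, character(1), collapse = ",")

print(baskets)
print(collapsed)
```

`split()` orders the groups by the sorted Order IDs, so CA-2017-138688 comes before CA-2017-152156, the same order as in the `group_by` output.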