I want to convert the following dataset:
transaction_id  productsku
1               SK0001
1               SK0002
2               AB0001
2               AC0001
2               AC0002
3               BC0001
4               BC0002
The ideal dataset is:
transaction_id  x1      x2      x3
1               SK0001  SK0002
2               AB0001  AC0001  AC0002
3               BC0001
4               BC0002
I tried to do this conversion in code, but it failed.
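Answer 0 (score: 1)
Try splitting by transaction_id and then getting the productsku values for each group. You can then rbind the resulting list, subsetting each element to max_length (the length of the longest group) so that shorter groups are padded with NA.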
L = lapply(split(df, df$transaction_id), function(a) a$productsku)
max_length = max(lengths(L))
do.call(rbind, lapply(L, function(a) a[1:max_length]))
# [,1] [,2] [,3]
#1 "SK0001" "SK0002" NA
#2 "AB0001" "AC0001" "AC0002"
#3 "BC0001" NA NA
#4 "BC0002" NA NA
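The snippet above assumes a data frame named df already exists and returns an unnamed character matrix. As a minimal sketch, assuming the data look like the table in the question (the df definition and the x1..x3 column names below are reconstructed from the question, not taken from this answer), the same idea can be made self-contained like this:

```r
# Sample data rebuilt from the question's table (column types are an assumption)
df <- data.frame(
  transaction_id = c(1L, 1L, 2L, 2L, 2L, 3L, 4L),
  productsku = c("SK0001", "SK0002", "AB0001", "AC0001", "AC0002",
                 "BC0001", "BC0002"),
  stringsAsFactors = FALSE
)

# One character vector of SKUs per transaction
L <- split(df$productsku, df$transaction_id)
max_length <- max(lengths(L))

# Indexing past the end of a vector yields NA, which pads the short groups
m <- do.call(rbind, lapply(L, function(a) a[seq_len(max_length)]))
colnames(m) <- paste0("x", seq_len(max_length))

# Put the ids back as a regular column next to the padded SKU columns
wide <- data.frame(transaction_id = as.integer(names(L)), m,
                   stringsAsFactors = FALSE, row.names = NULL)
wide
```

Because the number of columns comes from max_length, this does not need to know in advance how many products the largest transaction holds.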
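Answer 1 (score: 0)
Here is one way. The idea is to combine the productsku values within each group into a single string, then use separate to split that string into different columns: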
library(tidyverse)
df %>%
group_by(transaction_id) %>%
summarise(product=paste(productsku, collapse=", ")) %>%
separate(product, c("x1", "x2", "x3"), sep=", ")
# A tibble: 4 × 4
transaction_id x1 x2 x3
* <int> <chr> <chr> <chr>
1 1 SK0001 SK0002 <NA>
2 2 AB0001 AC0001 AC0002
3 3 BC0001 <NA> <NA>
4 4 BC0002 <NA> <NA>
Warning message:
Too few values at 3 locations: 1, 3, 4
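The warning appears because some transactions have fewer than three products. A possible variation, sketched here with a df rebuilt from the question's table (an assumption, since the answer does not define it), is to pass fill = "right" to separate so that missing pieces are padded with NA without a warning:

```r
library(dplyr)
library(tidyr)

# Sample data rebuilt from the question's table (an assumption)
df <- data.frame(
  transaction_id = c(1L, 1L, 2L, 2L, 2L, 3L, 4L),
  productsku = c("SK0001", "SK0002", "AB0001", "AC0001", "AC0002",
                 "BC0001", "BC0002"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(transaction_id) %>%
  summarise(product = paste(productsku, collapse = ", ")) %>%
  # fill = "right" pads short rows with NA on the right, silently
  separate(product, c("x1", "x2", "x3"), sep = ", ", fill = "right")

wide
```

The column names still have to be listed by hand, so this variation only suits data where the maximum number of products per transaction is known.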
Answer 2 (score: 0)
A simple and fast alternative using data.table, in two steps:
library(data.table)
# convert df into a data.table (by reference)
setDT(df)
# step 1: gather productsku values by transaction id
temp <- df[, .(product = toString(productsku)), by = list(transaction_id)]
# step 2: separate productsku values in different columns
# toString() joins with ", ", so split on that exact string to avoid leading spaces
temp[, c("x1", "x2", "x3") := tstrsplit(product, ", ", fixed = TRUE, fill = "")] # you can also use fill=NA
temp
#> transaction_id product x1 x2 x3
#> 1: 1 SK0001, SK0002 SK0001 SK0002
#> 2: 2 AB0001, AC0001, AC0002 AB0001 AC0001 AC0002
#> 3: 3 BC0001 BC0001
#> 4: 4 BC0002 BC0002
Another fast alternative using dcast from data.table, with a slightly different output:
# Using dcast
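# value.var is named explicitly so dcast does not have to guess the value column
dcast(df, transaction_id ~ productsku, value.var = "productsku")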
#> transaction_id AB0001 AC0001 AC0002 BC0001 BC0002 SK0001 SK0002
#> 1: 1 NA NA NA NA NA SK0001 SK0002
#> 2: 2 AB0001 AC0001 AC0002 NA NA NA NA
#> 3: 3 NA NA NA BC0001 NA NA NA
#> 4: 4 NA NA NA NA BC0002 NA NA
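The dcast call above creates one column per distinct SKU rather than the x1, x2, x3 layout asked for in the question. A sketch of one way to get positional columns with dcast, assuming data.table's rowid helper and a df rebuilt from the question's table (neither is part of the original answer), is:

```r
library(data.table)

# Sample data rebuilt from the question's table (an assumption)
df <- data.table(
  transaction_id = c(1L, 1L, 2L, 2L, 2L, 3L, 4L),
  productsku = c("SK0001", "SK0002", "AB0001", "AC0001", "AC0002",
                 "BC0001", "BC0002")
)

# rowid() numbers the rows within each transaction (1, 2, 3, ...);
# prefix = "x" turns those numbers into the column names x1, x2, x3
wide <- dcast(df, transaction_id ~ rowid(transaction_id, prefix = "x"),
              value.var = "productsku")

wide
```

Here the number of columns follows from the data, and positions with no product are filled with NA.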