I am trying to transform a table like this:
# A tibble: 10 x 2
   user_id pred
     <int> <fctr>
 1      27 electronics
 2      27 home
 3      38 health
 4      60 electronics
 5      60 beauty
 6      92 home
 7      92 electronics
 8     106 health
 9     117 home
10     117 women
so that it looks like this:
# A tibble: 6 x 3
  user_id pred_1      pred_2
    <dbl> <chr>       <chr>
1      27 electronics home
2      38 health      NA
3      60 electronics beauty
4      92 home        electronics
5     106 health      NA
6     117 home        women
That is, one row per user_id, with the pred column spread into pred_1, pred_2, and so on. Any ideas?
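Update
The original question has been solved. A follow-up:
With the tidyr::spread approach, is it possible to limit the group size to N, so that each group contributes at most N values when it is spread?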
Answer 0 (score: 2)
After grouping by 'user_id', we create a sequence column that numbers the rows within each group. We then spread the data from 'long' to 'wide' format.
library(dplyr)
library(tidyr)
df1 %>%
  group_by(user_id) %>%
  mutate(id = paste0("pred_", row_number()),
         id = factor(id, levels = unique(id))) %>%
  spread(id, pred)
#   user_id pred_1      pred_2
#     <int> <chr>       <chr>
# 1      27 electronics home
# 2      38 health      <NA>
# 3      60 electronics beauty
# 4      92 home        electronics
# 5     106 health      <NA>
# 6     117 home        women
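For the follow-up about capping each group at N values, one possible sketch is to drop the extra rows before numbering them. This is not part of the original answer: the value of `N`, the name `res`, and the reconstructed `df1` are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)

# Example data reconstructed from the table in the question
df1 <- tibble(
  user_id = c(27L, 27L, 38L, 60L, 60L, 92L, 92L, 106L, 117L, 117L),
  pred = factor(c("electronics", "home", "health", "electronics", "beauty",
                  "home", "electronics", "health", "home", "women"))
)

N <- 1  # hypothetical cap on the number of values kept per user_id

res <- df1 %>%
  group_by(user_id) %>%
  filter(row_number() <= N) %>%            # keep at most the first N rows per group
  mutate(id = paste0("pred_", row_number()),
         id = factor(id, levels = unique(id))) %>%
  ungroup() %>%
  spread(id, pred)                         # one pred_* column per retained position
```

With `N <- 1` only a pred_1 column should remain. Because the filter runs before the sequence column is built, the numbering never goes past N.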
Alternatively, we can use dcast from data.table:
library(data.table) # requires 1.9.7+
dcast(setDT(df1), user_id~paste0("pred_", rowid(user_id)), value.var = "pred")
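The same cap on group size could probably be applied on the data.table side by trimming each group before casting. The following is only a sketch under assumptions: `dt`, `N`, `trimmed`, and `wide` are hypothetical names, and the data is rebuilt from the question rather than taken from the answer.

```r
library(data.table)

# Example data reconstructed from the table in the question
dt <- data.table(
  user_id = c(27L, 27L, 38L, 60L, 60L, 92L, 92L, 106L, 117L, 117L),
  pred = c("electronics", "home", "health", "electronics", "beauty",
           "home", "electronics", "health", "home", "women")
)

N <- 1  # hypothetical cap on the number of values kept per user_id

# Keep at most the first N rows of every user_id group
trimmed <- dt[, head(.SD, N), by = user_id]

# Cast to wide format, numbering the remaining rows within each group
wide <- dcast(trimmed, user_id ~ paste0("pred_", rowid(user_id)),
              value.var = "pred")
```

Trimming first keeps the cast itself identical to the one in the answer, so the only new step is the `head(.SD, N)` call.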