Spread a column based on grouping by another column (dplyr / tidyr)

Time: 2016-08-01 18:48:30

Tags: r dplyr reshape2 tidyr

I am trying to convert a table like this:

# A tibble: 10 x 2
   user_id        pred
     <int>      <fctr>
1       27 electronics
2       27        home
3       38      health
4       60 electronics
5       60      beauty
6       92        home
7       92 electronics
8      106      health
9      117        home
10     117       women

into one that looks like this:

# A tibble: 6 x 3
  user_id      pred_1      pred_2
    <dbl>       <chr>       <chr>
1      27 electronics        home
2      38      health          NA
3      60 electronics      beauty
4      92        home electronics
5     106      health          NA
6     117        home       women

That is, one row per user_id, with the pred column spread into pred_1, pred_2, and so on. Any ideas?

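Update

The original question has been solved. A follow-up:

Using the tidyr::spread approach, is it possible to limit the group size to N, so that at most N values are taken from each group when spreading?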

1 answer:

Answer 0: (score: 2)

After grouping by 'user_id', we create a sequence column that numbers the rows within each group, and then spread the data from 'long' to 'wide' format.

library(dplyr)
library(tidyr)
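df1 %>%
     group_by(user_id) %>%
     # label the rows within each user_id as pred_1, pred_2, ...
     mutate(id = paste0("pred_", row_number()),
            # make the label a factor so the new columns follow order of appearance
            id = factor(id, levels = unique(id))) %>%
     # one column per label, filled with the matching pred value
     spread(id, pred)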
#    user_id      pred_1      pred_2
#     <int>       <chr>       <chr>
#1      27 electronics        home
#2      38      health        <NA>
#3      60 electronics      beauty
#4      92        home electronics
#5     106      health        <NA>
#6     117        home       women
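
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). Below is a minimal sketch of the same reshaping with that function. It assumes df1 is rebuilt by hand from the table in the question, and the name wide is only illustrative.

library(dplyr)
library(tidyr)

# df1 reconstructed from the question's table (an assumption about the real data)
df1 <- tibble(
  user_id = c(27L, 27L, 38L, 60L, 60L, 92L, 92L, 106L, 117L, 117L),
  pred = factor(c("electronics", "home", "health", "electronics", "beauty",
                  "home", "electronics", "health", "home", "women"))
)

wide <- df1 %>%
  group_by(user_id) %>%
  mutate(id = row_number()) %>%   # position of each row within its user_id
  ungroup() %>%
  # names_prefix turns the ids 1, 2, ... into the column names pred_1, pred_2, ...
  pivot_wider(names_from = id, values_from = pred, names_prefix = "pred_")

wide

Because id stays numeric here, the columns come out in order of first appearance, so the factor step is not needed.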

Alternatively, we can use dcast from data.table:
library(data.table)#1.9.7+
dcast(setDT(df1), user_id~paste0("pred_", rowid(user_id)), value.var = "pred")
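
Regarding the follow-up about taking at most N values per group: one possible approach is to drop the extra rows within each group before spreading. This is a sketch rather than a confirmed answer from the thread. The variable n_keep and the result name capped are hypothetical, and df1 is again rebuilt by hand from the question's table.

library(dplyr)
library(tidyr)

# df1 reconstructed from the question's table (an assumption about the real data)
df1 <- tibble(
  user_id = c(27L, 27L, 38L, 60L, 60L, 92L, 92L, 106L, 117L, 117L),
  pred = factor(c("electronics", "home", "health", "electronics", "beauty",
                  "home", "electronics", "health", "home", "women"))
)

n_keep <- 1  # hypothetical cap: keep at most this many values per user_id

capped <- df1 %>%
  group_by(user_id) %>%
  filter(row_number() <= n_keep) %>%            # keep only the first n_keep rows per group
  mutate(id = paste0("pred_", row_number())) %>%
  ungroup() %>%
  spread(id, pred)

capped

With n_keep <- 1 only pred_1 remains. If n_keep is 10 or more, convert id to a factor as in the answer above so the columns are not sorted alphabetically.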