New column numbering the groups with dplyr::group_by

Time: 2018-04-16 17:51:58

Tags: r dplyr

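What I'm trying to do is take each unique Mean (or CellLine, same idea) and create a new column called AVGMOrder that numbers the 48 unique groups from 1 to 48. I'm not sure why I get this error saying the group size is wrong.

Thanks for your help!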

> xist.df %>% group_by(Mean) %>% dplyr::mutate(AVGMOrder = seq(unique(Mean)))
# A tibble: 240 x 8
# Groups:   Mean [48]
   CpG        geneID CellLine                                         Meth OrigOrder Sex     Mean AVGMOrder
   <chr>      <chr>  <fct>                                           <dbl> <chr>     <chr>  <dbl>     <int>
 1 cg03554089 XIST   iPS__HDF51IPS5_passage6_Female____156.440.1.1   0.455 286339    Female 0.511         1
 2 cg12653510 XIST   iPS__HDF51IPS5_passage6_Female____156.440.1.1   0.491 286340    Female 0.511         1
 3 cg05533223 XIST   iPS__HDF51IPS5_passage6_Female____156.440.1.1   0.515 286341    Female 0.511         1
 4 cg11717280 XIST   iPS__HDF51IPS5_passage6_Female____156.440.1.1   0.489 286342    Female 0.511         1
 5 cg20698282 XIST   iPS__HDF51IPS5_passage6_Female____156.440.1.1   0.605 286343    Female 0.511         1
 6 cg03554089 XIST   iPS__HDF51IPS10_passage37_Female____161.900.1.2 0.491 376195    Female 0.519         1
 7 cg12653510 XIST   iPS__HDF51IPS10_passage37_Female____161.900.1.2 0.542 376196    Female 0.519         1
 8 cg05533223 XIST   iPS__HDF51IPS10_passage37_Female____161.900.1.2 0.483 376197    Female 0.519         1
 9 cg11717280 XIST   iPS__HDF51IPS10_passage37_Female____161.900.1.2 0.503 376198    Female 0.519         1
10 cg20698282 XIST   iPS__HDF51IPS10_passage37_Female____161.900.1.2 0.574 376199    Female 0.519         1
# ... with 230 more rows
> unique(xist.df$Mean)
 [1] 0.5110429 0.5185945 0.5299138 0.5319983 0.5333054 0.5465974 0.5484405 0.5518451 0.5631779 0.5647687 0.5736542
[12] 0.5741134 0.5803745 0.5839757 0.6864615 0.6990654 0.6994218 0.7478772 0.7986107 0.8016629 0.8204100 0.8239762
[23] 0.8281310 0.8311557 0.8375466 0.8405810 0.8460025 0.8513457 0.8514124 0.8583415 0.8587972 0.8596317 0.8597244
[34] 0.8632049 0.8642843 0.8656732 0.8661410 0.8679203 0.8707371 0.8710717 0.8816540 0.8823595 0.8827582 0.8852854
[45] 0.8856669 0.8900214 0.8903854 0.8915359
> xist.df %>% group_by(Mean) %>% dplyr::mutate(AVGMOrder = seq(unique(xist.df$Mean)))
Error in mutate_impl(.data, dots) : 
  Column `AVGMOrder` must be length 5 (the group size) or one, not 48
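
A note on the two results above, as a minimal sketch on a made-up toy table (the `toy` data frame and its values are hypothetical, not taken from xist.df). Inside `group_by(Mean)`, `unique(Mean)` is a single number below 1, so `seq()` of it evaluates to `1` and is recycled over the group, which would explain the column of 1s. `seq(unique(xist.df$Mean))` ignores the grouping and returns one value per unique Mean, which is neither the group size nor 1, so `mutate()` rejects it. The exact wording of the error may differ between dplyr versions.

```r
library(dplyr)

# Hypothetical stand-in for xist.df: 3 groups of 2 rows each.
toy <- tibble(Mean = rep(c(0.51, 0.52, 0.53), each = 2))

# Within one group, unique(Mean) is a single value such as 0.51,
# and seq(0.51) is the same as 1:0.51, which is just 1.
seq(0.51)

# So the grouped version fills every row with 1.
per_group <- toy %>%
  group_by(Mean) %>%
  mutate(AVGMOrder = seq(unique(Mean)))

# Referring to the whole column gives a vector of length 3 for groups
# of size 2; mutate() only accepts the group size or length 1.
result <- tryCatch(
  toy %>%
    group_by(Mean) %>%
    mutate(AVGMOrder = seq(unique(toy$Mean))),
  error = function(e) "error"
)
```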

Response to a comment:

> xist.df %>% group_by(Mean) %>% dplyr::mutate(AVGMOrder = row_number())
# A tibble: 240 x 8
# Groups:   Mean [48]
   CpG        geneID CellLine                                         Meth OrigOrder Sex     Mean AVGMOrder
   <chr>      <chr>  <fct>                                           <dbl> <chr>     <chr>  <dbl>     <int>
 1 cg03554089 XIST   iPS__HDF51IPS5_passage6_Female____156.440.1.1   0.455 286339    Female 0.511         1
 2 cg12653510 XIST   iPS__HDF51IPS5_passage6_Female____156.440.1.1   0.491 286340    Female 0.511         2
 3 cg05533223 XIST   iPS__HDF51IPS5_passage6_Female____156.440.1.1   0.515 286341    Female 0.511         3
 4 cg11717280 XIST   iPS__HDF51IPS5_passage6_Female____156.440.1.1   0.489 286342    Female 0.511         4
 5 cg20698282 XIST   iPS__HDF51IPS5_passage6_Female____156.440.1.1   0.605 286343    Female 0.511         5
 6 cg03554089 XIST   iPS__HDF51IPS10_passage37_Female____161.900.1.2 0.491 376195    Female 0.519         1
 7 cg12653510 XIST   iPS__HDF51IPS10_passage37_Female____161.900.1.2 0.542 376196    Female 0.519         2
 8 cg05533223 XIST   iPS__HDF51IPS10_passage37_Female____161.900.1.2 0.483 376197    Female 0.519         3
 9 cg11717280 XIST   iPS__HDF51IPS10_passage37_Female____161.900.1.2 0.503 376198    Female 0.519         4
10 cg20698282 XIST   iPS__HDF51IPS10_passage37_Female____161.900.1.2 0.574 376199    Female 0.519         5
# ... with 230 more rows

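Edit in response to the comment: I want every row in a unique group to get the same single value. Is group_by the wrong function here?

1 Answer:

Answer 0 (score: 1)

I decided to use the following workaround: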

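library(dplyr)

# Sort by Mean so the unique values come out in ascending order
xist.df <- xist.df %>% dplyr::arrange(Mean)
# Lookup table: one row per unique Mean, numbered 1..n
order <- as.data.frame(unique(xist.df$Mean))
order$AVGMOrder <- seq(rownames(order))
colnames(order) <- c("Mean", "AVGMOrder")
# Attach the group number to every row (joins on the shared Mean column)
xist.df <- left_join(xist.df, order)
xist.df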

The output looks like this:

> order <- as.data.frame(unique(xist.df$Mean))
> order$AVGMOrder <- seq(rownames(order))
> order
   unique(xist.df$Mean) AVGMOrder
1             0.5110429         1
2             0.5185945         2
3             0.5299138         3
4             0.5319983         4
5             0.5333054         5
6             0.5465974         6
7             0.5484405         7
8             0.5518451         8
9             0.5631779         9
10            0.5647687        10
> colnames(order) <- c("Mean", "AVGMOrder")
> test <- left_join(xist.df, order)
Joining, by = "Mean"
> test
# A tibble: 240 x 8
# Groups:   CellLine [?]
   CpG        geneID CellLine                                         Meth OrigOrder Sex     Mean AVGMOrder
   <chr>      <chr>  <fct>                                           <dbl> <chr>     <chr>  <dbl>     <int>
 1 cg03554089 XIST   iPS__HDF51IPS5_passage6_Female____156.440.1.1   0.455 286339    Female 0.511         1
 2 cg12653510 XIST   iPS__HDF51IPS5_passage6_Female____156.440.1.1   0.491 286340    Female 0.511         1
 3 cg05533223 XIST   iPS__HDF51IPS5_passage6_Female____156.440.1.1   0.515 286341    Female 0.511         1
 4 cg11717280 XIST   iPS__HDF51IPS5_passage6_Female____156.440.1.1   0.489 286342    Female 0.511         1
 5 cg20698282 XIST   iPS__HDF51IPS5_passage6_Female____156.440.1.1   0.605 286343    Female 0.511         1
 6 cg03554089 XIST   iPS__HDF51IPS10_passage37_Female____161.900.1.2 0.491 376195    Female 0.519         2
 7 cg12653510 XIST   iPS__HDF51IPS10_passage37_Female____161.900.1.2 0.542 376196    Female 0.519         2
 8 cg05533223 XIST   iPS__HDF51IPS10_passage37_Female____161.900.1.2 0.483 376197    Female 0.519         2
 9 cg11717280 XIST   iPS__HDF51IPS10_passage37_Female____161.900.1.2 0.503 376198    Female 0.519         2
10 cg20698282 XIST   iPS__HDF51IPS10_passage37_Female____161.900.1.2 0.574 376199    Female 0.519         2
# ... with 230 more rows
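
A possibly shorter route to the same numbering, offered as a sketch rather than a tested replacement for the workaround above: `dplyr::dense_rank()` assigns 1 to the smallest value, 2 to the next distinct value, and so on, with no gaps, so it does not need `group_by()` or a join. The `toy` table below is hypothetical. The `ungroup()` call is there on the assumption that the data may already be grouped (the output above shows `Groups: CellLine`), in which case ranks would otherwise be computed within each group.

```r
library(dplyr)

# Hypothetical stand-in for xist.df, deliberately not sorted by Mean.
toy <- tibble(
  CellLine = rep(c("lineB", "lineA", "lineC"), each = 2),
  Mean     = rep(c(0.52, 0.51, 0.53), each = 2)
)

ranked <- toy %>%
  ungroup() %>%                          # rank across the whole table
  mutate(AVGMOrder = dense_rank(Mean))   # same Mean, same number

# Base R equivalent: position of each Mean among the sorted unique values.
base_order <- match(toy$Mean, sort(unique(toy$Mean)))
```

Both versions rely on rows of one cell line carrying exactly the same Mean value; if the means were recomputed separately per row, tiny floating-point differences could split a group.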