Aggregating with dplyr: converting a single column from factor to numeric

Posted: 2018-05-25 15:36:15

Tags: r dplyr aggregate-functions mutate

Hi, and thanks for reading.

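I've been trying to aggregate some data and have managed it with the aggregate() function, but I'd also like to do the same thing with a dplyr pipeline. However, I keep getting this error: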
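The question does not show the aggregate() call that worked, so the following is only a hypothetical base-R reconstruction, not the asker's code. It assumes ct was read in as a factor, drops the "Undetermined" rows, converts the labels to numbers, and has FUN return a named vector. aggregate() stores that vector as a matrix column, which prints as ct.mean and ct.sd and would explain the column names in the desired output.

```r
# Hypothetical reconstruction of the data p, with ct stored as a factor
p <- data.frame(
  sample = rep(c("s001", "s002", "s003"), each = 3, times = 2),
  gene   = rep(c("gapdh", "actb"), each = 9),
  ct     = c("15.2", "16", "14.8", "16.2", "17", "16.7",
             "Undetermined", "14.6", "15",
             "24.5", "24.2", "24.7", "25", "25.7", "25.5",
             "27.3", "27.4", "Undetermined"),
  stringsAsFactors = TRUE  # mimic a ct column that was read in as a factor
)

# Drop the "Undetermined" rows, then go factor -> character -> numeric
p_num <- p[p$ct != "Undetermined", ]
p_num$ct <- as.numeric(as.character(p_num$ct))

# Group by sample and gene; FUN returns both statistics as a named vector,
# which aggregate() keeps as a two-column matrix named "mean" and "sd"
p2 <- aggregate(ct ~ sample + gene, data = p_num,
                FUN = function(x) c(mean = mean(x), sd = sd(x)))
p2
```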

  

Error in mutate_impl(.data, dots) : Evaluation error: could not find function "15.2".

This is my current data set, p:

    sample    gene           ct
1    s001     gapdh         15.2
2    s001     gapdh           16
3    s001     gapdh         14.8
4    s002     gapdh         16.2
5    s002     gapdh           17
6    s002     gapdh         16.7
7    s003     gapdh Undetermined
8    s003     gapdh         14.6
9    s003     gapdh           15
10   s001      actb         24.5
11   s001      actb         24.2 
12   s001      actb         24.7
13   s002      actb           25
14   s002      actb         25.7
15   s002      actb         25.5
16   s003      actb         27.3
17   s003      actb         27.4
18   s003      actb Undetermined

and I'd like to turn it into:

  p2$sample p2$gene  p2$ct.mean    p2$ct.sd
1      s001    actb 24.46666667  0.25166115
2      s002    actb 25.40000000  0.36055513
3      s003    actb 27.35000000  0.07071068
4      s001   gapdh 15.33333333  0.61101009
5      s002   gapdh 16.63333333  0.40414519
6      s003   gapdh 14.80000000  0.28284271

This is the code that produces the error above:

library(dplyr)

p_ave_sd <- p %>% 
  filter(p$ct != "Undetermined") %>%
  mutate_at(as.character(p$ct), as.numeric, rm.na = TRUE) %>%
  group_by(p$gene) %>% 
  summarise(mean=mean(p$ct), sd=sd(p$ct))
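Independently of the mutate step, the pipeline above mixes two ways of referring to ct. Inside a dplyr verb, a bare column name such as ct refers to the data flowing through the pipe, while p$ct always refers to the whole column of the original p, so after filter() the two no longer line up. A minimal sketch on a small hypothetical four-row frame (assumes dplyr is installed):

```r
library(dplyr)

# Small hypothetical frame with the same column layout as p
p <- data.frame(
  gene = c("gapdh", "gapdh", "actb", "actb"),
  ct   = c("15.2", "Undetermined", "24.5", "24.2"),
  stringsAsFactors = FALSE
)

check <- p %>%
  filter(ct != "Undetermined") %>%
  summarise(
    n_piped    = length(ct),    # column of the filtered data in the pipe: 3 values
    n_original = length(p$ct)   # full column of the original p: 4 values
  )
check
```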

It's definitely the mutate step that's tripping me up. I've tried mutate_all(), mutate_if(is.factor, is.numeric) and so on, but each one fails with its own error.
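Because ct presumably became a factor when the file was read (the "Undetermined" entries make the column text, and R versions before 4.0 turned text columns into factors by default), the conversion itself has a well-known pitfall: as.numeric() on a factor returns the internal integer codes, not the numbers printed as labels. The labels have to be converted first:

```r
# A factor stores integer codes plus a table of level labels
f <- factor(c("15.2", "16", "14.8"))

levels(f)                     # "14.8" "15.2" "16" (levels sorted as strings)
as.numeric(f)                 # 2 3 1: the integer codes, not the values
as.numeric(as.character(f))   # 15.2 16.0 14.8: convert the labels first
as.numeric(levels(f))[f]      # same result, converting each level only once
```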

Thanks for your help!

2 Answers:

Answer 0 (score: 1)

Here is one way to do it with mutate_at. If you only have one column to convert, plain mutate also works and is more direct.
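The answer's data block builds ct with stringsAsFactors = FALSE, so ct reaches mutate() as character and as.numeric(ct) is safe there. If ct is a factor, as the question title says, it needs to go through character first. A minimal sketch of the single-column mutate() route under that assumption, on a small hypothetical subset of p (assumes dplyr is installed; recent versions may also print a grouping message from summarise()):

```r
library(dplyr)

# Hypothetical subset of p in which ct is a factor, as in the question title
p <- data.frame(
  sample = c("s001", "s001", "s001", "s003", "s003", "s003"),
  gene   = "actb",
  ct     = factor(c("24.5", "24.2", "24.7", "27.3", "27.4", "Undetermined")),
  stringsAsFactors = FALSE
)

p_ave_sd <- p %>%
  filter(ct != "Undetermined") %>%
  # as.numeric(ct) alone would return the factor's level codes
  mutate(ct = as.numeric(as.character(ct))) %>%
  group_by(sample, gene) %>%
  summarise(mean = mean(ct), sd = sd(ct)) %>%
  ungroup()
p_ave_sd
```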

library(dplyr)

dat2 <- dat %>%
  filter(!ct %in% "Undetermined") %>%
  # mutate(ct = as.numeric(ct)) %>% <<< This will also work
  # funs() is deprecated in newer dplyr; passing the function directly works
  mutate_at(vars(ct), as.numeric) %>%
  group_by(sample, gene) %>% 
  summarise(mean = mean(ct), sd = sd(ct)) %>%
  ungroup()

dat2
# # A tibble: 6 x 4
#   sample gene   mean     sd
#   <chr>  <chr> <dbl>  <dbl>
# 1 s001   actb   24.5 0.252 
# 2 s001   gapdh  15.3 0.611 
# 3 s002   actb   25.4 0.361 
# 4 s002   gapdh  16.6 0.404 
# 5 s003   actb   27.4 0.0707
# 6 s003   gapdh  14.8 0.283 

Data

dat <- read.table(text = "    sample    gene           ct
1    s001     gapdh         15.2
                  2    s001     gapdh           16
                  3    s001     gapdh         14.8
                  4    s002     gapdh         16.2
                  5    s002     gapdh           17
                  6    s002     gapdh         16.7
                  7    s003     gapdh Undetermined
                  8    s003     gapdh         14.6
                  9    s003     gapdh           15
                  10   s001      actb         24.5
                  11   s001      actb         24.2 
                  12   s001      actb         24.7
                  13   s002      actb           25
                  14   s002      actb         25.7
                  15   s002      actb         25.5
                  16   s003      actb         27.3
                  17   s003      actb         27.4
                  18   s003      actb Undetermined",
                  header = TRUE, stringsAsFactors = FALSE)

Answer 1 (score: 0)

I'm not sure I've understood your question, but one possibility is:

p_ave_sd <- p %>% 
   filter(ct != "Undetermined") %>%
   # go through character so this also works when ct is a factor
   mutate(ct = as.numeric(as.character(ct))) %>%
   group_by(gene, sample) %>% 
   summarise(mean = mean(ct), sd = sd(ct))
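String comparison in R is case-sensitive, so the label used in filter() has to match the data exactly; the data spell it "Undetermined" with a capital U. With a lowercase label nothing is dropped, the leftover label turns into NA on conversion, and mean() returns NA unless na.rm = TRUE is given (the argument is na.rm, not rm.na). A small base-R illustration on a hypothetical three-value vector:

```r
ct <- c("15.2", "Undetermined", "16")

ct != "undetermined"           # TRUE TRUE TRUE: nothing would be filtered out
ct != "Undetermined"           # TRUE FALSE TRUE

# With the lowercase label, "Undetermined" survives and becomes NA
x <- suppressWarnings(as.numeric(ct[ct != "undetermined"]))
mean(x)                        # NA
mean(x, na.rm = TRUE)          # 15.6

# A case-insensitive test, in case the spelling varies in the raw data
tolower(ct) != "undetermined"  # TRUE FALSE TRUE
```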