Collapsing multiple character columns with summarise_all throws an error or includes NA

Asked: 2018-05-15 12:05:17

Tags: r

I have a data frame like this:

    structure(list(ref = c("1_S126_L006", "1_S126_L006", "1_S126_L006", 
"1_S126_L006", "1_S126_L006", "1_S126_L006", "1_S126_L006", "1_S126_L006", 
"1_S126_L006", "1_S126_L006", "1_S126_L006", "1_S126_L006", "150_S96_L005", 
"150_S96_L005", "150_S96_L005", "150_S96_L005", "150_S96_L005", 
"150_S96_L005", "150_S96_L005", "150_S96_L005"), Escherichia_coli_CyaA_1 = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, "N142S", "G222S", NA, NA, 
NA, NA, NA, NA, NA, NA), Escherichia_coli_EF_Tu = c(".", ".", 
".", NA, NA, NA, NA, NA, NA, NA, NA, NA, ".", NA, NA, NA, NA, 
NA, NA, NA), Escherichia_coli_GlpT = c(NA, NA, NA, NA, NA, NA, 
NA, "E448K", NA, NA, NA, NA, NA, NA, NA, NA, NA, "E448K", NA, 
NA), Escherichia_coli_PtsI = c(NA, NA, NA, NA, NA, NA, NA, NA, 
"R367K", NA, NA, NA, NA, NA, NA, NA, NA, NA, "R367K", NA), Escherichia_coli_UhpT = c(NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_), 
    fabG = c(NA, NA, NA, NA, NA, "D105E", NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, "D105E", NA, NA, NA, NA), gyrA_8 = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, "S83L", NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, "S83L"), gyrB_1 = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_), marR = c(NA, NA, NA, "G103S", 
    "Y137H", NA, NA, NA, NA, NA, NA, NA, NA, "G103S", "Y137H", 
    NA, NA, NA, NA, NA), nfsA = c(NA, NA, NA, NA, NA, NA, "Y45C", 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, "Y45C", NA, NA, NA), 
    ompF = c(NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    ), parC_3 = c(NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_)), .Names = c("ref", "Escherichia_coli_CyaA_1", 
"Escherichia_coli_EF_Tu", "Escherichia_coli_GlpT", "Escherichia_coli_PtsI", 
"Escherichia_coli_UhpT", "fabG", "gyrA_8", "gyrB_1", "marR", 
"nfsA", "ompF", "parC_3"), row.names = c(NA, -20L), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), vars = "ref", drop = TRUE, indices = list(
    0:11, 12:19), group_sizes = c(12L, 8L), biggest_group_size = 12L, labels = structure(list(
    ref = c("1_S126_L006", "150_S96_L005")), row.names = c(NA, 
-2L), class = "data.frame", vars = "ref", drop = TRUE, .Names = "ref"))

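What I want to do is collapse the whole data frame so that I get only one row per entry in the "ref" column. If a column holds multiple values for the same "ref", they should be pasted together into one cell, separated by ",". I have previously used the following to collapse a whole data frame to one row per entry in the "ref" column: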

library(dplyr)

func_paste <- function(x) paste(unique(sum(x, na.rm = T)), collapse = ",")

df %>%
group_by(ref) %>%
summarise_all(funs(func_paste))

This has worked on some other datasets, but I can't for the life of me figure out why I keep getting this error:

Error in summarise_impl(.data, dots) : 
Evaluation error: invalid 'type' (character) of argument.

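I have read a few posts about this error, such as here and here, which suggest trying group_by(x) %>% summarise_each(funs(sum)), but that only works for numeric data, not character data. As far as I can tell, the problem lies with the sum() function, because my data is character. Any suggestions?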
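
To check the suspicion about sum(), here is a minimal sketch that calls it on a character vector outside of dplyr. The two values are taken from the data above; the variable name `msg` is only for illustration.

```r
# sum() accepts numeric, complex and logical input, but not character.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  sum(c("N142S", "G222S"), na.rm = TRUE),
  error = function(e) conditionMessage(e)
)

print(msg)
```

If the message printed here matches the one reported by summarise_impl, that would suggest the error comes from sum() itself rather than from the grouping or from summarise_all.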

Edit

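If I run it without the sum() function, it seems to work. However, without the na.rm = T part, it now pastes all the NAs in along with the values. How can I make it ignore them?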
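
One possible way to ignore the NAs, sketched here under assumptions rather than offered as a confirmed fix, is to drop them with is.na() before calling unique() and paste(). The helper name `paste_non_na` and the small `toy` data frame are hypothetical stand-ins for the real data, and returning NA for a group that has no values at all is a choice made for this sketch.

```r
library(dplyr)

# Hypothetical helper: drop NA first, then de-duplicate and paste.
# A group with no non-NA values returns NA instead of an empty string.
paste_non_na <- function(x) {
  vals <- unique(x[!is.na(x)])
  if (length(vals) == 0) NA_character_ else paste(vals, collapse = ",")
}

# Toy data standing in for the data frame in the question.
toy <- data.frame(
  ref  = c("a", "a", "a", "b", "b"),
  marR = c("G103S", "Y137H", NA, NA, "G103S"),
  ompF = NA_character_,
  stringsAsFactors = FALSE
)

# Passing the function directly avoids the funs() wrapper.
collapsed <- toy %>%
  group_by(ref) %>%
  summarise_all(paste_non_na)

print(collapsed)
```

Without the length check, paste() on an empty vector with collapse = "," yields "", so all-NA columns such as ompF would become empty strings rather than NA. Which of the two is preferable depends on how the collapsed table is used afterwards.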

0 answers:

No answers yet