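Converting factors to numeric has been covered many times, but my problem arises when a single factor level contains more than one number. For example, here is a small part of my data.frame: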
                    AF      AC   AN   EAS_AF         AMR_AF
1          0.000199681       1 5008    0.001            0.0
2           0.00319489      16 5008      0.0            0.0
3 0.024361, 0.00479233 122, 24 5008 0.0, 0.0 0.0043, 0.0014
4           0.00439297      22 5008      0.0         0.0014
5          0.000798722       4 5008      0.0            0.0
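Normally I would use as.numeric together with levels to convert these factors to numeric. However, the third row has two numbers in each entry, so when I try that approach on these variables I get an NA. Is there a way around this? I have too many cases like this to pull them out by hand.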
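To make the failure concrete, here is a minimal sketch of the usual idiom applied to a toy factor. The vector `af` is made up for illustration and only mirrors the shape of the AF column; it is not the real data.

```r
# Toy factor mirroring the AF column: one level holds two comma-separated numbers.
af <- factor(c("0.000199681", "0.00319489", "0.024361, 0.00479233"))

# The usual factor-to-numeric idiom: convert the levels, then index by the factor codes.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
af_num <- suppressWarnings(as.numeric(levels(af))[af])

# The level "0.024361, 0.00479233" is not a single number, so it comes back as NA.
is.na(af_num)
```

The NA appears because as.numeric only parses strings that contain exactly one number, so the entry has to be split before it can be converted.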
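My overall goal is to test whether every entry in each column is greater than 0 (so if an entry holds two numbers, I would test both), which is why I am trying to convert to numeric first. If there is a smarter way to go about this, I am open to trying it.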
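One possible base R route to that test, without reshaping the data frame, is to split each entry on the comma and check the pieces. This is only a sketch: `all_positive` is a hypothetical helper name, and it assumes that "greater than 0" should hold for every number in an entry.

```r
# Hypothetical helper (not from the original post):
# TRUE when every number in a comma-separated entry is greater than 0.
all_positive <- function(x) {
  # as.character() handles both factors and character vectors.
  parts <- strsplit(as.character(x), ",", fixed = TRUE)
  # trimws() removes the space left behind after splitting on ",".
  vapply(parts, function(p) all(as.numeric(trimws(p)) > 0), logical(1))
}

# Toy values in the style of the AMR_AF and AFR_AF columns.
amr_af <- factor(c("0.0", "0.0043, 0.0014", "0.0014", "0.09, 0.0"))
all_positive(amr_af)
```

If a single positive number in an entry should be enough, all() can be swapped for any(). To apply the helper to several columns at once, something like lapply(df[cols], all_positive) should work, where cols is a character vector of column names.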
As requested, below is the dput of a reduced version of my data frame (just the first 10 rows).
structure(list(CHROM = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L), POS = c(180109L, 209892L, 221335L, 239445L, 246927L, 246928L,
246933L, 246955L, 246970L), ID = structure(c(6L, 4L, 1L, 3L,
5L, 9L, 2L, 7L, 8L), .Label = c("rs143013573", "rs1431845", "rs145483680",
"rs151111729", "rs547339499", "rs547699134", "rs556577288", "rs575589407",
"rs72770983"), class = "factor"), REF = structure(c(3L, 2L, 2L,
3L, 1L, 1L, 3L, 2L, 1L), .Label = c("A", "C", "G"), class = "factor"),
ALT = structure(c(1L, 2L, 3L, 1L, 2L, 2L, 1L, 4L, 2L), .Label = c("A",
"G", "G, T", "T"), class = "factor"), AF = structure(c(1L,
5L, 7L, 6L, 2L, 4L, 8L, 3L, 1L), .Label = c("0.000199681",
"0.000798722", "0.000998403", "0.00239617", "0.00319489",
"0.00439297", "0.024361, 0.00479233", "0.220248"), class = "factor"),
AC = structure(c(1L, 5L, 4L, 6L, 7L, 3L, 2L, 8L, 1L), .Label = c("1",
"1103", "12", "122, 24", "16", "22", "4", "5"), class = "factor"),
AN = c(5008L, 5008L, 5008L, 5008L, 5008L, 5008L, 5008L, 5008L,
5008L), EAS_AF = structure(c(3L, 1L, 2L, 1L, 1L, 3L, 4L,
1L, 1L), .Label = c("0.0", "0.0, 0.0", "0.001", "0.248"), class = "factor"),
AMR_AF = structure(c(1L, 1L, 3L, 2L, 1L, 2L, 4L, 1L, 2L), .Label = c("0.0",
"0.0014", "0.0043, 0.0014", "0.1599"), class = "factor"),
AFR_AF = structure(c(1L, 3L, 5L, 4L, 2L, 1L, 6L, 1L, 1L), .Label = c("0.0",
"0.003", "0.0121", "0.0159", "0.09, 0.0", "0.1611"), class = "factor"),
EUR_AF = structure(c(1L, 1L, 2L, 1L, 1L, 3L, 4L, 1L, 1L), .Label = c("0.0",
"0.0, 0.0089", "0.0089", "0.2495"), class = "factor"), SAS_AF = structure(c(1L,
1L, 2L, 1L, 1L, 3L, 5L, 4L, 1L), .Label = c("0.0", "0.0, 0.0143",
"0.001", "0.0051", "0.2843"), class = "factor"), consequence = structure(c(2L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("stop_gained",
"synonymous_variant"), class = "factor"), gene = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "ZMYND11", class = "factor"),
accession = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = "NM_006624.5", class = "factor"), gene_type = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "protein_coding", class = "factor")), .Names = c("CHROM",
"POS", "ID", "REF", "ALT", "AF", "AC", "AN", "EAS_AF", "AMR_AF",
"AFR_AF", "EUR_AF", "SAS_AF", "consequence", "gene", "accession",
"gene_type"), class = "data.frame", row.names = c(NA, -9L))
Answer 0 (score: 1)
Here is how separate_rows from tidyr can do this:
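A minimal sketch of that approach, not necessarily the answerer's exact code. The data frame `df` below is a small stand-in built from three columns of the question's data. The sketch assumes that, within a row, every listed column holds the same number of comma-separated values, as in the sample.

```r
library(tidyr)

# Small stand-in for the question's data frame (three columns, three rows).
df <- data.frame(
  AF     = c("0.000199681", "0.024361, 0.00479233", "0.00439297"),
  AC     = c("1", "122, 24", "22"),
  AMR_AF = c("0.0", "0.0043, 0.0014", "0.0014"),
  stringsAsFactors = TRUE
)

# Precaution: turn the factor columns into plain character vectors first.
df[] <- lapply(df, as.character)

# Split each listed column on ", " so every value gets its own row;
# convert = TRUE then re-types the split columns (here to numeric and integer).
long <- separate_rows(df, AF, AC, AMR_AF, sep = ", ", convert = TRUE)

# With real numbers in place, the > 0 test is an ordinary comparison.
long$AMR_AF > 0
```

Each multi-value row is expanded into one row per value, so the second row of `df` becomes two rows in `long`, and every column can then be compared against 0 directly.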