Converting a factor whose entries contain multiple numbers to numeric

Asked: 2017-06-28 13:23:07

Tags: r dataframe

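Converting a factor to numeric has been covered many times, but my problem arises when a single factor entry contains more than one number. For example, here is a small part of my data.frame: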

                    AF      AC   AN   EAS_AF         AMR_AF
1          0.000199681       1 5008    0.001            0.0
2           0.00319489      16 5008      0.0            0.0
3 0.024361, 0.00479233 122, 24 5008 0.0, 0.0 0.0043, 0.0014
4           0.00439297      22 5008      0.0         0.0014
5          0.000798722       4 5008      0.0            0.0

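Normally I would convert these factors to numeric by combining as.numeric and levels. However, the third row has two numbers in each entry, so when I try this approach on those variables I get an NA. Is there a way around this? I have far too many such cases to pull them out by hand.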
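For reference, the behaviour described above can be reproduced on a small vector. This is only a minimal sketch: the factor `af` is a hypothetical stand-in built from three of the AF values shown in the table, not the real data frame.

```r
# Hypothetical factor mirroring three entries of the AF column
af <- factor(c("0.000199681", "0.00319489", "0.024361, 0.00479233"))

# Usual idiom: convert the level labels to numeric, then index them
# by the factor's integer codes. The multi-number label cannot be
# parsed as a single number, so it becomes NA (with a coercion warning,
# suppressed here to keep the output short).
res <- suppressWarnings(as.numeric(levels(af))[af])

# The first two entries convert; the third is NA
is.na(res)
```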

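My overall goal is to test whether every entry in every column is greater than 0 (so where there are two numbers, I would test both), which is why I am trying to convert to numeric first. If there is a smarter way to approach this, I am happy to try it.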

As requested, here is the dput of a scaled-down version of my data frame (the first 10 rows only).

structure(list(CHROM = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
10L), POS = c(180109L, 209892L, 221335L, 239445L, 246927L, 246928L, 
246933L, 246955L, 246970L), ID = structure(c(6L, 4L, 1L, 3L, 
5L, 9L, 2L, 7L, 8L), .Label = c("rs143013573", "rs1431845", "rs145483680", 
"rs151111729", "rs547339499", "rs547699134", "rs556577288", "rs575589407", 
"rs72770983"), class = "factor"), REF = structure(c(3L, 2L, 2L, 
3L, 1L, 1L, 3L, 2L, 1L), .Label = c("A", "C", "G"), class = "factor"), 
    ALT = structure(c(1L, 2L, 3L, 1L, 2L, 2L, 1L, 4L, 2L), .Label = c("A", 
    "G", "G, T", "T"), class = "factor"), AF = structure(c(1L, 
    5L, 7L, 6L, 2L, 4L, 8L, 3L, 1L), .Label = c("0.000199681", 
    "0.000798722", "0.000998403", "0.00239617", "0.00319489", 
    "0.00439297", "0.024361, 0.00479233", "0.220248"), class = "factor"), 
    AC = structure(c(1L, 5L, 4L, 6L, 7L, 3L, 2L, 8L, 1L), .Label = c("1", 
    "1103", "12", "122, 24", "16", "22", "4", "5"), class = "factor"), 
    AN = c(5008L, 5008L, 5008L, 5008L, 5008L, 5008L, 5008L, 5008L, 
    5008L), EAS_AF = structure(c(3L, 1L, 2L, 1L, 1L, 3L, 4L, 
    1L, 1L), .Label = c("0.0", "0.0, 0.0", "0.001", "0.248"), class = "factor"), 
    AMR_AF = structure(c(1L, 1L, 3L, 2L, 1L, 2L, 4L, 1L, 2L), .Label = c("0.0", 
    "0.0014", "0.0043, 0.0014", "0.1599"), class = "factor"), 
    AFR_AF = structure(c(1L, 3L, 5L, 4L, 2L, 1L, 6L, 1L, 1L), .Label = c("0.0", 
    "0.003", "0.0121", "0.0159", "0.09, 0.0", "0.1611"), class = "factor"), 
    EUR_AF = structure(c(1L, 1L, 2L, 1L, 1L, 3L, 4L, 1L, 1L), .Label = c("0.0", 
    "0.0, 0.0089", "0.0089", "0.2495"), class = "factor"), SAS_AF = structure(c(1L, 
    1L, 2L, 1L, 1L, 3L, 5L, 4L, 1L), .Label = c("0.0", "0.0, 0.0143", 
    "0.001", "0.0051", "0.2843"), class = "factor"), consequence = structure(c(2L, 
    1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("stop_gained", 
    "synonymous_variant"), class = "factor"), gene = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "ZMYND11", class = "factor"), 
    accession = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
    ), .Label = "NM_006624.5", class = "factor"), gene_type = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "protein_coding", class = "factor")), .Names = c("CHROM", 
"POS", "ID", "REF", "ALT", "AF", "AC", "AN", "EAS_AF", "AMR_AF", 
"AFR_AF", "EUR_AF", "SAS_AF", "consequence", "gene", "accession", 
"gene_type"), class = "data.frame", row.names = c(NA, -9L)) 

1 answer:

Answer 0 (score: 1)

Here is one way to do it with separate_rows from tidyr.
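The answer's original code is not preserved here, so the following is only a minimal sketch of how separate_rows could be applied. It assumes a small hypothetical data frame `df` that mirrors a few columns of the question's data, and it assumes that within a row every split column holds the same number of comma-separated values (as in the example data). Converting the factor columns to character first is a precaution added for this sketch.

```r
library(tidyr)

# Hypothetical data frame mirroring a few columns of the question's data
df <- data.frame(
  ID     = c("rs547699134", "rs151111729", "rs143013573"),
  AF     = c("0.000199681", "0.00319489", "0.024361, 0.00479233"),
  AC     = c("1", "16", "122, 24"),
  EAS_AF = c("0.001", "0.0", "0.0, 0.0"),
  stringsAsFactors = TRUE
)

# Columns whose cells may hold several comma-separated numbers
cols <- c("AF", "AC", "EAS_AF")

# Precaution: work on character columns rather than factors
df[cols] <- lapply(df[cols], as.character)

# Give each comma-separated value its own row; convert = TRUE re-types
# the split columns, so they end up numeric or integer
long <- separate_rows(df, AF, AC, EAS_AF, sep = ",\\s*", convert = TRUE)

# Once the columns are numeric, the "> 0" test is a plain comparison
long$AF > 0
long$EAS_AF > 0
```

The row with two values becomes two rows that share the same ID, so both numbers are tested and the results can still be traced back to the original variant.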