Removing a pattern from a dataset in R

Asked: 2014-10-31 21:47:58

Tags: r dataset gsub

Here is my sample dataset:

> head(d3)
             V1                  V2                       V3                     V4                      V5                 V6
2 Bacteria(100) Proteobacteria(100) Gammaproteobacteria(100)   Pseudomonadales(100)   Pseudomonadaceae(100)    Pseudomonas(98)
3 Bacteria(100)  Bacteroidetes(100)          Bacteroidia(93)      Bacteroidales(93)        unclassified(93)   unclassified(93)
4 Bacteria(100)     Firmicutes(100)             Bacilli(100)   Lactobacillales(100)   Streptococcaceae(100) Streptococcus(100)
5 Bacteria(100) Proteobacteria(100) Gammaproteobacteria(100)    Pasteurellales(100)    Pasteurellaceae(100)   unclassified(68)
6 Bacteria(100) Proteobacteria(100) Gammaproteobacteria(100) Enterobacteriales(100) Enterobacteriaceae(100)   unclassified(90)
7 Bacteria(100)  Bacteroidetes(100)         Bacteroidia(100)     Bacteroidales(100) Porphyromonadaceae(100)  unclassified(100)

I am trying to remove (100) from each string. I tried:

> d3 <- gsub("[(0-9)]", "", d3)

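This gave me a jumbled mess instead of a dataset: it looked like all the numbers I was trying to remove, wrapped in c() at the bottom. So I tried this: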

> for (j in 1:nrow(d3)) {
    for (i in 1:ncol(d3)) {
      d3[j, i] <- gsub("[(0-9)]", "", as.character(d3[j, i]))
    }
  }

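This gave me "invalid factor level, NA generated" and a dataset in which most of the values had been replaced by NA! I could not find an existing question that covers what I want.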

1 Answer:

Answer 0: (score: 2)

Here is one way:

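# d3[] <- ... overwrites the cell values but keeps d3 a data frame
d3[] <- sapply(d3,function(x){
  # \\( and \\) match literal parentheses; \\d+ matches one or more digits
  gsub("\\(\\d+\\)","",as.character(x))
})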
##
> d3
        V1             V2                  V3                V4                 V5            V6
2 Bacteria Proteobacteria Gammaproteobacteria   Pseudomonadales   Pseudomonadaceae   Pseudomonas
3 Bacteria  Bacteroidetes         Bacteroidia     Bacteroidales       unclassified  unclassified
4 Bacteria     Firmicutes             Bacilli   Lactobacillales   Streptococcaceae Streptococcus
5 Bacteria Proteobacteria Gammaproteobacteria    Pasteurellales    Pasteurellaceae  unclassified
6 Bacteria Proteobacteria Gammaproteobacteria Enterobacteriales Enterobacteriaceae  unclassified
7 Bacteria  Bacteroidetes         Bacteroidia     Bacteroidales Porphyromonadaceae  unclassified
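
To see why the two attempts in the question misbehave, here is a minimal sketch. It assumes the columns of d3 are factors (the "invalid factor level" warning suggests so); the small data frame `d` and the names `flat` and `bad` are made up for illustration and are not part of the original data.

```r
# Hypothetical two-row stand-in for d3, with factor columns (an assumption)
d <- data.frame(V1 = c("Bacteria(100)", "Bacteria(100)"),
                V2 = c("Firmicutes(100)", "Bacteroidetes(93)"),
                stringsAsFactors = TRUE)

# gsub() coerces its input with as.character(), so a whole data frame
# collapses to one string per column rather than one per cell
flat <- gsub("[(0-9)]", "", d)
length(flat)  # 2, the number of columns

# Writing a string that is not an existing factor level into a single cell
# produces NA together with the "invalid factor level" warning
bad <- d
bad[1, 1] <- "Bacteria"
is.na(bad[1, 1])  # TRUE

# Replacing whole columns with character vectors avoids the factor levels;
# lapply() is used here in place of sapply() and returns one vector per column
d[] <- lapply(d, function(x) gsub("\\(\\d+\\)", "", as.character(x)))
d
```

With `lapply()` the cleaned columns end up as character vectors; wrap the result in `factor()` if factors are needed afterwards.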

When you say "remove (100) from each string", I assume you also mean (93) and so on; but if you only want to remove (100), you can use:

d3[] <- sapply(d3,function(x){
  gsub("\\(100\\)","",as.character(x))
})
##
> d3
        V1             V2                  V3                V4                 V5               V6
2 Bacteria Proteobacteria Gammaproteobacteria   Pseudomonadales   Pseudomonadaceae  Pseudomonas(98)
3 Bacteria  Bacteroidetes     Bacteroidia(93) Bacteroidales(93)   unclassified(93) unclassified(93)
4 Bacteria     Firmicutes             Bacilli   Lactobacillales   Streptococcaceae    Streptococcus
5 Bacteria Proteobacteria Gammaproteobacteria    Pasteurellales    Pasteurellaceae unclassified(68)
6 Bacteria Proteobacteria Gammaproteobacteria Enterobacteriales Enterobacteriaceae unclassified(90)
7 Bacteria  Bacteroidetes         Bacteroidia     Bacteroidales Porphyromonadaceae     unclassified
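
As a rough comparison of the pattern from the question with the escaped patterns used above, the sketch below runs each on a small vector. The label "Group2(93)" is invented here to show a name that contains a digit of its own; it does not appear in the original data.

```r
# "Group2(93)" is a made-up label that contains a digit in the name itself
x <- c("Pseudomonas(98)", "unclassified(100)", "Group2(93)")

# [(0-9)] is a character class: it matches any single "(", ")" or digit,
# wherever it occurs, so the 2 in "Group2" is removed as well
loose <- gsub("[(0-9)]", "", x)

# \\(\\d+\\) matches only a run of digits enclosed in literal parentheses
strict <- gsub("\\(\\d+\\)", "", x)

# Adding $ restricts the match to the end of each string
anchored <- gsub("\\(\\d+\\)$", "", x)

loose     # "Pseudomonas" "unclassified" "Group"
strict    # "Pseudomonas" "unclassified" "Group2"
anchored  # "Pseudomonas" "unclassified" "Group2"
```

On the sample data in the question both approaches give the same text, because none of the names contain digits or parentheses outside the trailing count; the escaped pattern is simply the safer choice if that ever changes.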