Keep the values in a row whose level occurs at least a given number of times

Time: 2014-10-02 19:24:00

Tags: r sorting unique rows

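I'm not sure how to phrase this, because all the R code I know removes entire rows based on a sum value, but here is an example of what I want to do.

I want to take taxonomic information from various sites, but keep only the levels that are represented at least three times across the whole sample.

For example, in the table below, Diptera is identified only once at river mile (RM) 15, but Diptera as a whole occurs 38 times in the whole sample, so I want to keep that row. The same goes for the genus Chaetocladius: it appears once at RM 0.7 but 5 times in the sample, so I would keep it.

In addition, in the rare cases where a level does not occur often enough to be kept, it needs to be removed and replaced with NA. Examples are the order Blattoidea at RM 15, or RM 80, where the species Chironomus atroviridis is present only once but Insecta and Chironomus occur often enough to be kept. There I want to keep those levels and replace the rest with NA.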

RM   phylum      class    order       family        genus          species                 Sum
0.5  Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  Chaetocladius mel       1
15   Arthropoda  Insecta  Diptera     NA            NA             NA                      1
15   Arthropoda  Insecta  Blattoidea  NA            NA             NA                      1
0.7  Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      1
54   Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      2
35   Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      2
80   Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     Chironomus atroviridis  2
80   Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     Chironomus bifurcatus   1
0.5  Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     Chironomus bifurcatus   29

The new output would look like this:

RM   phylum      class    order    family        genus          species                Sum
0.5  Arthropoda  Insecta  Diptera  Chironomidae  Chaetocladius  NA                     1
15   Arthropoda  Insecta  Diptera  NA            NA             NA                     1
15   Arthropoda  Insecta  NA       NA            NA             NA                     1
0.7  Arthropoda  Insecta  Diptera  Chironomidae  Chaetocladius  NA                     1
54   Arthropoda  Insecta  Diptera  Chironomidae  Chaetocladius  NA                     2
35   Arthropoda  Insecta  Diptera  Chironomidae  Chaetocladius  NA                     2
80   Arthropoda  Insecta  Diptera  Chironomidae  Chironomus     NA                     2
80   Arthropoda  Insecta  Diptera  Chironomidae  Chironomus     Chironomus bifurcatus  1
0.5  Arthropoda  Insecta  Diptera  Chironomidae  Chironomus     Chironomus bifurcatus  29

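I have already compiled a list of the levels at each taxonomic rank that have a count of 3 or more, and I thought I might be able to work through each rank in turn (from phylum to species), but I can't figure out how to do that.

Please help.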

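1 Answer:

Answer 0: (score: 0)

There may be a simpler way to do this, but this gives the output you want. It is wrapped in the function clean_data, where you specify the minimum number of times a level must occur to be kept. In this case, levels that do not occur at least twice in the provided data are replaced with NA. Does this fit your needs?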

dat <- read.table(header = TRUE, text = '
RM   phylum      class    order       family        genus          species                   Sum
0.5  Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  "Chaetocladius mel"       1
15   Arthropoda  Insecta  Diptera     NA            NA             NA                        1
15   Arthropoda  Insecta  Blattoidea  NA            NA             NA                        1
0.7  Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                        1
54   Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                        2
35   Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                        2
80   Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     "Chironomus atroviridis"  2
80   Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     "Chironomus bifurcatus"   1
0.5  Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     "Chironomus bifurcatus"   29
')
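Before removing anything, it can help to look at how often each level occurs in each column. The following is a minimal sketch, not part of the original answer: it rebuilds only the order, genus and species columns of the data above by hand (the names taxa, level_counts and rare are illustrative), counts rows per level with table(), and lists the levels that fall below a threshold of 2.

```r
# Hand-built copy of three taxonomic columns from the example data
# (an assumption for illustration; the full data has more columns)
taxa <- data.frame(
  order   = c("Diptera", "Diptera", "Blattoidea", rep("Diptera", 6)),
  genus   = c("Chaetocladius", NA, NA, rep("Chaetocladius", 3),
              rep("Chironomus", 3)),
  species = c("Chaetocladius mel", rep(NA, 5), "Chironomus atroviridis",
              "Chironomus bifurcatus", "Chironomus bifurcatus"),
  stringsAsFactors = FALSE
)

# Row counts of each level within each column (table() ignores NA by default)
level_counts <- lapply(taxa, table)

# Levels that occur in fewer than 2 rows, per column
rare <- lapply(level_counts, function(x) names(x)[x < 2])

print(level_counts$order)
print(rare)
```

Note that this counts rows, not the values in the Sum column, which is also what clean_data does.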

clean_data <- function(dat, repeats){
  # taxonomic columns: everything except RM and Sum
  taxa_cols <- !(colnames(dat) %in% c("RM", "Sum"))

  # get the counts of each level within each column (always a list)
  counts <- lapply(dat[, taxa_cols, drop = FALSE], table)

  # convert data to a matrix for indexing
  # (this coerces every column, including RM and Sum, to character)
  dat <- as.matrix(dat)

  indices <- unlist(
    # get indices of where the elements are in the data matrix
    lapply(
      # remove list elements that are character(0)
      Filter(length,
             # find which levels are present fewer than 'repeats' times
             lapply(counts, FUN = function(x) names(which(x < repeats)))),
      FUN = function(y) which(dat %in% y)))

  # set indices to NA
  dat[indices] <- NA
  return(as.data.frame(dat))
}

clean_data(dat, 2)

> clean_data(dat, 2)
    RM     phylum   class   order       family         genus               species Sum
1  0.5 Arthropoda Insecta Diptera Chironomidae Chaetocladius                  <NA>   1
2 15.0 Arthropoda Insecta Diptera         <NA>          <NA>                  <NA>   1
3 15.0 Arthropoda Insecta    <NA>         <NA>          <NA>                  <NA>   1
4  0.7 Arthropoda Insecta Diptera Chironomidae Chaetocladius                  <NA>   1
5 54.0 Arthropoda Insecta Diptera Chironomidae Chaetocladius                  <NA>   2
6 35.0 Arthropoda Insecta Diptera Chironomidae Chaetocladius                  <NA>   2
7 80.0 Arthropoda Insecta Diptera Chironomidae    Chironomus                  <NA>   2
8 80.0 Arthropoda Insecta Diptera Chironomidae    Chironomus Chironomus bifurcatus   1
9  0.5 Arthropoda Insecta Diptera Chironomidae    Chironomus Chironomus bifurcatus  29
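
As the 15.0 and 54.0 in the RM column show, going through as.matrix turns every column into text, and matching against the whole matrix would also blank a rare level name that happens to appear in another column. A hedged alternative sketch, not part of the original answer, works column by column instead. The function name clean_levels and its arguments cols, min_count and weight are hypothetical. The optional weight argument totals a numeric column such as Sum per level, which is one possible reading of "occurs at least three times" in the question; that reading is an assumption.

```r
# Hypothetical helper: replace rare levels with NA, one column at a time
clean_levels <- function(dat, cols, min_count, weight = NULL) {
  # weight: optional name of a numeric column to total per level;
  # when NULL, every row counts once (as clean_data does)
  w <- if (is.null(weight)) rep(1, nrow(dat)) else dat[[weight]]
  for (col in cols) {
    values <- as.character(dat[[col]])
    totals <- tapply(w, values, sum)            # NA entries are ignored
    rare   <- names(totals)[totals < min_count]
    values[values %in% rare] <- NA
    dat[[col]] <- values
  }
  dat
}

# Hand-built subset of the example data (order, genus, species only)
dat <- data.frame(
  RM      = c(0.5, 15, 15, 0.7, 54, 35, 80, 80, 0.5),
  order   = c("Diptera", "Diptera", "Blattoidea", rep("Diptera", 6)),
  genus   = c("Chaetocladius", NA, NA, rep("Chaetocladius", 3),
              rep("Chironomus", 3)),
  species = c("Chaetocladius mel", rep(NA, 5), "Chironomus atroviridis",
              "Chironomus bifurcatus", "Chironomus bifurcatus"),
  Sum     = c(1, 1, 1, 1, 2, 2, 2, 1, 29),
  stringsAsFactors = FALSE
)
ranks <- c("order", "genus", "species")

by_rows <- clean_levels(dat, ranks, min_count = 2)                  # count rows
by_sum  <- clean_levels(dat, ranks, min_count = 3, weight = "Sum")  # total Sum
print(by_rows)
```

On this subset both calls blank the same cells (Blattoidea, Chaetocladius mel and Chironomus atroviridis), and RM and Sum stay numeric. On a larger data set the two counting rules can disagree, so which one applies depends on how the Sum column is meant to be interpreted.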