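I'm not sure how to word this, because all the R code I know of removes entire rows based on sum values, but here is an example of what I'm trying to do.
I want to take taxonomic information from various sites, but keep only the levels that are represented at least three times across the whole sample.
For example, in the table below, although Diptera is recorded only once at River Mile (RM) 15, Diptera as a whole occurs 38 times across the sample, so I want to keep that row. The same goes for the genus Chaetocladius: although it occurs once at RM 0.7, it occurs 5 times in the sample, so I would keep it.
In addition, where the higher levels of a row occur often enough to keep but the finer levels do not, the rare levels need to be removed and replaced with NA. For example, the order Blattoidea at RM 15, or the species Chironomus atroviridis at RM 80, is present only once, but Insecta and Chironomus are present often enough to keep, so I want to keep those levels and replace the rest with NA.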
RM    phylum      class    order       family        genus          species                 Sum
0.5   Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  Chaetocladius mel       1
15    Arthropoda  Insecta  Diptera     NA            NA             NA                      1
15    Arthropoda  Insecta  Blattoidea  NA            NA             NA                      1
0.7   Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      1
54    Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      2
35    Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      2
80    Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     Chironomus atroviridis  2
80    Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     Chironomus bifurcatus   1
0.5   Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     Chironomus bifurcatus   29
The new output would look like this:
RM    phylum      class    order       family        genus          species                 Sum
0.5   Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      1
15    Arthropoda  Insecta  Diptera     NA            NA             NA                      1
15    Arthropoda  Insecta  NA          NA            NA             NA                      1
0.7   Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      1
54    Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      2
35    Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      2
80    Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     NA                      2
80    Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     Chironomus bifurcatus   1
0.5   Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     Chironomus bifurcatus   29
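I have compiled a list, for each level of these taxa, of the entries with a value of 3 or higher, and I thought I could work through each level in turn (from phylum to species), but I can't figure out how to do that.
Please help.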
Answer 0 (score: 0)
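There may be a simpler way to do this, but this gives the output you want. It is wrapped in the function clean_data, where you can specify how many times a level must occur in order to be kept. In this case, levels that do not occur at least twice in the provided data are replaced with NA. Does this fit your needs?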
dat <- read.table(header = TRUE, text = '
RM phylum class order family genus species Sum
0.5 Arthropoda Insecta Diptera Chironomidae Chaetocladius "Chaetocladius mel" 1
15 Arthropoda Insecta Diptera NA NA NA 1
15 Arthropoda Insecta Blattoidea NA NA NA 1
0.7 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 1
54 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 2
35 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 2
80 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus atroviridis" 2
80 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus bifurcatus" 1
0.5 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus bifurcatus" 29
')
clean_data <- function(dat, repeats) {
  # count how often each level occurs within each taxonomic column
  # (every column except RM and Sum); lapply always returns a list
  counts <- lapply(dat[, !(colnames(dat) %in% c("RM", "Sum"))], table)
  # convert data to a character matrix so it can be indexed by position
  dat <- as.matrix(dat)
  indices <- unlist(
    # get indices of where the rare levels are in the data matrix
    lapply(
      # remove list elements that are character(0)
      Filter(length,
             # find which levels are present fewer than 'repeats' times
             lapply(counts, FUN = function(x) names(which(x < repeats)))),
      FUN = function(y) which(dat %in% y)))
  # set indices to NA
  dat[indices] <- NA
  # note: as.matrix() turns every column, including RM and Sum, into text
  return(as.data.frame(dat))
}
clean_data(dat, 2)
> clean_data(dat, 2)
    RM     phylum   class      order       family         genus               species Sum
1  0.5 Arthropoda Insecta    Diptera Chironomidae Chaetocladius                  <NA>   1
2 15.0 Arthropoda Insecta    Diptera         <NA>          <NA>                  <NA>   1
3 15.0 Arthropoda Insecta       <NA>         <NA>          <NA>                  <NA>   1
4  0.7 Arthropoda Insecta    Diptera Chironomidae Chaetocladius                  <NA>   1
5 54.0 Arthropoda Insecta    Diptera Chironomidae Chaetocladius                  <NA>   2
6 35.0 Arthropoda Insecta    Diptera Chironomidae Chaetocladius                  <NA>   2
7 80.0 Arthropoda Insecta    Diptera Chironomidae    Chironomus                  <NA>   2
8 80.0 Arthropoda Insecta    Diptera Chironomidae    Chironomus Chironomus bifurcatus   1
9  0.5 Arthropoda Insecta    Diptera Chironomidae    Chironomus Chironomus bifurcatus  29
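Note that clean_data counts how many rows each level appears in, whereas the question seems to measure abundance by the Sum column (at least three individuals across the whole sample). If that is what is wanted, a variant along the following lines might work. This is only a minimal sketch: the function name clean_by_sum and its arguments min_total and tax_cols are hypothetical names introduced here, and it assumes the taxonomic columns are listed from coarsest to finest and are read as character vectors rather than factors.

```r
# Example data from the question (species names quoted so they stay one field)
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
RM phylum class order family genus species Sum
0.5 Arthropoda Insecta Diptera Chironomidae Chaetocladius "Chaetocladius mel" 1
15 Arthropoda Insecta Diptera NA NA NA 1
15 Arthropoda Insecta Blattoidea NA NA NA 1
0.7 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 1
54 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 2
35 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 2
80 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus atroviridis" 2
80 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus bifurcatus" 1
0.5 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus bifurcatus" 29
')

# Hypothetical helper (not part of the answer above): replace every taxonomic
# level whose summed count over the whole sample is below min_total with NA.
clean_by_sum <- function(dat, min_total,
                         tax_cols = c("phylum", "class", "order",
                                      "family", "genus", "species")) {
  for (col in tax_cols) {
    # total of Sum for each level in this column (rows with NA are ignored)
    totals <- tapply(dat$Sum, dat[[col]], sum)
    rare <- names(totals)[totals < min_total]
    dat[[col]][dat[[col]] %in% rare] <- NA
  }
  # cascade: once a level is NA, every finer level in that row is set to NA too
  for (i in seq_along(tax_cols)[-1]) {
    dat[[tax_cols[i]]][is.na(dat[[tax_cols[i - 1]]])] <- NA
  }
  dat
}

res <- clean_by_sum(dat, 3)
print(res)
```

Because this works column by column on the data frame instead of converting it to a matrix, RM and Sum stay numeric, and a level name can never be matched by accident in another column.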