Question

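I'm new to R and need help excluding a list of contaminant genes from my transcriptome.

For example,

I have a file called genes.txt that contains a column of gene names and a column of the corresponding gene sequences:

Name Sequence

Cluster1 TACGATCGATCGATCG .....

Cluster2 ATCGATCGATCGATCG .....

etc...

I have another file called contam.txt, which is a list of the gene names that need to be excluded from my main gene list:

Name

Cluster1

Cluster5

etc...

I need to remove every row in the genes file that corresponds to a cluster listed in the contam file. This is the code I'm trying: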

#set working directory
setwd("C:/MyR/transcriptome")

#using data.table
library(data.table)

#load gene file
unigene <- as.data.table(read.table("gene.txt", stringsAsFactors=FALSE,
                                    header=TRUE))

#set the key
setkey(unigene, Name)

#load contamination file
contam <- as.data.table(read.table("contam.txt",stringsAsFactors=FALSE, 
header=TRUE))

#remove contaminants from unigene file
unigene_new <- unigene[!unigene$Name %in% contam,]
#or
unigene_new[-unigene[contam, which=TRUE]]

My code doesn't remove the unwanted genes from my list. Does anyone know what I'm doing wrong?
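A possible explanation, offered as a sketch rather than a confirmed diagnosis: in `unigene$Name %in% contam`, the right-hand side is the whole `contam` table rather than its `Name` column. A data.table is a list of columns, so `%in%` most likely compares each gene name against the column as a single element and finds no match, which would leave every row in place. The example below uses small made-up tables instead of the real files; the `Sequence` column name and the result names `kept_in` and `kept_join` are assumptions chosen for illustration.

```r
library(data.table)

# Toy stand-ins for the two files (made-up values, not the real data)
unigene <- data.table(
  Name     = c("Cluster1", "Cluster2", "Cluster5", "Cluster7"),
  Sequence = c("TACGATCG", "ATCGATCG", "GGCATTAC", "CCGTAGGA")
)
contam <- data.table(Name = c("Cluster1", "Cluster5"))

# Option 1: test membership against the Name column, not the whole table
kept_in <- unigene[!Name %in% contam$Name]

# Option 2: anti-join, keeping rows of unigene that have no match in contam
kept_join <- unigene[!contam, on = "Name"]

# Both results keep only Cluster2 and Cluster7
print(kept_in)
print(kept_join)
```

Either form should work on tables loaded with `read.table` and converted as in the question, provided both files really have a header called `Name`. The anti-join does not need `setkey` when `on =` is given.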

Excluding rows from a data.frame using a list of values from a txt file in R data.table
