I have two files. The first contains information about genes. It looks like this:
Query | Gene | Desc
APECO1_1380 | fldA | flavodoxin FldA
APECO1_2545 | fpr | ferredoxin-NADP reductase
APECO1_3632 | fldB | flavodoxin FldB
APECO1_1465 | fepA | ferrienterobactin receptor
APECO1_4396 | cirA | colicin I receptor
The second file contains a list of gene codes:
APECO1_1380
APECO1_2545
APECO1_3632
I am trying to extract from file 1 the gene information for the gene codes listed in file 2. Below is the code I am using.
# File with the gene data for the gene codes (file 1)
dataT = read.csv("D://SBMLexploration/Genes/genenames.csv", header = TRUE)
# Files of the second type (lists of gene codes) - file 2
fileList = list.files("D://SBMLexploration/Genes/Test1")
df = data.frame(MonkCode = character(), GeneName = character(),
                Description = character(), stringsAsFactors = F)
for(i in 1:length(fileList))
{
  currentGenes = read.csv(fileList[i], header = T)
  for(j in 1:nrow(currentGenes))
  {
    # Look up the reference row whose Query matches the j-th gene code
    currentRow = subset(dataT, dataT$Query == currentGenes[j,1])
    df <- rbind(df, data.frame(MonkCode = currentRow$Query,
                               GeneName = currentRow$Gene,
                               Description = currentRow$Desc))
  }
  write.table(df, fileName, sep = ",", row.names = F)
  df = NULL
}
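My problem is that when I supply the gene code as currentGenes[j,1], the query returns 0 rows. But when I supply the code as a string (as in "APECO1_1465"), it returns the record. The problem seems to be in how I refer to the list. Can someone help me?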
Answer 0 (score: 2)
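Simply convert currentGenes[j,1] to a string using as.character(), i.e. as.character(currentGenes[j,1]). The likely cause is that read.csv read the column as a factor (the default before R 4.0), so the value being compared is not a plain string.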
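A minimal sketch of that fix, assuming both columns were read as factors. The data frames are built inline with stringsAsFactors = TRUE to mimic that situation (an assumption made for illustration; the real data comes from the CSV files in the question), and the column name GeneCode is hypothetical.

```r
# Reference table (file 1), built inline instead of read from disk.
# stringsAsFactors = TRUE mimics how read.csv behaved by default before R 4.0.
dataT <- data.frame(
  Query = c("APECO1_1380", "APECO1_2545", "APECO1_3632",
            "APECO1_1465", "APECO1_4396"),
  Gene  = c("fldA", "fpr", "fldB", "fepA", "cirA"),
  Desc  = c("flavodoxin FldA", "ferredoxin-NADP reductase",
            "flavodoxin FldB", "ferrienterobactin receptor",
            "colicin I receptor"),
  stringsAsFactors = TRUE
)

# Gene code list (file 2); the column name GeneCode is hypothetical.
currentGenes <- data.frame(
  GeneCode = c("APECO1_1380", "APECO1_2545", "APECO1_3632"),
  stringsAsFactors = TRUE
)

# Convert both sides to character before comparing, as the answer suggests.
code <- as.character(currentGenes[1, 1])
currentRow <- dataT[as.character(dataT$Query) == code, ]

# The same idea without a loop: match all codes at once with %in%.
hits <- dataT[as.character(dataT$Query) %in% as.character(currentGenes[[1]]), ]
```

Passing stringsAsFactors = FALSE to read.csv would avoid the factor columns in the first place, which makes the as.character() calls unnecessary.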
Answer 1 (score: 1)
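R's syntax is not its strong suit and can lead to many frustrating errors like the one you describe. At the risk of starting a flame war, let me suggest the dplyr library and show an alternative, dplyr-based solution below.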
library(dplyr)
# load your reference data and register your gene files
dataT = read.csv("D://SBMLexploration/Genes/genenames.csv", header = TRUE)
# full.names = TRUE returns paths that read.csv can open from any working directory
fileList = list.files("D://SBMLexploration/Genes/Test1", full.names = TRUE)
# load the genes from one file and write out the matching reference data
processdata <- function(ref_df, filename){
  genes = read.csv(filename, header = T, col.names = c("genes"))
  ref_df %>%
    filter(Query %in% genes$genes) %>%
    mutate(MonkCode = Query,
           GeneName = Gene,
           Description = Desc) %>%
    select(MonkCode, GeneName, Description) %>%
    write.table(file = paste0(filename, "_hits.txt"))
}
# apply your function to each file, passing the reference data explicitly
Map(function(f) processdata(dataT, f), fileList)
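For readers who would rather not add a dependency, the same filter-and-write step can be sketched in base R. This is only an illustration under stated assumptions: the temporary directory, the file name list1.csv, the GeneCode header, the helper name extract_hits, and the _hits.csv suffix are all hypothetical, and the reference table is built inline rather than read from genenames.csv.

```r
# Hypothetical demo setup: a temporary directory stands in for the Test1 folder.
tmp <- tempfile("genes_demo")
dir.create(tmp)

# Reference table (file 1), built inline for the demo.
ref <- data.frame(
  Query = c("APECO1_1380", "APECO1_2545", "APECO1_3632",
            "APECO1_1465", "APECO1_4396"),
  Gene  = c("fldA", "fpr", "fldB", "fepA", "cirA"),
  Desc  = c("flavodoxin FldA", "ferredoxin-NADP reductase",
            "flavodoxin FldB", "ferrienterobactin receptor",
            "colicin I receptor"),
  stringsAsFactors = FALSE
)

# One gene code list (file 2) with a header line, written to the demo folder.
writeLines(c("GeneCode", "APECO1_1380", "APECO1_2545", "APECO1_3632"),
           file.path(tmp, "list1.csv"))

# Read one gene list, keep the matching reference rows, write them next to the input.
extract_hits <- function(ref_df, filename) {
  genes <- read.csv(filename, header = TRUE, col.names = "genes",
                    stringsAsFactors = FALSE)
  hits <- ref_df[ref_df$Query %in% genes$genes, c("Query", "Gene", "Desc")]
  names(hits) <- c("MonkCode", "GeneName", "Description")
  out <- paste0(filename, "_hits.csv")
  write.table(hits, out, sep = ",", row.names = FALSE)
  out  # return the output path so the caller can inspect the result
}

# full.names = TRUE gives complete paths, so no setwd() is needed.
fileList <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
outputs <- vapply(fileList, extract_hits, character(1), ref_df = ref)

# Read the first output back to check what was written.
result <- read.csv(outputs[[1]], stringsAsFactors = FALSE)
```

Matching with %in% handles all codes in a file at once, which removes the inner loop and the repeated rbind calls from the original code.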