Question

I am new to R programming, and I am having trouble getting the gene names and symbols for Affy probe IDs using R.

Probe          Symbol  Name
215535_s_at    NA      NA
32836_at       NA      NA
210678_s_at    NA      NA
32837_at       NA      NA
219723_x_at    NA      NA
223182_s_at    NA      NA

However, I am unable to get the details by merging the HGNC and DAVID flat files.

Please let me know the best way to solve this problem.

I used the following code:

probe <- read.delim("super.txt",stringsAsFactors=F, header = T, sep="\t")
probe$probeid<-tolower(probe$probeid)
names<-read.delim("GSE42568_probeid.txt", as.is=T, stringsAsFactors=F, header=T)
## instead of a data frame, keep only the probeid vector
names<-names$probeid

NoMatchID = NULL
vec<-NULL
system.time({
for (i in 1:11390){
  index<-grep(names[i],probe$probeid,fixed=T)
  #index<-grep(paste("^",names[i],"$"),probe$probeid,fixed=T)
  if (length(index)!=0) {
    cat("Index of", names[i],"is", index, "\n")
  } else {
    cat("Index of", names[i], "Found No Match \n")
    NoMatchID = c(NoMatchID,i)
  }
NoMatchID<-c(NoMatchID,index)
vec_NA <- data.frame(probe[-NoMatchID,])
}
})
NoMatchID <- data.frame(probe[NoMatchID,]) 

NoMatchID_probe = setdiff(1:nrow(probe), unique(vec))
write.table(vec_NA, file = "probeids_matched_1.txt", row.names = FALSE, append = FALSE, col.names = TRUE, sep = "\t")

If you have any other way to solve this problem, please let me know :( It would help me a lot!

Answer 1

I'm not sure I understand what you are asking. If your gene names and symbols are in the probe data frame

probe <- read.delim("super.txt",stringsAsFactors=F, header = T, sep="\t")
probe$probeid<-tolower(probe$probeid)
names<-read.delim("GSE42568_probeid.txt", as.is=T, stringsAsFactors=F, header=T)
## instead of a data frame, keep only the probeid vector
names<-names$probeid

and you want to extract the rows of probe that do not match the names vector, then you should modify your code as follows:

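#  NoMatchID = NULL
# Collect the row indices of probe that match any entry in names
MatchID_probe <- NULL

# seq_along(names) avoids hard-coding the number of probe IDs
for (i in seq_along(names)) {
  index <- grep(names[i], probe$probeid, fixed = TRUE)
  if (length(index) != 0) {
    cat("Index of", names[i], "is", index, "\n")
    MatchID_probe <- c(MatchID_probe, index)
  } else {
    cat("Index of", names[i], "Found No Match \n")
    # NoMatchID = c(NoMatchID,i)
  }
}

# Rows of probe that were never matched by any ID in names
NoMatchID_probe <- setdiff(seq_len(nrow(probe)), unique(MatchID_probe))
DF_NoMatch <- probe[NoMatchID_probe, ]

DF_NoMatch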
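
If the loop turns out to be slow or matches more rows than expected, a vectorized version may be worth trying. The sketch below is only an illustration under a few assumptions: the toy `probe` data frame and the `ids` vector stand in for your two files, the `symbol` column and all of the values in it are made up, and both sides are lower-cased so that they compare like for like. It also shows why `grep(..., fixed = TRUE)` can over-match, since it looks for substrings rather than whole IDs.

```r
# Toy stand-ins for the two input files (made-up values, not real annotation)
probe <- data.frame(
  probeid = c("215535_s_at", "32836_at", "232836_at", "210678_s_at"),
  symbol  = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  stringsAsFactors = FALSE
)
ids <- c("215535_S_AT", "32836_at", "999999_at")

# Normalise case on both sides before comparing
probe$probeid <- tolower(probe$probeid)
ids <- tolower(ids)

# Substring matching over-matches: "32836_at" is also found inside "232836_at"
grep("32836_at", probe$probeid, fixed = TRUE)

# Exact, vectorised matching with %in% (no loop needed)
is_match   <- probe$probeid %in% ids
DF_Match   <- probe[is_match, ]
DF_NoMatch <- probe[!is_match, ]

# IDs from the list that have no row in probe at all
missing_ids <- setdiff(ids, probe$probeid)

# Attach the annotation columns to every requested ID;
# IDs without a match keep a row with NA in the annotation columns
annotated <- merge(
  data.frame(probeid = ids, stringsAsFactors = FALSE),
  probe,
  by = "probeid",
  all.x = TRUE
)
annotated
```

With `all.x = TRUE`, `merge` keeps every requested ID, so the rows that come back with `NA` are exactly the probes that your annotation file does not cover.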

Getting gene names and symbols for Affy probe IDs using R
