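limma topTable function does not assign an ID to the probesets column (microarray analysis)

Time: 2014-04-09 18:05:12

Tags: r bioconductor

I followed Daniel Swan's tutorial and it worked very well. However, I ran into a problem with the topTable function of the limma package.

The topTable function creates a probeset list, but the probeset column of this list has no "ID" header: the other columns have names, while the probeset column has none.

As a result, when I run: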

gene.symbols <- getSYMBOL(probeset.list$ID, "hgu133plus2")

I get the following error:

  Error in .select(x, keys, columns, keytype = extraArgs[["kt"]], jointype = jointype): 
      'keys' must be a character vector

The topTable output is:

               logFC  AveExpr        t      P.Value    adj.P.Val        B
204779_s_at 7.367790 4.171707 72.77347 3.284937e-15 8.969850e-11 20.25762
207016_s_at 6.936667 4.027733 57.39252 3.694641e-14 5.044293e-10 19.44987
209631_s_at 5.192949 4.003992 51.24892 1.170273e-13 1.065182e-09 18.96660

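My expression values were computed with the simpleaffy package (gcrma). I am running R 3.0.2 on Windows 7 with the latest Bioconductor packages, simpleaffy_2.38.0 and limma_3.18.13, and the annotation packages hgu133plus2.db_2.10.1, hgu133plus2probe_2.13.0 and hgu133plus2cdf_2.13.0.

I would be very grateful if someone could help me.

2 answers:

Answer 0: (score: 1)

The IDs are not stored in an ID column; they are stored as the row names of the table. Change the line to: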

gene.symbols <- getSYMBOL(rownames(probeset.list), "hgu133plus2")

If you would rather have an ID column than use the row names, you can create one:

probeset.list$ID = rownames(probeset.list)
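To make the difference concrete, here is a minimal base R sketch. It does not call limma; `probeset.list` is a hand-built mock that copies two columns of the table from the question, and `id.missing` and `probe.ids` are illustrative names only.

```r
# Mock of the topTable output from the question (two columns only, built by hand)
probeset.list <- data.frame(
  logFC     = c(7.367790, 6.936667, 5.192949),
  AveExpr   = c(4.171707, 4.027733, 4.003992),
  row.names = c("204779_s_at", "207016_s_at", "209631_s_at")
)

# There is no ID column, so `$ID` returns NULL rather than raising an error
id.missing <- is.null(probeset.list$ID)

# The probeset IDs live in the row names, which are a character vector
probe.ids <- rownames(probeset.list)

# Copy the row names into an explicit ID column
probeset.list$ID <- rownames(probeset.list)
```

Passing NULL instead of a character vector is consistent with the "'keys' must be a character vector" error in the question.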

According to the documentation of the topTable function, the ID column is only present when there are duplicated gene names:

 If ‘fit’ had unique rownames, then the row.names of the above
 data.frame are the same in sorted order. Otherwise, the row.names
 of the data.frame indicate the row number in ‘fit’. If ‘fit’ had
 duplicated row names, then these are preserved in the ‘ID’ column
 of the data.frame, or in ‘ID0’ if ‘genelist’ already contained an
 ‘ID’ column.

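In the other examples where you used ID, there must have been duplicated gene names in the input. This makes sense, because R generally does not like duplicated row names (whereas duplicated IDs in a column are no problem).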
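The point about duplicated row names can be checked in base R without limma. The sketch below uses a hypothetical three-row data frame and made-up gene names; it only illustrates the data frame behaviour, not what topTable does internally.

```r
# Hypothetical data: three rows, two of which share a gene name
df <- data.frame(value = c(1.5, 2.5, 3.5))
gene.names <- c("TP53", "TP53", "BRCA1")

# Using duplicated names as row names of a data frame fails;
# capture the error message instead of stopping the script
row.name.error <- tryCatch({
  rownames(df) <- gene.names
  NULL
}, error = function(e) conditionMessage(e))

# Storing the same names in an ordinary column works fine
df$ID <- gene.names
```

After this runs, `row.name.error` holds the error message and `df` still has its default row names, with the duplicated names kept in `df$ID`.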

Answer 1: (score: 1)

I hope my working code makes your problem clear:

library(limma) # load the required libraries
library(siggenes)
library(cluster)
library(stats)

data <- read.table("AneurismDataAllProbesGenesisLog2NormalizedExperAndGenes.tab", sep = "\t", header = TRUE) # read from file

q = as.matrix(data) # data as a matrix

b = as.matrix(cbind(data[, 2:10], data[, 11:14])) # adjacent data columns (the expression values)
m = normalizeQuantiles(b, ties=TRUE)
f = data.frame(condition = c(0,0,0,0,0,0,0,0,0,1,1,1,1)) # design
fit = lmFit(m, f) # linear model
e = eBayes(fit) # empirical Bayes test
volcanoplot(e, coef=1, highlight=5, names=data$GeneName, xlab="Log Fold Change", ylab="Log Odds", pch=19, cex=0.67, col = "dark blue") # volcano plot
z = rownames(m) = data[, 1] # use the first column as row names of the matrix
hc <- hclust(dist(m), "ave") # cluster dendrogram
plot(hc)
plot(hc, hang = -1)

print(e$coefficients) # output eBayes coefficients
print(e$p.value) # print the p-values
toptable(e) # select the 10 most differentially expressed genes; the disadvantage is that it outputs only the gene row number and not the name
printresult <- toptable(e) # assign the result to a variable
write.csv(printresult, file = "eBayesTableAneurism", row.names = TRUE) # write to a file in the current folder
volcanoplot(e, coef=1, highlight=10, names=data[,1], xlab="Log Fold Change", ylab="Log Odds", pch=19, cex=0.67, col = "red") # volcano plot with gene names
volcanoplot(e, coef=1, highlight=5, names=data[,1], xlab="Log Fold Change", ylab="Log Odds", pch=19, cex=0.67, col = "blue") # volcano plot with gene names
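If the result table only carries row numbers, as the comment on toptable(e) above says, the names can be looked up afterwards. The sketch below is base R only and rests on one assumption: that the row names of the result are row indices into the original data. Both `data` and `printresult` here are small hand-built mocks with made-up values, not output from limma.

```r
# Mock input table: gene names in the first column (hypothetical values)
data <- data.frame(
  GeneName = c("geneA", "geneB", "geneC", "geneD"),
  s1       = c(5.1, 6.3, 7.8, 4.2)
)

# Mock result table whose row names are row numbers of `data` (assumption)
printresult <- data.frame(logFC = c(2.1, -1.7), row.names = c("3", "1"))

# Convert the row names to integer indices and look up the gene names
row.index <- as.integer(rownames(printresult))
printresult$ID <- as.character(data[row.index, 1])
```

Another option, in line with the documentation quoted in the first answer, would be to set unique row names on the matrix before the model is fitted, so that the table is labelled with them directly.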