R and GO.db: go through all GO terms

Asked: 2013-05-10 10:43:11

Tags: r bioconductor

This is a very specific question, but perhaps someone knows how to do it.

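What I want is to go through all GO terms of an ontology such as "BP" (package GO.db). I don't necessarily want to traverse the tree recursively; my only requirement on the order in which GO terms are evaluated is that, for any given GO term, all of its children have been evaluated before the term itself.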

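In other words, I want to construct a character vector V of GO terms such that, if G_x and G_y are two GO terms and G_x is a parent of G_y, then their positions i_x and i_y in V satisfy i_x > i_y.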
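To make the requirement concrete, here is a minimal sketch of a checker for it, written in base R on a toy DAG. The term names, `toy_children`, and `is_children_first` are all hypothetical and only illustrate the constraint; nothing here comes from GO.db.

```r
# Toy DAG (hypothetical term names): each entry lists a term's direct children.
toy_children <- list(
  root = c("a", "b"),
  a    = "c",
  b    = "c",
  c    = character(0)
)

# TRUE if every term in V sits after all of its direct children.
# Assumes V contains every term that appears in `children`.
is_children_first <- function(V, children) {
  stopifnot(all(names(children) %in% V), all(unlist(children) %in% V))
  pos <- setNames(seq_along(V), V)   # position of each term in V
  for (parent in names(children)) {
    kids <- children[[parent]]
    if (length(kids) > 0 && any(pos[kids] > pos[parent])) return(FALSE)
  }
  TRUE
}

is_children_first(c("c", "a", "b", "root"), toy_children)  # TRUE
is_children_first(c("a", "c", "b", "root"), toy_children)  # FALSE: "a" precedes its child "c"
```

Checking direct children is enough, since the condition then holds for more distant descendants by transitivity.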

2 Answers:

Answer 0: (score: 1)

I think this (almost) works. The trick is that ?unique keeps the first instance of a duplicated element.

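Edit: on reflection, this only organizes the terms that have the longest path to the root (i.e. the most generations) at the beginning of the vector. I think there may be cases where a term sits on two branches, one with a shorter path, such that the term is placed correctly with respect to the long path but too early with respect to the shorter one. That said, if you are fine with a rough approximation...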

library(GO.db)

# Root nodes for reference:
# BP = "GO:0008150"
# CC = "GO:0005575"
# MF = "GO:0003674"

GO_order <- function(node = "GO:0008150", ontology = "BP") {

    if (ontology == "BP") GOCHILDREN <- GOBPCHILDREN
    if (ontology == "CC") GOCHILDREN <- GOCCCHILDREN
    if (ontology == "MF") GOCHILDREN <- GOMFCHILDREN

    parents <- node

    # initialize output
    out <- c(parents)

    # do the following until there are no more parents
    while (any(!is.na(parents))) {  
        # Get the unique children of the parents (that aren't NA)
        children <- unique(unlist(mget(parents[!is.na(parents)], GOCHILDREN)))

        # prepend children to the beginning of `out`
        # unique will keep the first instance of a duplicate
        # (i.e. the most recently added copy of a term is kept)
        out <- unique(append(children[!is.na(children)], out))

        # children become the parents of the next generation
        parents <- children
    }
    return(out)
}
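One way to probe the concern raised in the edit is to run the same level-by-level idea on a toy DAG in which a term is reachable through both a short and a long path. The sketch below uses base R only; `toy_children`, `toy_order`, and the term names are hypothetical, and leaves are mapped to NA on the assumption that this mirrors what mget() returns for childless terms.

```r
# Toy DAG (hypothetical term names). Leaves map to NA, mimicking what
# mget() on GOBPCHILDREN is assumed to return for childless terms.
toy_children <- list(
  root = c("a", "b"),
  a    = "c",
  c    = "b",            # "b" is reachable via root -> b and root -> a -> c -> b
  b    = NA_character_
)

# Same loop as GO_order, but reading children from a plain named list.
toy_order <- function(node, children) {
  parents <- node
  out <- parents
  while (any(!is.na(parents))) {
    kids <- unique(unlist(children[parents[!is.na(parents)]], use.names = FALSE))
    # prepend the new generation; unique() keeps the frontmost copy
    out <- unique(c(kids[!is.na(kids)], out))
    parents <- kids
  }
  out
}

ord <- toy_order("root", toy_children)
print(ord)  # "b" "c" "a" "root"
```

In this particular case "b" ends up before both of its parents ("c" and "root"), because the later prepend moves it to the front. That is a single example, not a proof for the full ontology.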

Answer 1: (score: 0)

Use GO.db's built-in functionality:

GO_child <- function(node = "GO:0008150", ontology = "BP") {
  # MF = "GO:0003674", root node of MF
  # BP = "GO:0008150", root node of BP
  # CC = "GO:0005575", root node of CC
  if (ontology == "BP") res <- GOBPOFFSPRING[[node]]
  if (ontology == "CC") res <- GOCCOFFSPRING[[node]]
  if (ontology == "MF") res <- GOMFOFFSPRING[[node]]
  return(res)
}
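The offspring vector returned above carries no ordering guarantee by itself. One possible way to get a children-first order from offspring data is sketched below: every descendant of a term has a strictly smaller offspring set than the term itself, so sorting terms by ascending offspring count puts children before parents. The sketch is base R on a toy list; `toy_offspring` and the term names are hypothetical, and its shape (leaves holding a single NA) is an assumption about what as.list(GOBPOFFSPRING) returns that should be checked against your GO.db version.

```r
# Toy offspring lists (hypothetical term names), shaped like what
# as.list(GOBPOFFSPRING) is assumed to return: leaves hold a single NA.
toy_offspring <- list(
  root = c("a", "b", "c"),
  a    = "c",
  b    = "c",
  c    = NA_character_
)

# Number of real (non-NA) offspring per term.
n_offspring <- vapply(toy_offspring, function(x) sum(!is.na(x)), integer(1))

# Ascending order: fewest offspring first, so children precede parents.
V <- names(n_offspring)[order(n_offspring)]
print(V)  # "c" "a" "b" "root"
```

Terms with equal counts cannot be ancestor and descendant of each other, so their relative order does not affect the constraint.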