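Finding the topmost active parent (tree traversal) in R

Asked: 2017-01-12 10:12:42

Tags: r regex match tree-traversal stringr

Please review the scenario below. It is complicated and I have tried my best to explain it, but the explanation may have gaps. If anything is unclear, ask and I will reply quickly.

Here is the scenario:

Get the topmost active parent in the hierarchy.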

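  1. Iterate over the Child column. For example, child 0190 corresponds to the list column entry 0197_0195_0192_0190 (row 6).
  2. Check the previous parent, 0192, against the Child column.
  3. Then check the previous site, 0195, which also appears in the Child column.
  4. Then check the previous site, 0197, which does not exist in the Child column.
  5. So the new column "Active Parent" should give the last matched child, 0195, as the output.
  6. This should be done for every item in Child, finding the highest active parent for each.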
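For illustration, the walk described in the steps above can be sketched for a single hierarchy string. This is only a minimal sketch: the function name `top_active_parent` and the small `example_children` set are hypothetical, and the child set is chosen so that 0192 and 0195 count as children, as the steps assume.

```r
# Hypothetical helper: walk one hierarchy from its last element towards the
# root and return the last element that is still found in `children`.
top_active_parent <- function(path_string, children) {
  nodes <- strsplit(path_string, "_", fixed = TRUE)[[1]]
  result <- NA_character_
  i <- length(nodes)
  # Stop at the first element that is not a child, or when the root is passed
  while (i >= 1 && nodes[i] %in% children) {
    result <- nodes[i]
    i <- i - 1
  }
  result
}

# Assumed child set for the walkthrough only (not the full data set)
example_children <- c("0190", "0192", "0195")
walk_result <- top_active_parent("0197_0195_0192_0190", example_children)
print(walk_result)
```

With this assumed child set the walk stops at 0197, so the result is 0195, which matches the output described in step 5.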

    Parent_Hierarchy <- c("0077_8239_0218", "0077_72597", "0159_0162_0232", "0006_0042_72561", "0077_0090_0125", "0077_8239_0218_0184", "0197_0195_0192", "0197_2031_2414", "0159_2384", "0197_2247_2248_72769", "0197_0195_0192_0190", "0197_2247_2248")

    Child <- c("0218", "72597", "0232", "72561", "0125", "0184", "0195", "2414", "2384", "72769", "0190", "2248")

    Position <- c(3, 2, 3, 3, 3, 4, 3, 3, 2, 4, 5, 3)

    Tree <- data.frame(Parent_Hierarchy, Child, Position)
    Tree$list <- strsplit(as.character(Tree$Parent_Hierarchy), split = "[_]")

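Below, I was able to get the top-level parent into the prev1 column, but I want to do this programmatically, with something like a loop, rather than one column per level.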

    # Getting prev1: the entry two places before Position, if it is also a Child
    Tree$prev1 <- NA_character_
    for (aa in seq_len(nrow(Tree))) {
      pos <- Tree$Position[aa]
      if (pos > 2 && Tree$list[[aa]][pos - 2] %in% Tree$Child) {
        Tree$prev1[aa] <- Tree$list[[aa]][pos - 2]
      } else {
        Tree$prev1[aa] <- as.character(Tree$Child[aa])
      }
    }

    # Getting prev2: the entry three places before Position, else fall back to prev1
    Tree$prev2 <- NA_character_
    for (aa in seq_len(nrow(Tree))) {
      pos <- Tree$Position[aa]
      if (pos > 3 && Tree$list[[aa]][pos - 3] %in% Tree$Child) {
        Tree$prev2[aa] <- Tree$list[[aa]][pos - 3]
      } else {
        Tree$prev2[aa] <- Tree$prev1[aa]
      }
    }

    

I also tried the solution below, but it loops for a very long time:

    for (aa in 1:NROW(poly_IDA$label2)) {
    for (ii in poly_IDA$label2[[aa]]) {
    while((poly_IDA$list[[aa]][(poly_IDA$label2[aa]-ii+1)] %in% poly_IDA$FromSite)){
      ifelse((poly_IDA$label2[[aa]]>ii),poly_IDA$prev3[aa]<-     poly_IDA$list[[aa]][(poly_IDA$label2[aa]-ii)],
             poly_IDA$prev3[aa]<- poly_IDA$list[[aa]][(poly_IDA$label2[aa]-ii+1)])
        }     
      }
    }
    
Let me know if you have any questions.

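2 answers:

Answer 0: (score: 1)

The problem itself is not complicated, but the wording and the explanation are. I have tried to infer the intended meaning as best I can, and I hope I got it right.

Here is a solution you may want to try. I hope it is what you are looking for.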

    Parent_Hierarchy <- c("0077_8239_0218", "0077_72597", "0159_0162_0232", "0006_0042_72561", "0077_0090_0125", "0077_8239_0218_0184", "0197_0195_0192", "0197_2031_2414", "0159_2384", "0197_2247_2248_72769", "0197_0195_0192_0190", "0197_2247_2248")
    Tree <- data.frame(Parent_Hierarchy, stringsAsFactors = FALSE)
    Tree$list <- strsplit(Tree$Parent_Hierarchy, split = "[_]")
    Tree$length <- lengths(Tree$list)

    # Extract the current child: the last element of each hierarchy
    Tree$Child <- NA_character_
    for (i in seq_len(nrow(Tree))) {
      Tree$Child[i] <- Tree$list[[i]][Tree$length[i]]
    }

    # Walk each hierarchy from the child towards the root.
    # activeP ends up as the parent of the element closest to the root that is in Child;
    # TopP ends up as the element closest to the root that is not in Child.
    Tree$activeP <- NA_character_
    Tree$TopP <- NA_character_
    for (i in seq_len(nrow(Tree))) {
      for (j in seq_len(Tree$length[i])) {
        idx <- Tree$length[i] - j + 1
        node <- Tree$list[[i]][idx]
        if (node %in% Tree$Child) {
          # Guard: the root has no parent, so there is nothing to record at idx 1
          if (idx > 1) Tree$activeP[i] <- Tree$list[[i]][idx - 1]
        } else {
          Tree$TopP[i] <- node
        }
      }
    }
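As a side note, the child-extraction loop in this answer could probably be written without an explicit loop. The sketch below assumes, as the answer does, that the child is always the last element of the hierarchy string; `paths` is a name introduced here for illustration.

```r
Parent_Hierarchy <- c("0077_8239_0218", "0077_72597", "0159_0162_0232",
                      "0006_0042_72561", "0077_0090_0125", "0077_8239_0218_0184",
                      "0197_0195_0192", "0197_2031_2414", "0159_2384",
                      "0197_2247_2248_72769", "0197_0195_0192_0190",
                      "0197_2247_2248")

# Split each hierarchy on the literal underscore
paths <- strsplit(Parent_Hierarchy, split = "_", fixed = TRUE)

# Take the last element of each path; vapply checks that every result
# is exactly one character string
Child <- vapply(paths, function(p) p[length(p)], character(1))
print(Child)
```

Note that for the seventh hierarchy (0197_0195_0192) this derives 0192 as the child, whereas the Child vector in the question lists 0195 for that row, so the two approaches are not interchangeable on this data.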

Answer 1: (score: 1)

Here is a solution that, while not the most idiomatic R, gets the job done.

    # Building the data frame
    Parent_Hierarchy <- c("0077_8239_0218", "0077_72597", "0159_0162_0232", "0006_0042_72561", "0077_0090_0125", "0077_8239_0218_0184", "0197_0195_0192", "0197_2031_2414", "0159_2384", "0197_2247_2248_72769", "0197_0195_0192_0190", "0197_2247_2248")
    Child <- c("0218", "72597", "0232", "72561", "0125", "0184", "0195", "2414", "2384", "72769", "0190", "2248")
    Tree <- data.frame(Parent_Hierarchy, Child, stringsAsFactors = FALSE)
    Tree$list <- strsplit(Tree$Parent_Hierarchy, split = "[_]")
    # Position of each child within its own hierarchy (exact match);
    # this must be computed after Tree$list exists
    Tree$Position <- as.numeric(mapply(match, Tree$Child, Tree$list))
    Tree$activeP <- NA_character_

    # Walk up from the child while the current element is itself a Child
    for (aa in seq_len(nrow(Tree))) {
      ii <- Tree$Position[aa]
      # ii >= 1 guards against stepping past the root
      while (ii >= 1 && Tree$list[[aa]][ii] %in% Tree$Child) {
        Tree$activeP[aa] <- Tree$list[[aa]][ii]
        ii <- ii - 1
      }
    }
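The same walk can also be expressed without the `while` loop. The following is a sketch, not a drop-in replacement: `active_parent` and `paths` are hypothetical names, and it assumes the same rule as the loop above (start at the child's position and move towards the root until an element is not in Child). It relies on `cumprod()` turning the run of leading matches into 1s followed by 0s.

```r
Parent_Hierarchy <- c("0077_8239_0218", "0077_72597", "0159_0162_0232",
                      "0006_0042_72561", "0077_0090_0125", "0077_8239_0218_0184",
                      "0197_0195_0192", "0197_2031_2414", "0159_2384",
                      "0197_2247_2248_72769", "0197_0195_0192_0190",
                      "0197_2247_2248")
Child <- c("0218", "72597", "0232", "72561", "0125", "0184", "0195", "2414",
           "2384", "72769", "0190", "2248")
paths <- strsplit(Parent_Hierarchy, split = "_", fixed = TRUE)

# Hypothetical helper: topmost element of the unbroken run of children,
# starting at `child` and moving towards the root
active_parent <- function(path, child, all_children) {
  start <- match(child, path)
  if (is.na(start)) return(NA_character_)
  walk <- rev(path[seq_len(start)])        # child first, root last
  hits <- cumprod(walk %in% all_children)  # 1 until the first miss, then 0
  n <- sum(hits)
  if (n == 0) NA_character_ else walk[n]
}

activeP <- mapply(active_parent, paths, Child,
                  MoreArgs = list(all_children = Child),
                  USE.NAMES = FALSE)
print(activeP)
```

On the data from the question, rows 6 and 10 resolve to an ancestor (0218 and 2248) and the other rows resolve to the child itself. Row 11 yields 0190 rather than the 0195 described in the question, because 0192 is not present in the posted Child vector.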