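Selecting attribute values in an XML-TEI attribute that holds multiple values

Asked: 2017-04-05 16:39:23

Tags: r xml attr tei

I want to select only some of the values in an attribute that holds several values, and print them. For this example: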

#in R
interpRef <- getNodeSet(doc,"//ns:ref[contains(@ana, 'whatAction')]", ns) 
interpRef_ana <- for (i in 1:length(interpRef)) print(paste(xmlGetAttr(interpRef[[i]],"ana")))

I get this result:

[[1]]
<ref ana="whatAction #ktu1-3_ii_l6b_tḫtṣb #verb.competition #contend">Action belongs to verb competition subcategory contend
                                    <stage ana="whatResult #result #defeate_ofOpposition"/></ref> 
[[2]]
<ref ana="whatAction #ktu1-3_ii_l7_tmḫṣ #verb.emotion #humiliation">Action belongs to verb emotion, subcategory humiliation
                                    <stage ana="whatResult #result #defeate_ofOpposition"/></ref> 
[[3]]
<ref ana="whatAction #ktu1-3_ii_l8_tṣmt #verb.emotion #humiliation">Action belongs to verb emotion, subcategory humiliation</ref>  

#print
[1] "whatAction #ktu1-3_ii_l6b_tḫtṣb #verb.competition #contend"
[1] "whatAction #ktu1-3_ii_l7_tmḫṣ #verb.emotion #humiliation"
[1] "whatAction #ktu1-3_ii_l8_tṣmt #verb.emotion #humiliation"

I only need a few of the values in each @ana attribute, values 2 and 3, printed for example like this:

[1] "#ktu1-3_ii_l6b_tḫtṣb #contend"
[1] "#ktu1-3_ii_l7_tmḫṣ #humiliation"
[1] "#ktu1-3_ii_l8_tṣmt #humiliation"

I have made several attempts. One of them is the following, but it does not work:

interpRef_ana <- for (i in 1:length(interpRef)) print(paste(xmlGetAttr(interpRef[[i]],"ana",[2:3])))

==== XML example ====

Each <ref> sits inside an <interp>, every @ana follows the same hierarchy, and the vocabulary comes from a predefined taxonomy.

<interp xml:id="ktu1-3_ii_l6b_int" ana="#ktu1-3_ii_l6b" corresp="#ktu1-3_ii_6b">
  <desc>
    <ref ana="whatAction #ktu1-3_ii_l6b_tḫtṣb #verb.competition #contend"
                                    >Action belongs to verb competition subcategory contend
     <stage ana="whatResult #result #defeate_ofOpposition" />
</ref>
<castList>
  <castItem>
    <persName type="character" ana="#whatCharacter #Character #ANT #Female">
      <state ana="#whatRole #active" />ʾAnatu
    </persName>
  </castItem>
</castList>
<view>
  <placeName ana="#whatContext #battle">battle
    <location ana="#whatSphere #outside" />
  </placeName>
</view>
<stage ana="#whatBehavior">
  <span ana="#toDestroy #five_dD #rage">Voluntary
                                        intentionality, to destroy of her free will, with rage
                                        (level five).</span>
  <span ana="#AffectEntity_and_other">The result of action has
                                        an impact on ʾAnatu and others</span>
  </stage>
 </desc>
</interp>
<interp xml:id="ktu1-3_ii_l7_int" ana="#ktu1-3_ii_l7" corresp="#ktu1-3_ii_l7">
 <desc>
  <ref ana="whatAction #ktu1-3_ii_l7_tmḫṣ #verb.emotion #humiliation"
                                    >Action belongs to verb emotion, subcategory humuliation
   <stage ana="whatResult #result #defeate_ofOpposition" />
</ref>
 <castList>
  <castItem>
    <persName type="character" ana="#whatCharacter #Character #ANT #Female">
      <state ana="#whatRole #active" />ʾAnatu
    </persName>
    <persName type="character" cert="low" ana="#Character #UNK #Unknown">
      <state ana="#behav #passive" />People from the West
    </persName>
  </castItem>
 </castList>
 <view>
  <placeName ana="#whatContext #battle">battle
    <location ana="#whatSphere #outside" />outside her household
  </placeName>
 </view>
 <stage ana="#whatBehavior">
  <span ana="#toDestroy #free #five_dD">Voluntary
                                        intentionality, to destroy of her free will, with rage
                                        (level five)Five.</span>
  <span ana="#affectEntity_and_other">The result of action has
                                        an impact on ʾAnatu and others</span>
  </stage>
 </desc>
</interp>

==== Update ====

I tried the stringr library. In theory it works, and I can select the attribute values I need:

x  <- for (i in 1:length(interp)) print((cbind((y=(KTU = (xmlGetAttr(interp[[i]],"ana")))), (z=(verb.category = (xmlGetAttr(interpRef[[i]],"ana")))))))
x1 <- print (cbind(word(word(y,-1)),(word(z, -3, -2))))
x1

> x  <- for (i in 1:length(interp)) print((cbind((y=(KTU = (xmlGetAttr(interp[[i]],"ana")))), (z=(verb.category = (xmlGetAttr(interpRef[[i]],"ana")))))))
 [,1]                [,2]                                                           
[1,] "#ktu1-3_ii_l5b-6a" "whatAction #ktu1-3_ii_l5b-6a_tmtḫṣ #verb.competition #contend"
 [,1]             [,2]                                                        
[1,] "#ktu1-3_ii_l6b" "whatAction #ktu1-3_ii_l6b_tḫtṣb #verb.competition #contend"
 [,1]            [,2]                                                      
[1,] "#ktu1-3_ii_l7" "whatAction #ktu1-3_ii_l7_tmḫṣ #verb.emotion #humiliation"
 [,1]            [,2]                                                      
[1,] "#ktu1-3_ii_l8" "whatAction #ktu1-3_ii_l8_tṣmt #verb.emotion #humiliation"
 [,1]                 [,2]                                                       
[1,] "ktu1-3_ii_l11b_12a" "whatAction #ktu1-3_ii_l11b-12a_ʿtkt #put_together #action"
 [,1]                  [,2]                                                       
[1,] "#ktu1-3_ii_l12b_13a" "whatAction #ktu1-3_ii_l12b-13a_šnst #put_together #action"
 [,1]                  [,2]                                                   
[1,] "#ktu1-3_ii_l13b_14a" "whatAction #ktu1-3_ii_l13b-14a_tġlt #action #movement"
 [,1]                  [,2]                                                       
[1,] "#ktu1-3_ii_l15b_16a" "whatAction #ktu1-3_ii_l5b_6a_tmtḫṣ #confrontation #action"
> x
NULL
> x1 <- print (cbind(word(word(y,-1)),(word(z, -3, -2))))
 [,1]                  [,2]                                    
[1,] "#ktu1-3_ii_l15b_16a" "#ktu1-3_ii_l5b_6a_tmtḫṣ #confrontation"
> x1
 [,1]                  [,2]                                    
[1,] "#ktu1-3_ii_l15b_16a" "#ktu1-3_ii_l5b_6a_tmtḫṣ #confrontation"

But it returns the attribute values of a single occurrence rather than the whole list. So I tried adding for (i in 1:length(interp)):

 x1 <- for (i in 1:length(interp)) print (cbind(word(word(y,-1)),(word(z, -3, -2))))

> x1 <- for (i in 1:length(interp)) print (cbind(word(word(y,-1)),(word(z, -3, -2))))
 [,1]                  [,2]                                    
[1,] "#ktu1-3_ii_l15b_16a" "#ktu1-3_ii_l5b_6a_tmtḫṣ #confrontation"
 [,1]                  [,2]                                    
[1,] "#ktu1-3_ii_l15b_16a" "#ktu1-3_ii_l5b_6a_tmtḫṣ #confrontation"
 [,1]                  [,2]                                    
[1,] "#ktu1-3_ii_l15b_16a" "#ktu1-3_ii_l5b_6a_tmtḫṣ #confrontation"
 [,1]                  [,2]                                    
[1,] "#ktu1-3_ii_l15b_16a" "#ktu1-3_ii_l5b_6a_tmtḫṣ #confrontation"
 [,1]                  [,2]                                    
[1,] "#ktu1-3_ii_l15b_16a" "#ktu1-3_ii_l5b_6a_tmtḫṣ #confrontation"
 [,1]                  [,2]                                    
[1,] "#ktu1-3_ii_l15b_16a" "#ktu1-3_ii_l5b_6a_tmtḫṣ #confrontation"
 [,1]                  [,2]                                    
[1,] "#ktu1-3_ii_l15b_16a" "#ktu1-3_ii_l5b_6a_tmtḫṣ #confrontation"
 [,1]                  [,2]                                    
[1,] "#ktu1-3_ii_l15b_16a" "#ktu1-3_ii_l5b_6a_tmtḫṣ #confrontation"
> x1

This only repeats the same occurrence 8 times (= the actual number of occurrences).

Thank you in advance for your help.

1 answer:

Answer 0 (score: 0)

I found a solution; perhaps it will help someone:

library(XML)      # xmlGetAttr
library(stringr)  # word

listInterp <- list()
for (i in seq_along(interp)) {
  y <- xmlGetAttr(interp[[i]], "ana")     # KTU reference of the <interp>
  z <- xmlGetAttr(interpRef[[i]], "ana")  # verb category values of the <ref>
  print(cbind(y, z, deparse.level = 0))
  # keep the last word of y and the third- and second-to-last words of z
  listInterp[[i]] <- paste(c(word(y, -1), word(z, -3, -2)), collapse = ": ")
}
listInterp <- lapply(listInterp, gsub, pattern = "#", replacement = "")  # strip the "#" prefixes
listInterp

#result
[[1]]
[1] "ktu1-3_ii_l5b-6a: ktu1-3_ii_l5b-6a_tmtḫṣ verb.competition"
[[2]]
[1] "ktu1-3_ii_l6b: ktu1-3_ii_l6b_tḫtṣb verb.competition"
[[3]]
[1] "ktu1-3_ii_l7: ktu1-3_ii_l7_tmḫṣ verb.emotion"
[[4]]
[1] "ktu1-3_ii_l8: ktu1-3_ii_l8_tṣmt verb.emotion"
[...]
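
The earlier attempts failed for two reasons: a for loop always returns NULL, so assigning it to x or x1 stores nothing, and once the loop has finished y and z hold only the values from the last iteration, which is why the same row was printed 8 times. As a variant that avoids the loop, here is a minimal sketch in base R. It assumes the @ana values are separated by whitespace; select_ana is a hypothetical helper name, and the sample strings are copied from the output in the question rather than read from the XML file.

```r
# Hypothetical helper (not part of the XML or stringr packages): keep the
# whitespace-separated tokens of one @ana value at the positions in `keep`.
select_ana <- function(ana, keep = c(2, 3)) {
  tokens <- strsplit(trimws(ana), "[[:space:]]+")[[1]]
  paste(tokens[keep], collapse = " ")
}

# Sample values copied from the question. With a parsed document they would
# come from xmlGetAttr(node, "ana") for each <ref> node instead.
ana_values <- c(
  "whatAction #ktu1-3_ii_l6b_tḫtṣb #verb.competition #contend",
  "whatAction #ktu1-3_ii_l7_tmḫṣ #verb.emotion #humiliation",
  "whatAction #ktu1-3_ii_l8_tṣmt #verb.emotion #humiliation"
)

# vapply applies the helper to every value and returns a character vector,
# so the result can be stored, unlike the NULL returned by a for loop.
selected <- vapply(ana_values, select_ana, character(1), USE.NAMES = FALSE)
selected <- gsub("#", "", selected, fixed = TRUE)  # strip the "#" prefixes
print(selected)
```

Calling select_ana(ana, keep = c(2, 4)) instead would keep the identifier and the subcategory, for example "#ktu1-3_ii_l6b_tḫtṣb #contend".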