Select certain XML tags using criteria matched against other tags

Time: 2017-08-25 21:46:58

Tags: python xml lxml

I have an XML file structured as follows:

<text>
  <dialogue>
     <pattern>
        We're having a {nice|great} time.
     </pattern>
     <criterion>
       <!-- match this tag, get the above pattern -->
        average_person, tourist, delighted
     </criterion>
  </dialogue>
  <dialogue>
     <pattern>
        The service {here stinks|is terrible}!
     </pattern>
     <criterion>
        tourist, disgruntled, average_person
     </criterion>
  </dialogue>
  <dialogue>
     <pattern>
        They have {smoothies|funny hats}. Neat!
     </pattern>
     <criterion>
        tourist, smoothie_enthusiast
     </criterion>
  </dialogue>
  <dialogue>
     <pattern>
        I wonder how {expensive|valuable} these resort tickets are?
     </pattern>
     <criterion>
        merchant, average_person
     </criterion>
  </dialogue>
</text>

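What I want to do is walk through the dialogue tags, look at each criterion tag, and match it against a list of words. If they match, I want to use the pattern in that same dialogue tag. I am using Python for this task.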
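A minimal sketch of that traversal, assuming each dialogue holds exactly one pattern and one criterion. It uses the standard library's xml.etree.ElementTree so that it runs on its own; lxml.etree provides the same fromstring, iterfind and findtext calls. SAMPLE and read_dialogues are hypothetical names introduced only for this illustration.

```python
import xml.etree.ElementTree as ET  # lxml.etree provides the same calls used here

# Trimmed copy of the sample document, inlined so the sketch runs standalone.
SAMPLE = """
<text>
  <dialogue>
    <pattern>We're having a {nice|great} time.</pattern>
    <criterion>average_person, tourist, delighted</criterion>
  </dialogue>
  <dialogue>
    <pattern>They have {smoothies|funny hats}. Neat!</pattern>
    <criterion>tourist, smoothie_enthusiast</criterion>
  </dialogue>
</text>
"""

def read_dialogues(root):
    """Return one (criteria_set, pattern_text) pair per dialogue element."""
    pairs = []
    for dialogue in root.iterfind('dialogue'):
        raw = dialogue.findtext('criterion', default='')
        # Split on commas and drop surrounding whitespace and empty entries.
        criteria = {word.strip() for word in raw.split(',') if word.strip()}
        pattern = dialogue.findtext('pattern', default='').strip()
        pairs.append((criteria, pattern))
    return pairs

pairs = read_dialogues(ET.fromstring(SAMPLE))
```

Keeping the criterion and the pattern together in one pair avoids having to look the pattern up by position afterwards.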

What I am currently doing uses lxml's "etree" and looks like this:

from lxml import etree

tree = etree.parse('tourists.xml')
root = tree.getroot()
for i in root.iterfind('dialogue/criterion'):
   # Strip the whitespace around each comma-separated tag.
   a = [tag.strip() for tag in i.text.split(',')]
   # The "personality" variable has a value like "delighted" or "disgruntled".
   # "tags_to_match" are the criteria that we want to, well, match. It may
   # hold criteria like "merchant", "tourist", or "delighted".
   # When the "match_tags" function returns true, the pattern is appended
   # to the "tourist_patterns" list.
   # Strings are compared with != here; "is not" tests object identity.
   if personality != 'average_person' and match_tags(tags_to_match, a):
       # Read the pattern from the same dialogue rather than by position.
       tourist_patterns.append(i.getparent().findtext('pattern').strip())
# When we don't have a match, we just go with the "average_person" tag.
if len(tourist_patterns) == 0:
   # Go through the tags again, choosing the ones that match the
   # 'average_person' personality and put them in the "tourist_patterns" list.
   pass  # this second pass is not written yet

Then I go through the "tourist_patterns" list and pick out what I want.

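I am trying to simplify this. How can I walk through the tags, match the words I want in the criterion tag, and then take the pattern from the pattern tag? I have also been trying to set a default for when the criteria do not match (hence the "average_person" personality criterion).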
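One possible shape for this, as a sketch only: collect the patterns in a single helper and call it a second time with the default tags when the first pass finds nothing. It assumes a dialogue matches when its criterion contains every wanted tag, which is one reading of the requirement. patterns_for, SAMPLE and the 'pirate' tag are hypothetical names for the illustration, and the standard library's xml.etree.ElementTree stands in for lxml.etree, which provides the same iterfind and findtext calls.

```python
import xml.etree.ElementTree as ET  # lxml.etree provides the same calls used here

SAMPLE = """<text>
  <dialogue><pattern>We're having a {nice|great} time.</pattern>
    <criterion>average_person, tourist, delighted</criterion></dialogue>
  <dialogue><pattern>The service {here stinks|is terrible}!</pattern>
    <criterion>tourist, disgruntled, average_person</criterion></dialogue>
  <dialogue><pattern>They have {smoothies|funny hats}. Neat!</pattern>
    <criterion>tourist, smoothie_enthusiast</criterion></dialogue>
  <dialogue><pattern>I wonder how {expensive|valuable} these resort tickets are?</pattern>
    <criterion>merchant, average_person</criterion></dialogue>
</text>"""

def patterns_for(root, wanted, default=('average_person',)):
    """Patterns whose criterion holds every wanted tag, else those for the default tags."""
    def collect(required):
        required = set(required)
        found = []
        for dialogue in root.iterfind('dialogue'):
            tags = {t.strip() for t in dialogue.findtext('criterion', '').split(',')}
            # Assumption: a dialogue matches only if ALL required tags are present.
            if required <= tags:
                found.append(dialogue.findtext('pattern', '').strip())
        return found
    # An empty first result is falsy, so "or" triggers the fallback pass.
    return collect(wanted) or collect(default)

root = ET.fromstring(SAMPLE)
matched = patterns_for(root, ['tourist', 'smoothie_enthusiast'])
fallback = patterns_for(root, ['pirate'])  # 'pirate' appears in no criterion
```

With this sample, matched holds only the smoothie pattern, while fallback holds the three patterns tagged average_person.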

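Edit: Some commenters asked for a list of what should match. Basically, I want it to match some or all of the words in the criterion tag and give me the text of the pattern tag inside the same dialogue tag. So, if I am looking for "tourist" and "smoothie_enthusiast", it would get one match in my XML example. Then I would like to get the pattern tag text "They have {smoothies|funny hats}. Neat!". If it cannot match any of the words in the criterion tag, I would just try to match "average_person" and "tourist".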
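"Some or all of the words" can be read two ways, and sets make the difference explicit. The function below is a hypothetical stand-in for match_tags, not the version used in the code above; the require_all flag is an assumption added for illustration.

```python
def match_tags(tags_to_match, criterion_tags, require_all=False):
    """Compare two tag collections, ignoring surrounding whitespace.

    require_all=False: true when at least one wanted tag is present ("some").
    require_all=True:  true only when every wanted tag is present ("all").
    """
    wanted = {tag.strip() for tag in tags_to_match}
    present = {tag.strip() for tag in criterion_tags}
    if not wanted:
        return False  # nothing to look for counts as no match
    if require_all:
        return wanted <= present  # subset test
    return bool(wanted & present)  # non-empty intersection

# Tags as they come out of splitting a criterion's text on commas.
criterion = ' tourist, smoothie_enthusiast '.split(',')

all_present = match_tags(['tourist', 'smoothie_enthusiast'], criterion, require_all=True)
some_present = match_tags(['tourist', 'merchant'], criterion)
not_all = match_tags(['tourist', 'merchant'], criterion, require_all=True)
none_present = match_tags(['merchant'], criterion)
```

Under the "some" reading, a search for "tourist" and "smoothie_enthusiast" would also match every other dialogue tagged tourist, so the single match described above corresponds to the "all" reading.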

In turn, tourist_patterns would look like this when there is a match:

>>> tourist_patterns
    ['They have {smoothies|funny hats}. Neat!']

When there is no match, it would fall back to:

>>> tourist_patterns
    ['They have {smoothies|funny hats}. Neat!', 'The service {here stinks|is terrible}!']

I hope that clears things up.

0 answers