Question

I have an XML file structured like this:

<text>
  <dialogue>
     <pattern>
        We're having a {nice|great} time.
     </pattern>
     <criterion>
       <!-- match this tag, get the above pattern -->
        average_person, tourist, delighted
     </criterion>
  </dialogue>
  <dialogue>
     <pattern>
        The service {here stinks|is terrible}!
     </pattern>
     <criterion>
        tourist, disgruntled, average_person
     </criterion>
  </dialogue>
  <dialogue>
     <pattern>
        They have {smoothies|funny hats}. Neat!
     </pattern>
     <criterion>
        tourist, smoothie_enthusiast
     </criterion>
  </dialogue>
  <dialogue>
     <pattern>
        I wonder how {expensive|valuable} these resort tickets are?
     </pattern>
     <criterion>
        merchant, average_person
     </criterion>
  </dialogue>
</text>

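What I want to do is go through the dialogue tags, look at the criterion tags, and match them against a list of words. If they match, I want to use the pattern in that dialogue tag. I'm using Python for this task.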

What I'm currently doing is using lxml's "etree", and it looks like this:

from lxml import etree

tree = etree.parse('tourists.xml')
root = tree.getroot()
g=0
for i in root.iterfind('dialogue/criterion'):
   a = i.text.split(',')
   # The "personality" variable has a value like "delighted" or "disgruntled".
   # "tags_to_match" are the criteria that we want to, well, match. It may
   # have criteria like "merchant", "tourist", or "delighted".
   # When the "match_tags" function returns true, the pattern is appended
   # to the "tourist_patterns" list.
   if personality != 'average_person' and match_tags( tags_to_match, a):
       tourist_patterns.append(root[g][0].text)
   g+=1
# When we don't have a match, we just go with the "average_person" tag.
if len(tourist_patterns) == 0:
   # Go through the tags again, choosing the ones that match the
   # 'average_person' personality and put them in the "tourist_patterns" list.
   pass  # fallback body not shown in the original

Then I go through the "tourist_patterns" list and pick out what I want.

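I'm trying to simplify this. How can I go through the tags, match the words I want in the criterion tag, and then take the pattern from the pattern tag? I've also been trying to set a default for when the criteria don't match (hence the "average_person" personality criterion).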

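Edit: Some commenters asked for a list of what should match. Basically, I want it to match some or all of the words in the criterion tag and give me the text of the pattern tag under that dialogue tag. So, if I'm looking for "tourist" and "smoothie_enthusiast", it gets one match in my XML example. Then I want to get the pattern tag text "They have {smoothies|funny hats}. Neat!". If it can't match any of the words in the criterion tag, I'll only try to match "average_person" and "tourist".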

In turn, tourist_patterns would look like this when it matches:

>>> tourist_pattern
    ['They have {smoothies|funny hats}. Neat!']

When it doesn't match, it would fall back to:

>>> tourist_pattern
    ['They have {smoothies|funny hats}. Neat!', 'The service {here stinks|is terrible}!']
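
For reference, the selection rule described above could be sketched roughly as follows. This is only one reading of it: it assumes a dialogue "matches" when every wanted tag appears in its criterion, and that the fallback tags are "average_person" and "tourist". The helper name select_patterns and the inline XML string are made up for illustration, and it uses the standard library's xml.etree.ElementTree instead of lxml (the iter and findtext calls used here exist in lxml.etree as well).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example file, inlined so the sketch is self-contained.
XML = """<text>
  <dialogue>
    <pattern>We're having a {nice|great} time.</pattern>
    <criterion>average_person, tourist, delighted</criterion>
  </dialogue>
  <dialogue>
    <pattern>The service {here stinks|is terrible}!</pattern>
    <criterion>tourist, disgruntled, average_person</criterion>
  </dialogue>
  <dialogue>
    <pattern>They have {smoothies|funny hats}. Neat!</pattern>
    <criterion>tourist, smoothie_enthusiast</criterion>
  </dialogue>
  <dialogue>
    <pattern>I wonder how {expensive|valuable} these resort tickets are?</pattern>
    <criterion>merchant, average_person</criterion>
  </dialogue>
</text>"""


def select_patterns(root, wanted, fallback=("average_person", "tourist")):
    """Return patterns whose criterion holds all wanted tags, else the fallback ones."""

    def matching(tags):
        tags = set(tags)
        found = []
        # Walk each <dialogue> so pattern and criterion stay paired
        # without a separate index counter.
        for dialogue in root.iter("dialogue"):
            raw = dialogue.findtext("criterion", "")
            criterion = {t.strip() for t in raw.split(",")}
            if tags <= criterion:  # assumption: all wanted tags must be present
                found.append(dialogue.findtext("pattern", "").strip())
        return found

    # An empty list is falsy, so the fallback only runs when nothing matched.
    return matching(wanted) or matching(fallback)


root = ET.fromstring(XML)
tourist_patterns = select_patterns(root, ["tourist", "smoothie_enthusiast"])
```

Iterating over the dialogue elements themselves keeps each pattern tied to its own criterion. What the fallback returns depends on how "match" is defined, so the subset test is the line to change if a partial match should count.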

Hope that clears things up.

Selecting certain XML tags using criteria matched against other tags

0 answers: