I have an XML file with the following structure:
<text>
<dialogue>
<pattern>
We're having a {nice|great} time.
</pattern>
<criterion>
<!-- match this tag, get the above pattern -->
average_person, tourist, delighted
</criterion>
</dialogue>
<dialogue>
<pattern>
The service {here stinks|is terrible}!
</pattern>
<criterion>
tourist, disgruntled, average_person
</criterion>
</dialogue>
<dialogue>
<pattern>
They have {smoothies|funny hats}. Neat!
</pattern>
<criterion>
tourist, smoothie_enthusiast
</criterion>
</dialogue>
<dialogue>
<pattern>
I wonder how {expensive|valuable} these resort tickets are?
</pattern>
<criterion>
merchant, average_person
</criterion>
</dialogue>
</text>
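What I want to do is go through the dialogue tags, look at the criterion tags, and match them against a list of words. If they match, I want to use the pattern in that dialogue tag. I'm using Python for this task.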
What I'm currently doing is using lxml's etree, and it looks like this:
from lxml import etree

tree = etree.parse('tourists.xml')
root = tree.getroot()
tourist_patterns = []
g = 0
for i in root.iterfind('dialogue/criterion'):
    # Split the comma-separated criterion text and strip surrounding whitespace.
    a = [tag.strip() for tag in i.text.split(',')]
    # The "personality" variable has a value like "delighted" or "disgruntled".
    # "tags_to_match" are the criteria that we want to, well, match. It may
    # have criteria like "merchant", "tourist", or "delighted".
    # When the tags match (the "match_tags" function returns true), the
    # pattern is appended to the "tourist_patterns" list.
    # ("personality", "tags_to_match" and "match_tags" are defined elsewhere.)
    if personality != 'average_person' and match_tags(tags_to_match, a):
        tourist_patterns.append(root[g][0].text)
    g += 1
# When we don't have a match, we just go with the "average_person" tag.
if len(tourist_patterns) == 0:
    # Go through the tags again, choosing the ones that match the
    # 'average_person' personality and put them in the "tourist_patterns" list.
    pass
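Then I go through the "tourist_patterns" list and pick out what I want.
I'm trying to simplify this. How can I go through the tags, match the words I want in the criterion tag, and then take the pattern from the pattern tag? I've also been trying to set up a default for when the criteria don't match (hence the "average_person" personality criterion).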
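To make the goal concrete, here is a minimal sketch of the kind of loop I have in mind: iterate over the dialogue elements themselves, so the criterion and the pattern come from the same element and no index counter is needed. It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra installs; the calls used here (fromstring, iterfind, findtext) also exist in lxml.etree. The names patterns_for and SAMPLE are made up for illustration, and this match_tags is only a hypothetical stand-in that requires every wanted tag to be present. The embedded sample is a trimmed copy of the file without the comment, because lxml stores the text that follows a comment on the comment's tail rather than on the element's text.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml.etree in this sketch

# Trimmed copy of the sample file (assumption: no comment inside <criterion>).
SAMPLE = """
<text>
  <dialogue>
    <pattern>We're having a {nice|great} time.</pattern>
    <criterion>average_person, tourist, delighted</criterion>
  </dialogue>
  <dialogue>
    <pattern>They have {smoothies|funny hats}. Neat!</pattern>
    <criterion>tourist, smoothie_enthusiast</criterion>
  </dialogue>
  <dialogue>
    <pattern>I wonder how {expensive|valuable} these resort tickets are?</pattern>
    <criterion>merchant, average_person</criterion>
  </dialogue>
</text>
"""


def match_tags(wanted, available):
    """Hypothetical stand-in: true when every wanted tag is available."""
    return set(wanted) <= set(available)


def patterns_for(root, wanted):
    """Return the pattern text of each dialogue whose criterion matches."""
    found = []
    for dialogue in root.iterfind('dialogue'):
        # Split the comma-separated criterion and drop whitespace and newlines.
        raw = dialogue.findtext('criterion', default='')
        tags = [t.strip() for t in raw.split(',') if t.strip()]
        if match_tags(wanted, tags):
            found.append(dialogue.findtext('pattern', default='').strip())
    return found


root = ET.fromstring(SAMPLE)
print(patterns_for(root, ['tourist', 'smoothie_enthusiast']))
# prints ['They have {smoothies|funny hats}. Neat!']
```

If "some of the words" should be enough for a match, the subset test in match_tags could be swapped for a non-empty intersection, set(wanted) & set(available).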
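EDIT: Some commenters asked for a list of what should match. Basically, I want it to match some or all of the words in the criterion tag and give me the text of the pattern tag in that same dialogue tag. So, if I'm looking for "tourist" and "smoothie_enthusiast", it would get one match in my XML example. Then I want to get the pattern tag text "They have {smoothies|funny hats}. Neat!". If it can't match any of the words in the criterion tag, I would just try to match "average_person" and "tourist".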
In turn, tourist_patterns would look like this when it matches:
>>> tourist_patterns
['They have {smoothies|funny hats}. Neat!']
When it doesn't match, it would fall back to matching:
>>> tourist_patterns
['They have {smoothies|funny hats}. Neat!', 'The service {here stinks|is terrible}!']
Hope that clears things up.
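The default behaviour could be sketched roughly as below. This is only an illustration under assumptions: select_patterns is a made-up name, the entries are hand-written (tags, pattern) pairs rather than parsed XML, "match" means every requested tag is present, and "pirate" is a hypothetical tag that appears nowhere in the file. It is not meant to reproduce the exact lists above, only the try-then-fall-back shape.

```python
def select_patterns(entries, wanted, default_tags):
    """Pick patterns matching `wanted`; fall back to `default_tags` if none do."""
    def matching(tags_needed):
        needed = set(tags_needed)
        # Keep a pattern when all needed tags appear in its criterion set.
        return [pattern for tags, pattern in entries if needed <= tags]

    # An empty list is falsy, so `or` moves on to the default tags.
    return matching(wanted) or matching(default_tags)


# Hand-written (criterion tags, pattern) pairs mirroring part of the sample.
entries = [
    ({'average_person', 'tourist', 'delighted'},
     "We're having a {nice|great} time."),
    ({'tourist', 'smoothie_enthusiast'},
     'They have {smoothies|funny hats}. Neat!'),
    ({'merchant', 'average_person'},
     'I wonder how {expensive|valuable} these resort tickets are?'),
]

# Direct match: only the smoothie dialogue carries both tags.
print(select_patterns(entries, ['tourist', 'smoothie_enthusiast'], ['average_person']))

# No dialogue has the hypothetical 'pirate' tag, so the default tags are used.
print(select_patterns(entries, ['pirate'], ['average_person']))
```

Keeping the fallback in one function means the caller never has to check for an empty list and loop over the tags a second time.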