Below is a sample of my XML file. I want to access the text "The bread is top notch as well." and the category "food".
<sentences>
    <sentence id="32897564#894393#2">
        <text>The bread is top notch as well.</text>
        <aspectTerms>
            <aspectTerm term="bread" polarity="positive" from="4" to="9"/>
        </aspectTerms>
        <aspectCategories>
            <aspectCategory category="food" polarity="positive" />
        </aspectCategories>
    </sentence>
My code is:
test_text_file=open('Restaurants_Test_Gold.txt', 'rt')
test_text_file1=test_text_file.read()
root = ET.fromstring(test_text_file1)
for page in list(root):
    text = page.find('text').text
    Category = page.find('aspectCategory')
    print ('sentence: %s; category: %s' % (text,Category))
test_text_file.close()
Answer 0: (score: 0)
It depends on how complex the XML format is. The simplest way is to access the path directly.
import xml.etree.ElementTree as ET
tree = ET.parse('x.xml')
root = tree.getroot()
print(root.find('.//text').text)
print(root.find('.//aspectCategory').attrib['category'])
However, if there are similar tags elsewhere in the document, you may want to use a longer path, such as .//aspectCategories/aspectCategory.
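As a minimal sketch of the longer-path idea, the snippet below walks each sentence element and looks up its categories relative to that sentence, so the text and category stay paired. The SAMPLE string is the XML from the question with a closing </sentences> tag added so it parses, and the extract helper is a hypothetical name used only for illustration.

```python
import xml.etree.ElementTree as ET

# The question's sample; the closing </sentences> tag is added here
# (an assumption) so the string is well-formed.
SAMPLE = """<sentences>
    <sentence id="32897564#894393#2">
        <text>The bread is top notch as well.</text>
        <aspectTerms>
            <aspectTerm term="bread" polarity="positive" from="4" to="9"/>
        </aspectTerms>
        <aspectCategories>
            <aspectCategory category="food" polarity="positive" />
        </aspectCategories>
    </sentence>
</sentences>"""


def extract(xml_string):
    """Return a list of (text, [categories]) pairs, one per <sentence>."""
    root = ET.fromstring(xml_string)
    results = []
    for sentence in root.findall('sentence'):
        # findtext returns None if the <text> child is missing
        text = sentence.findtext('text')
        # Path is relative to the current <sentence>, so categories
        # from other sentences are never picked up
        categories = [
            category.get('category')
            for category in sentence.findall('aspectCategories/aspectCategory')
        ]
        results.append((text, categories))
    return results


for text, categories in extract(SAMPLE):
    print('sentence: %s; category: %s' % (text, ', '.join(categories)))
```

Using findall rather than find also covers sentences that carry more than one aspectCategory, or none at all, without raising an error.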
Answer 1: (score: 0)
Here is my code, which should solve your problem:
import os
import xml.etree.ElementTree as ET
basedir = os.path.abspath(os.path.dirname(__file__))
filenamepath = os.path.join(basedir, 'Restaurants_Test_Gold.txt')
test_text_file = open(filenamepath, 'r')
file_contents = test_text_file.read()
tree = ET.fromstring(file_contents)
for sentence in list(tree):
    sentence_items = list(sentence.iter())
    # remove first element because it's the sentence element [<sentence>] itself
    sentence_items = sentence_items[1:]
    for item in sentence_items:
        if item.tag == 'text':
            print(item.text)
        elif item.tag == 'aspectCategories':
            category = item.find('aspectCategory')
            print(category.attrib.get('category'))
test_text_file.close()
Hope this helps.