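How to search for a word in an XML file and print it in Python

Asked: 2018-10-20 18:46:32

Tags: python xml nlp nltk tokenize

I want to search a .xml file for a specific word entered by the user. This is my XML file.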

<?xml version="1.0" encoding="UTF-8"?>
<words>
<entry>
<word>John</word>
<pron>()</pron>
<gram>[Noun]</gram>
<poem></poem>
<meanings>
<meaning>name</meaning>
</meanings>
</entry>
</words>

Here is my code:

import nltk
from nltk.tokenize import word_tokenize
import os
import xml.etree.ElementTree as etree


sen = input("Enter Your sentence - ")

print(sen)
print("\n")
print(word_tokenize(sen)[0])

tree = etree.parse('roman.xml')
node=etree.fromstring(tree)

#node=etree.fromstring('<a><word>waya</word><gram>[Noun]</gram><meaning>talking</meaning></a>')
s = node.findtext(word_tokenize(sen)[0])
print(s)

I have tried everything, but it still gives me this error:

a bytes-like object is required, not 'ElementTree'

I really don't know how to fix it.

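1 Answer:

Answer 0 (score: 1)

The error occurs because you are passing an ElementTree object to the fromstring() method, which expects the XML text itself as a string or bytes. Since parse() has already parsed the file, call getroot() on its result instead: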

>>> import os
>>> import xml.etree.ElementTree as etree
>>> a = etree.parse('a.xml')
>>> a
<xml.etree.ElementTree.ElementTree object at 0x10fcabeb8>
>>> b = a.getroot()
>>> b
<Element 'words' at 0x10fb21f48>
>>> b[0][0].text
'John'
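
To make the difference concrete, here is a minimal sketch, assuming a trimmed copy of the XML from the question. The variable names and the temporary file are only for illustration. parse() reads a file and returns an ElementTree, while fromstring() takes the XML text and returns the root Element directly.

```python
import os
import tempfile
import xml.etree.ElementTree as etree

# Trimmed copy of the XML from the question (assumption: inlined for the demo).
XML_TEXT = """<words>
<entry>
<word>John</word>
<gram>[Noun]</gram>
<meanings>
<meaning>name</meaning>
</meanings>
</entry>
</words>"""

# fromstring() takes XML text (str or bytes) and returns the root Element.
root_from_text = etree.fromstring(XML_TEXT)

# parse() takes a filename or file object and returns an ElementTree;
# getroot() then gives the same kind of root Element.
with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", delete=False, encoding="utf-8"
) as handle:
    handle.write(XML_TEXT)
    path = handle.name
try:
    root_from_file = etree.parse(path).getroot()
finally:
    os.remove(path)

print(root_from_text.tag, root_from_file.tag)
print(root_from_file[0][0].text)
```

Either route ends with an Element, which is the object that find(), findall() and findtext() are normally called on.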

Use the find() and findall() methods to search.

For more information, see the library documentation: https://docs.python.org/3/library/xml.etree.elementtree.html
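
Applied to the XML from the question, a search for a user-supplied word might look like the sketch below. The helper name find_entry is hypothetical, and the XML is inlined (in trimmed form) to keep the example self-contained. Note that findtext() takes a tag name or path, not the text to look for, so the word has to be compared against the text of each <word> element.

```python
import xml.etree.ElementTree as etree

# Trimmed copy of the XML from the question (assumption: inlined for the demo).
XML_TEXT = """<words>
<entry>
<word>John</word>
<gram>[Noun]</gram>
<meanings>
<meaning>name</meaning>
</meanings>
</entry>
</words>"""


def find_entry(root, target):
    """Return the first <entry> whose <word> text equals target, else None."""
    for entry in root.findall("entry"):
        # findtext("word") returns the text of the first <word> child.
        if entry.findtext("word") == target:
            return entry
    return None


root = etree.fromstring(XML_TEXT)
entry = find_entry(root, "John")  # in the question this would be the user's word
if entry is not None:
    print(
        entry.findtext("word"),
        entry.findtext("gram"),
        entry.findtext("meanings/meaning"),
    )
else:
    print("Word not found")
```

The comparison here is exact and case-sensitive; whether that is the right matching rule depends on the dictionary file.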

A simple example:

test.xml

<?xml version="1.0" encoding="UTF-8"?>
<words>
  <word value="John"></word>
  <word value="Mike"></word>
  <word value="Scott"></word>
</words>

example.py

>>> import xml.etree.ElementTree as ET
>>> root = ET.parse("test.xml")
>>> search = root.findall(".//word[@value='John']")
>>> search
[<Element 'word' at 0x10be9c868>]
>>> search[0].attrib
{'value': 'John'}
>>> search[0].tag
'word'
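
When the value comes from user input, one option is to filter in Python rather than build the XPath string by hand, which avoids quoting problems if the input contains a quote character. The following is only a sketch under that assumption; find_by_value is a hypothetical helper, and the XML from test.xml is inlined so the example runs on its own.

```python
import xml.etree.ElementTree as ET

# Same content as test.xml above (assumption: inlined for the demo).
XML_TEXT = """<words>
  <word value="John"></word>
  <word value="Mike"></word>
  <word value="Scott"></word>
</words>"""


def find_by_value(root, wanted):
    """Return all <word> elements whose value attribute equals wanted."""
    # iter("word") walks every <word> element in the tree, at any depth.
    return [el for el in root.iter("word") if el.get("value") == wanted]


root = ET.fromstring(XML_TEXT)
matches = find_by_value(root, "Mike")
print([el.attrib for el in matches])
```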