<myroot> <data txt="some0" txt1 = "some1" txt2 = "some2" >
<data2>
< bank = "SBI" bank2 = "SBI2" >
<data2>
<data3>
<branch = "bang1" branch = "bang2" >
<data3>
</data>
<data txt="some0" txt1 = "some1" txt2 = "some2" >
<data2>
< bank = "citi" bank2 = "citi2" >
<data2>
<data3>
<branch = "bang3" branch = "bang4" >
<data3>
</data> </myroot>
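The data above is stored in a variable, not in an XML file. I can't parse it, because it is not well-formed XML. Please help me convert the data into XML format (or an XML file) and parse it with the script I am trying below: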
stdout = "<myroot>%s</myroot>" % stdout
print'main data', stdout
tree = ElementTree.fromstring(stdout)
tree1 = ET.parse('tree')
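In the first line of the script I add a root tag to the data. The 'main data' line prints the XML-like data shown above. Then I try to parse it, but it throws an error.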
Answer 0 (score: 0)
The error is raised because your XML is not well-formed.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1301, in XML
parser.feed(text)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1643, in feed
self._raiseerror(v)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1507, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 3, column 25
Look at line 3, column 25. The tag there opens with "<" followed by a space, so it has no element name:
>>> stdout.split('\n')[2][25:]
' bank = "SBI" bank2 = "SBI2" >'
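Instead of reading the position off the traceback, you can take it from the exception itself. This is a minimal sketch, assuming a shortened copy of the data held in a variable I have called bad (a hypothetical name); ParseError.position is a (line, column) tuple, with the line counted from 1.

```python
from __future__ import print_function  # lets the same code run on Python 2.7
import xml.etree.ElementTree as ET

# Shortened copy of the data from the question; "bad" is a made-up name.
bad = """<myroot> <data txt="some0" txt1 = "some1" txt2 = "some2" >
<data2>
< bank = "SBI" bank2 = "SBI2" >
<data2>
</data> </myroot>"""

position = None
try:
    ET.fromstring(bad)
except ET.ParseError as err:
    # position is (line, column); the line number starts at 1.
    position = err.position
    line, column = position
    print("parse failed:", err)
    print("offending line:", bad.splitlines()[line - 1])
```

The parser stops at the first problem it meets, so after fixing that tag you would run it again to find the next one (here, the unclosed data2 and data3 tags).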
Answer 1 (score: 0)
BeautifulSoup parses it without raising an error:
>>> s = """<myroot> <data txt="some0" txt1 = "some1" txt2 = "some2" >
... <data2>
... < bank = "SBI" bank2 = "SBI2" >
... <data2>
... <data3>
... <branch = "bang1" branch = "bang2" >
... <data3>
... </data>
...
... <data txt="some0" txt1 = "some1" txt2 = "some2" >
... <data2>
... < bank = "citi" bank2 = "citi2" >
... <data2>
... <data3>
... <branch = "bang3" branch = "bang4" >
... <data3>
... </data> </myroot>"""
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(s)
>>> print soup.prettify()
<myroot>
<data txt="some0" txt1="some1" txt2="some2">
<data2>
< bank = "SBI" bank2 = "SBI2" >
<data2>
<data3>
<branch "bang1" = branch="bang2">
<data3>
</data3>
</branch>
</data3>
</data2>
</data2>
</data>
<data txt="some0" txt1="some1" txt2="some2">
<data2>
< bank = "citi" bank2 = "citi2" >
<data2>
<data3>
<branch "bang3" = branch="bang4">
<data3>
</data3>
</branch>
</data3>
</data2>
</data2>
</data>
</myroot>
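Note that in the output above the nameless bank tag is kept as plain text rather than as an element, so its attributes are not accessible from the tree. If you can change how the data is produced, a well-formed version parses directly with ElementTree. The following is only a sketch of what that could look like for the first record. The element name bank, the closing tags, and the attribute name branch2 are my assumptions about what was intended, not something taken from the question.

```python
from __future__ import print_function  # lets the same code run on Python 2.7
import xml.etree.ElementTree as ET

# Hand-corrected first <data> record. Assumptions (not from the question):
# the nameless tag is called <bank>, the repeated <data2>/<data3> lines were
# meant as closing tags, and the second "branch" attribute is renamed to
# "branch2" because an element cannot carry the same attribute twice.
fixed = """<myroot>
  <data txt="some0" txt1="some1" txt2="some2">
    <data2>
      <bank bank="SBI" bank2="SBI2"/>
    </data2>
    <data3>
      <branch branch="bang1" branch2="bang2"/>
    </data3>
  </data>
</myroot>"""

# fromstring() already returns the root element; no ET.parse() call is needed.
root = ET.fromstring(fixed)
for data in root.findall("data"):
    bank = data.find("data2/bank")
    branch = data.find("data3/branch")
    print(data.get("txt"), bank.get("bank"), branch.get("branch2"))
```

Since this string already contains the myroot element, it should not be wrapped in another one before parsing.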