How do I parse XML data stored in a variable?

Time: 2013-07-08 05:25:18

Tags: python xml-parsing

  <myroot>  <data txt="some0" txt1 = "some1" txt2 = "some2" >
                 <data2>
                        < bank = "SBI" bank2 = "SBI2" >
                 <data2>
                 <data3>
                        <branch = "bang1" branch = "bang2" >
                 <data3>
            </data>

            <data txt="some0" txt1 = "some1" txt2 = "some2" >
                 <data2>
                        < bank = "citi" bank2 = "citi2" >
                 <data2>
                 <data3>
                        <branch = "bang3" branch = "bang4" >
                 <data3>
            </data> </myroot>

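The data above is stored in a variable, not in an XML file, and I can't parse it because it isn't an XML file. Please help me convert the data to XML format (or to a file) and parse it. This is the script I'm trying: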

stdout = "<myroot>%s</myroot>" % stdout
print'main data', stdout
tree = ElementTree.fromstring(stdout)
tree1 = ET.parse('tree')

In the first line of the script I wrap the data in a root tag. The 'main data' print shows the XML data above. I then try to parse it, but it throws an error.

2 answers:

Answer 0: (score: 0)

The error is raised because your XML is not well-formed.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1301, in XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1643, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1507, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 3, column 25

Look at line 3, column 25. Ta-da:

>>> stdout.split('\n')[2][25:]
' bank = "SBI" bank2 = "SBI2" >'
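
The same position can also be read programmatically instead of from the traceback. This is only a sketch, assuming Python 3 and a shortened copy of the question's data; the names `broken` and `find_parse_error` are made up for illustration. `ParseError` carries a `position` attribute, a `(line, column)` tuple.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the data from the question (illustrative only).
broken = """<myroot>  <data txt="some0" txt1 = "some1" txt2 = "some2" >
                 <data2>
                        < bank = "SBI" bank2 = "SBI2" >
                 <data2>
            </data> </myroot>"""


def find_parse_error(text):
    """Return (line, column, offending_line) for the first XML error, or None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based
        return line, column, text.split("\n")[line - 1]
    return None


result = find_parse_error(broken)
print(result)
```

The offending line is the one with `< bank = ...`: a tag needs a name immediately after `<`, so the parser stops there.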

Answer 1: (score: 0)

It parses fine with BeautifulSoup:

>>> s = """<myroot>  <data txt="some0" txt1 = "some1" txt2 = "some2" >
...                  <data2>
...                         < bank = "SBI" bank2 = "SBI2" >
...                  <data2>
...                  <data3>
...                         <branch = "bang1" branch = "bang2" >
...                  <data3>
...             </data>
... 
...             <data txt="some0" txt1 = "some1" txt2 = "some2" >
...                  <data2>
...                         < bank = "citi" bank2 = "citi2" >
...                  <data2>
...                  <data3>
...                         <branch = "bang3" branch = "bang4" >
...                  <data3>
...             </data> </myroot>"""

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(s)
>>> print soup.prettify()
<myroot>
 <data txt="some0" txt1="some1" txt2="some2">
  <data2>
   &lt; bank = "SBI" bank2 = "SBI2" &gt;
   <data2>
    <data3>
     <branch "bang1" = branch="bang2">
      <data3>
      </data3>
     </branch>
    </data3>
   </data2>
  </data2>
 </data>
 <data txt="some0" txt1="some1" txt2="some2">
  <data2>
   &lt; bank = "citi" bank2 = "citi2" &gt;
   <data2>
    <data3>
     <branch "bang3" = branch="bang4">
      <data3>
      </data3>
     </branch>
    </data3>
   </data2>
  </data2>
 </data>
</myroot>
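
Note that in the output above the `bank` values come back as escaped text rather than as attributes, so they are not easy to read back. If the code producing the data can be changed, a well-formed version can be parsed by ElementTree directly from the string. The sketch below assumes Python 3 and guesses at the intended structure: the element names `bank` and `branch`, the closing `</data2>` and `</data3>` tags, and the attribute name `branch2` are assumptions, since the original has a nameless tag and a duplicated `branch` attribute, neither of which XML allows.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed rewrite of the question's data (structure is guessed).
fixed = """<myroot>
  <data txt="some0" txt1="some1" txt2="some2">
    <data2><bank bank="SBI" bank2="SBI2"/></data2>
    <data3><branch branch="bang1" branch2="bang2"/></data3>
  </data>
  <data txt="some0" txt1="some1" txt2="some2">
    <data2><bank bank="citi" bank2="citi2"/></data2>
    <data3><branch branch="bang3" branch2="bang4"/></data3>
  </data>
</myroot>"""

# fromstring() parses a string held in a variable; no file is needed.
root = ET.fromstring(fixed)

# Collect the attributes of every <bank> and the first branch of every <branch>.
banks = [el.attrib for el in root.iter("bank")]
branches = [el.get("branch") for el in root.iter("branch")]
print(banks)
print(branches)
```

`ET.parse()` expects a filename or file object, which is why passing it a string such as `'tree'` does not work for data held in a variable; `ET.fromstring()` is the call for that case.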