I have thousands of entries in an XML file, and each entry has a namespaced name. A minimal example of what I want to parse is shown below.
<d:entry d:title="Buddism" class="entry">
<span class="ps"> noun </span>
<span class="pinyin"> fojiao </span>
</d:entry>
<d:entry d:title="hew" class="entry">
<span class="ps"> verb </span>
<span class="pinyin"> jue </span>
</d:entry>
<d:entry d:title="roost" class="entry">
<span class="ps"> noun </span>
<span class="pinyin"> qixidi </span>
</d:entry>
I tried to parse it with BeautifulSoup4 using the steps below, but nothing was found.
➜ ~ python3
Python 3.5.2 (default, Jul 28 2016, 21:28:00)
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> xmlstr = """
... <d:entry d:title="Buddism" class="entry"><span class="ps"> noun </span><span class="pinyin"> fojiao </span></d:entry><d:entry d:title="hew" class="entry"><span class="ps"> verb </span><span class="pinyin"> jue </span></d:entry><d:entry d:title="roost" class="entry"><span class="ps"> noun </span><span class="pinyin"> qixidi </span></d:entry>"""
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(xmlstr, "xml")
>>> t = soup.find(r'd:title="hew"')
>>> t
>>> t.ps
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'ps'
>>> type(t)
<class 'NoneType'>
How can I parse this with BeautifulSoup or a similar tool? I don't want to parse it by hand with regular expressions.
Answer 0 (score: 1)
import bs4
# Parse with the 'lxml' HTML parser instead of 'xml', then match on the attribute name
soup = bs4.BeautifulSoup(xmlstr, 'lxml')
soup.find(attrs={'d:title':'hew'}).find(class_='ps')
Out:
<span class="ps"> verb </span>
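One reason the original attempt returned None is that the first positional argument of find() is treated as a tag name, so the string d:title="hew" is never compared against attributes. The following is a minimal sketch of the attribute-based lookup applied to every entry. It assumes the built-in "html.parser" backend in place of lxml so that no extra dependency is needed, and the results dictionary is an illustrative name that does not appear in the original post.

```python
from bs4 import BeautifulSoup

# Same sample data as in the question, written as one string.
xmlstr = (
    '<d:entry d:title="Buddism" class="entry">'
    '<span class="ps"> noun </span><span class="pinyin"> fojiao </span></d:entry>'
    '<d:entry d:title="hew" class="entry">'
    '<span class="ps"> verb </span><span class="pinyin"> jue </span></d:entry>'
    '<d:entry d:title="roost" class="entry">'
    '<span class="ps"> noun </span><span class="pinyin"> qixidi </span></d:entry>'
)

# Assumption: the stdlib-backed HTML parser is used here instead of 'lxml'.
# It keeps "d:title" as a literal attribute name, so no namespace handling is needed.
soup = BeautifulSoup(xmlstr, "html.parser")

# Look up a single entry by its d:title attribute.
hew = soup.find(attrs={"d:title": "hew"})
hew_ps = hew.find(class_="ps").get_text(strip=True)

# Collect (part of speech, pinyin) for every entry, keyed by title.
results = {}
for entry in soup.find_all(class_="entry"):
    ps = entry.find(class_="ps").get_text(strip=True)
    pinyin = entry.find(class_="pinyin").get_text(strip=True)
    results[entry["d:title"]] = (ps, pinyin)

print(hew_ps)
print(results)
```

Passing the attribute through attrs= is needed because d:title is not a valid Python keyword argument name.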
soup.attrs
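If a standard-library tool is preferred, a similar lookup can be sketched with xml.etree.ElementTree. The sample does not declare the d prefix, so this sketch wraps the fragment in a root element that binds d to a placeholder URI. The URI, the wrapper element, and the lookup helper are all assumptions made for illustration; a real file would presumably declare its own namespace.

```python
import xml.etree.ElementTree as ET

# Same sample data as in the question, written as one string.
xmlstr = (
    '<d:entry d:title="Buddism" class="entry">'
    '<span class="ps"> noun </span><span class="pinyin"> fojiao </span></d:entry>'
    '<d:entry d:title="hew" class="entry">'
    '<span class="ps"> verb </span><span class="pinyin"> jue </span></d:entry>'
    '<d:entry d:title="roost" class="entry">'
    '<span class="ps"> noun </span><span class="pinyin"> qixidi </span></d:entry>'
)

# Hypothetical namespace URI: the real document's URI is not shown in the question.
D_NS = "http://example.com/d"

# Wrap the fragment so the "d" prefix is declared and there is a single root element.
wrapped = '<root xmlns:d="{0}">{1}</root>'.format(D_NS, xmlstr)
root = ET.fromstring(wrapped)


def lookup(title):
    """Return {span class: stripped text} for the entry with this d:title, or None."""
    for entry in root.iter("{%s}entry" % D_NS):
        # Namespaced attributes are addressed as "{uri}localname".
        if entry.get("{%s}title" % D_NS) == title:
            return {span.get("class"): span.text.strip()
                    for span in entry.findall("span")}
    return None


print(lookup("hew"))
```

Unlike the lenient HTML parsers, this approach requires well-formed XML, so it fails loudly on malformed input instead of silently repairing it.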