I want to scrape definitions from the Merriam-Webster Dictionary, e.g. http://www.merriam-webster.com/dictionary/abandon
This is the HTML snippet I want to scrape:
<div class="definition-block def-text">
<ul class="definition-list no-count">
<li>
<p class="definition-inner-item">
<span><span class="intro-colon">:</span> to leave and never return to (someone who needs protection or help)</span>
</p>
</li>
<li>
<p class="definition-inner-item">
<span><span class="intro-colon">:</span> to leave and never return to (something)</span>
</p>
</li>
<li>
<p class="definition-inner-item">
<span><span class="intro-colon">:</span> to leave (a place) because of danger</span>
</p>
</li>
</ul>
</div>
Here is my code:
for element in soup.find(class_="definition-list no-count"):
    if(soup.find("li")):
        print element
Output:
<li>
<p class="definition-inner-item">
<span><span class="intro-colon">:</span> to leave and never return to (someone who needs protection or help)</span>
</p>
</li>
<li>
<p class="definition-inner-item">
<span><span class="intro-colon">:</span> to leave and never return to (something)</span>
</p>
</li>
<li>
<p class="definition-inner-item">
<span><span class="intro-colon">:</span> to leave (a place) because of danger</span>
</p>
</li>
But I want only the definitions inside the <span> elements. If I use the get_text() method, I get a TypeError.
for element in soup.find(class_="definition-list no-count"):
    if(soup.find("li")):
        print soup.get_text(element)
Output:
Traceback (most recent call last):
  File "scrape.py", line 18, in <module>
    print soup.get_text(element)
  File "/usr/lib/python2.7/dist-packages/bs4/element.py", line 852, in get_text
    strip, types=types)])
TypeError: 'NoneType' object is not callable
Answer 0 (score: 0)
Have you considered using BeautifulSoup for this task? I'm sure you could do it other ways, but with BeautifulSoup it is trivial:
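from bs4 import BeautifulSoup
import urllib
# Python 2: fetch the page (in Python 3 this call lives at urllib.request.urlopen)
r = urllib.urlopen('http://www.merriam-webster.com/dictionary/abandon').read()
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(r, 'html.parser')
# The class name must match the markup in the question: "definition-inner-item"
definitions = soup.find_all("p", class_="definition-inner-item")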
You can then use definitions however you need.
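As for the TypeError in the question: get_text() is a method of the tag whose text you want, and its first positional argument is the separator string. Calling soup.get_text(element) therefore passes the tag as the separator. As far as I can tell, bs4 then looks up element.join, treats that attribute access as a search for a child <join> tag, gets None, and tries to call it. Below is a minimal sketch of the extraction, assuming the markup still looks like the snippet in the question. The inline html string is a trimmed copy of that snippet so the example runs without a network request, and the name texts is only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, inlined so the sketch runs offline.
html = """
<div class="definition-block def-text">
<ul class="definition-list no-count">
<li>
<p class="definition-inner-item">
<span><span class="intro-colon">:</span> to leave and never return to (something)</span>
</p>
</li>
<li>
<p class="definition-inner-item">
<span><span class="intro-colon">:</span> to leave (a place) because of danger</span>
</p>
</li>
</ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

texts = []
for p in soup.find_all("p", class_="definition-inner-item"):
    # Call get_text() on the tag itself; " " is the separator between strings,
    # and strip=True trims whitespace and drops whitespace-only strings.
    text = p.get_text(" ", strip=True)
    # Remove the leading ":" that comes from the intro-colon span.
    texts.append(text.lstrip(": "))

for t in texts:
    print(t)
```

With a live page, the same loop should work on the soup built from the downloaded HTML, provided the site still uses the definition-inner-item class.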