Scraping from an inner class

Time: 2016-02-26 07:32:00

Tags: python-2.7 lxml

I want to scrape definitions from the Merriam-Webster Dictionary, e.g. http://www.merriam-webster.com/dictionary/abandon

This is the snippet I want to scrape.

<div class="definition-block def-text">
        <ul class="definition-list no-count">
                      <li>
              <p class="definition-inner-item">
                <span><span class="intro-colon">:</span> to leave and never return to (someone who needs protection or help)</span>
              </p>
            </li>
                      <li>
              <p class="definition-inner-item">
                <span><span class="intro-colon">:</span> to leave and never return to (something)</span>
              </p>
            </li>
                      <li>
              <p class="definition-inner-item">
                <span><span class="intro-colon">:</span> to leave (a place) because of danger</span>
              </p>
            </li>
                  </ul>
      </div>

Here is my code:

for element in soup.find(class_="definition-list no-count"):
    if(soup.find("li")):
        print element

Output:

<li>
<p class="definition-inner-item">
<span><span class="intro-colon">:</span> to leave and never return to (someone who needs protection or help)</span>
</p>
</li>


<li>
<p class="definition-inner-item">
<span><span class="intro-colon">:</span> to leave and never return to (something)</span>
</p>
</li>


<li>
<p class="definition-inner-item">
<span><span class="intro-colon">:</span> to leave (a place) because of danger</span>
</p>
</li>

But I want only the definitions inside the <span> elements. If I use the get_text() method, I get a TypeError.

for element in soup.find(class_="definition-list no-count"):
        if(soup.find("li")):
            print soup.get_text(element)

Output:

Traceback (most recent call last):
  File "scrape.py", line 18, in <module>
    print soup.get_text(element)
  File "/usr/lib/python2.7/dist-packages/bs4/element.py", line 852, in get_text
    strip, types=types)])
TypeError: 'NoneType' object is not callable

1 answer:

Answer 0: (score: 0)

Have you considered using BeautifulSoup for this task? I'm sure it can be done in other ways, but with BeautifulSoup it is trivial:

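from bs4 import BeautifulSoup
import urllib

# Download the page (Python 2.7; urllib.urlopen does not exist in Python 3)
r = urllib.urlopen('http://www.merriam-webster.com/dictionary/abandon').read()
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(r, "html.parser")
# The snippet in the question uses the class "definition-inner-item"
definitions = soup.find_all("p", class_="definition-inner-item")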

You can then use definitions however you need.
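To get plain text out of those tags, a minimal sketch follows. It assumes the markup looks like the snippet in the question (the `HTML` string below is copied from it rather than fetched from the live site, which may have changed), and the helper name `extract_definitions` is made up for illustration. The point it shows is that `get_text()` is called on each matched tag with no tag argument. The traceback in the question is consistent with `soup.get_text(element)` passing the tag as the separator string, though that reading is an inference from the traceback, not something confirmed in the thread.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (not fetched from the live site).
HTML = """
<div class="definition-block def-text">
  <ul class="definition-list no-count">
    <li>
      <p class="definition-inner-item">
        <span><span class="intro-colon">:</span> to leave and never return to (someone who needs protection or help)</span>
      </p>
    </li>
    <li>
      <p class="definition-inner-item">
        <span><span class="intro-colon">:</span> to leave and never return to (something)</span>
      </p>
    </li>
    <li>
      <p class="definition-inner-item">
        <span><span class="intro-colon">:</span> to leave (a place) because of danger</span>
      </p>
    </li>
  </ul>
</div>
"""


def extract_definitions(html):
    """Hypothetical helper: return the definition strings found in html."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for p in soup.find_all("p", class_="definition-inner-item"):
        # Call get_text() on the tag itself; strip=True drops the
        # whitespace-only strings around the nested <span> elements.
        text = p.get_text(" ", strip=True)
        # Remove the leading ":" that comes from the intro-colon span.
        results.append(text.lstrip(": ").strip())
    return results


definitions = extract_definitions(HTML)
for definition in definitions:
    print(definition)
```

The single-argument `print(...)` form is used so the sketch runs under both Python 2.7 and Python 3.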
