Question

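Hi, I'm trying to write something that downloads HTML and strips it down to a readable format. However, I get an error when I call the function that strips the tags.

What is wrong with this?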

import urllib
import re
from bs4 import BeautifulSoup, NavigableString
def strip_tags(html, invalid_tags):
    soup = BeautifulSoup(html)

    for tag in soup.findAll(True):
        if tag.name in invalid_tags:
            s = ""

            for c in tag.contents:
                if not isinstance(c, NavigableString):
                    c = strip_tags(unicode(c), invalid_tags)
                s += unicode(c)

            tag.replaceWith(s)

    return soup

urls = ["http://tweakers.net/pricewatch/335374/msi-geforce-gtx-770-gaming/specificaties/", "http://tweakers.net/pricewatch/328943/sapphire-radeon-hd-7950-3gb-gddr5-with-boosts/specificaties/"]
i =0

regex = '<title>(.+?)</title>'
pattern = re.compile(regex)
invalid_tags = ['tr', 'td']

while i<len(urls):
    htmlfile = urllib.urlopen(urls[i])
    htmltext = htmlfile.read()
    souper = BeautifulSoup(htmltext)
    table = souper.find("table", {'class': "spec-detail"})
    html = table
    titles = re.findall(pattern,htmltext)
    print titles
    print '-----------------------------Specificaties-----------------------------'
    print 'Function printed'
    print html
    print strip_tags(html, invalid_tags)
    i+=1

exit()

My error:

Traceback (most recent call last):
  File "pithon1.py", line 38, in <module>
    print strip_tags(html, invalid_tags)
  File "pithon1.py", line 5, in strip_tags
    soup = BeautifulSoup(html)
  File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 161, in __init__
    markup = markup.read()
TypeError: 'NoneType' object is not callable

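This happens right after the print html line runs.

I'm obviously using BeautifulSoup as well.

I'm not very familiar with Python; I'm just a beginner.

The output of print html looks like this: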

<table class="spec-detail" style="width:100%">
<colgroup>
<col width="100">
<col>
</col></col></colgroup>
<tr>
<td class="spec-index-column">Categorie</td>
<td class="spec-column first">

Answer 1

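You can't build a BeautifulSoup object from another BeautifulSoup object. Here html is already parsed (it is the result of souper.find(...)), so instead of soup = BeautifulSoup(html) you should simply use soup = html. Once that is fixed, you will run into unicode character issues that you need to handle properly.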
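
A minimal sketch of that fix, assuming Python 3 and bs4 with the built-in html.parser (the question itself uses Python 2). SAMPLE_HTML is a made-up stand-in for the downloaded page, and unwrap() is a swapped-in simplification of the recursive string rebuilding, not the asker's original approach. The function only hands strings to the BeautifulSoup constructor; an already-parsed object is reused as-is:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up sample standing in for the downloaded page (assumption).
SAMPLE_HTML = """
<html><body>
<table class="spec-detail">
<tr><td class="spec-index-column">Categorie</td><td>Videokaarten</td></tr>
</table>
</body></html>
"""


def strip_tags(node, invalid_tags):
    """Remove the listed tags from `node` but keep their contents.

    `node` may be an HTML string or an already-parsed Tag/BeautifulSoup
    object; only strings are passed to the BeautifulSoup constructor.
    """
    if isinstance(node, Tag):
        soup = node  # already parsed: reuse it instead of re-parsing
    else:
        soup = BeautifulSoup(node, "html.parser")

    # find_all builds its result list up front, so unwrapping in the loop is safe.
    for tag in soup.find_all(invalid_tags):
        tag.unwrap()  # drop the tag itself, keep its children in place
    return soup


souper = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = souper.find("table", {"class": "spec-detail"})

# Attribute access on a Tag searches for a child tag with that name,
# so `table.read` is None rather than a method.
print(table.read is None)

cleaned = strip_tags(table, ["tr", "td"])
print(cleaned.get_text(" ", strip=True))
```

The table.read line shows where the TypeError most likely comes from: judging by the traceback, the constructor treats anything with a read attribute as a file and calls it, but on a parsed tag that lookup searches for a child <read> tag and yields None. Under Python 2, printing non-ASCII text may additionally need an explicit encode, for example print unicode(cleaned).encode('utf-8').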
