Question

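Before I added the two if statements to the third code block, I got almost the same error: it cannot concatenate str and NoneType.

However, when I uncomment the print statement inside the third if statement, it prints a list of URLs with their paths.

I have also tried this on other websites; this is not the only one where it fails.

Here is my traceback: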

Traceback (most recent call last):
  File "linkcrawler.py", line 24, in <module>
    newurl = "http://" + b1 + b2
TypeError: cannot concatenate 'str' and 'NoneType' objects
Traceback (most recent call last):
  File "linkcrawler.py", line 24, in <module>
     newurl = "http://" + b1 + b2
TypeError: cannot concatenate 'str' and 'NoneType' objects

I get two of these every time I run it.

import urllib
from bs4 import BeautifulSoup
import traceback
import urlparse
import mechanize

url = "http://www.dailymail.co.uk/home/index.html"
br = mechanize.Browser()
urls = [url]
visited = [url]

while len(urls)>0:
    try:
        br.open(urls[0])
        urls.pop(0)
        for link in br.links():
            newurl = urlparse.urljoin(link.base_url,link.url)
            b1 = urlparse.urlparse(newurl).hostname
            b2 = urlparse.urlparse(newurl).path

            newurl = "http://"+b1+b2

            if newurl not in visited and urlparse.urlparse(url).hostname in newurl:
                urls.append(newurl)
                visited.append(newurl)
                #print newurl
    except:
        traceback.print_exc()
        urls.pop(0)
print visited

Answer 1

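Either b1 or b2 is None. In practice it is b1: hostname is None whenever the link has no network location, whereas path is always a string (possibly empty). To fix this, check that b1 and b2 are neither empty nor None before concatenating, and restructure the code: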

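parsed = urlparse.urlparse(newurl)
b1 = parsed.hostname  # None for links such as mailto: or javascript:
b2 = parsed.path      # may be an empty string, but never None

if b1 and b2:
    newurl = "http://" + b1 + b2
    if newurl not in visited and urlparse.urlparse(url).hostname in newurl:
        urls.append(newurl)
        visited.append(newurl)
        #print newurl
# No else branch is needed: a link without a hostname is simply skipped.
# Calling urls.pop(0) here would discard a queued URL that was never visited.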
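
To see which links end up with a None hostname, here is a minimal sketch. It is written for Python 3, where the functions live in urllib.parse (in Python 2 they come from the urlparse module used above). The helper name normalize_link and the example.com URLs are made up for illustration and are not part of the original crawler. Links such as mailto: or javascript: are a likely cause of the error, though I have not checked which ones the page in the question actually contains.

```python
from urllib.parse import urljoin, urlparse  # Python 2: import these from `urlparse`


def normalize_link(base_url, href):
    """Return 'http://<host><path>' for a link, or None if it has no hostname.

    Hypothetical helper for illustration only.
    """
    parts = urlparse(urljoin(base_url, href))
    if parts.hostname is None:
        # mailto: and javascript: links have no network location,
        # so there is nothing to concatenate with "http://"
        return None
    return "http://" + parts.hostname + parts.path


base = "http://www.example.com/home/index.html"  # assumed page URL
for href in ["/news/index.html", "mailto:someone@example.com", "javascript:void(0)"]:
    print(href, "->", normalize_link(base, href))
```

Unlike the `if b1 and b2` check, this sketch only rejects a missing hostname, so a link with an empty path (for example a bare http://www.example.com) is still kept.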

Why does Python think my variable is empty?
