Question

from urllib.request import urlopen
from bs4 import BeautifulSoup
import datetime
import random
import re

random.seed(datetime.datetime.now())

def getLinks(articleUrl):
    html = urlopen("http://en.wikipedia.org"+articleUrl)
    bsObj = BeautifulSoup(html)
    return bsObj.find("div", {"id":"bodyContent"}).findAll("a",href = re.compile("^(/wiki/)((?!:).)*$"))

getLinks('http://en.wikipedia.org')

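The operating system is Linux. The script above raises a "urllib.error.URLError". I looked at several attempted fixes I found on Google, but none of them solved my problem (the fixes I tried included changing env variables and adding nameserver 8.8.8.8 to my resolv.conf file).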

Answer 1

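You should call getLinks() with a valid URL path. The function already prepends "http://en.wikipedia.org" to its argument, so passing the full URL produces "http://en.wikipedia.orghttp://en.wikipedia.org", whose host name cannot be resolved:

>>> getLinks('/wiki/Main_Page')

Also, inside your function, you should call .read() to get the response content before passing it to BeautifulSoup, i.e. BeautifulSoup(html.read()).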
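Putting both suggestions together, a minimal sketch of the corrected function could look like the following. The helper build_article_url and the BASE_URL constant are names made up for this illustration and do not appear in the question; the explicit "html.parser" argument is likewise an addition, chosen only so that BeautifulSoup does not have to guess a parser.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://en.wikipedia.org"  # assumed constant, taken from the question's code


def build_article_url(article_url):
    """Hypothetical helper: resolve a path such as '/wiki/Main_Page' against BASE_URL."""
    # urljoin leaves an absolute URL intact instead of gluing it onto the base,
    # so passing a full URL by mistake no longer produces an unresolvable host.
    return urljoin(BASE_URL, article_url)


def getLinks(article_url):
    # Imported here so build_article_url can be tried without bs4 installed.
    from bs4 import BeautifulSoup

    html = urlopen(build_article_url(article_url))
    # Read the response body first, then hand it to BeautifulSoup.
    bsObj = BeautifulSoup(html.read(), "html.parser")
    # Same selection as in the question: article links inside the body content.
    return bsObj.find("div", {"id": "bodyContent"}).find_all(
        "a", href=re.compile("^(/wiki/)((?!:).)*$"))
```

With this version, getLinks('/wiki/Main_Page') requests http://en.wikipedia.org/wiki/Main_Page. Using urljoin rather than string concatenation is a design choice for this sketch, not something the answer prescribes; plain concatenation also works as long as the argument is always a path.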

urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>
