Question

I want to scrape the website http://berlin.startups-list.com/startups/mobile. I need a list of the hrefs on that page. I am using Python 3.5 and Beautiful Soup.

I have already scraped the website https://www.kickstarter.com with this code:

#Loading Libraries
import urllib.request
from bs4 import BeautifulSoup



#define URL for scraping
theurl1 = "http://berlin.startups-list.com/startups/mobile"
thepage1 = urllib.request.urlopen(theurl1)

#Cooking the Soup
soup1 = BeautifulSoup(thepage1,"html.parser")

#-------------------------------------------------------------------------------------------------------------------
#Scraping

#Scraping "Link" (href)
href_Kunst = [i.a['href'] for i in soup1.find_all('div', attrs={'class' : 'project-thumbnail'})]
print(href_Kunst)

This code works!
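The list comprehension above raises a `TypeError` as soon as one matching `div` contains no `<a>` tag, because `i.a` is then `None`. A slightly more defensive variant could look like the sketch below. The helper name `collect_hrefs` is hypothetical, and the class name `project-thumbnail` is taken from the Kickstarter code above; the class used on the Berlin startups page is not known from this post and would need to be checked in the page source.

```python
from bs4 import BeautifulSoup


def collect_hrefs(html, css_class="project-thumbnail"):
    """Return the href of the first link with an href in each matching div.

    html may be a string, bytes, or an open response object.
    css_class is an assumption: adjust it to the page being scraped.
    """
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    for div in soup.find_all("div", attrs={"class": css_class}):
        # Unlike div.a, this skips anchors that have no href attribute.
        link = div.find("a", href=True)
        if link is not None:
            hrefs.append(link["href"])
    return hrefs
```

Divs without a usable link are skipped silently here; whether that is preferable to failing loudly depends on how complete the resulting list has to be.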

But I cannot access http://berlin.startups-list.com/startups/mobile. Even without the scraping part of the code, I cannot open the website with urllib and Beautiful Soup.

The first part of the code gives me the following traceback:

Traceback (most recent call last):
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1254, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1151, in _send_request
    self.endheaders(body)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 934, in _send_output
    self.send(msg)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 877, in send
    self.connect()
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 849, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\socket.py", line 711, in create_connection
    raise err
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\socket.py", line 702, in create_connection
    sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\A80881\workspace\Startup List\Berlin_Mobile\__init__.py", line 16, in <module>
    thepage1 = urllib.request.urlopen(theurl1)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 466, in open
    response = self._open(req, data)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 484, in _open
    '_open', req)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 444, in _call_chain
    result = func(*args)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1282, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1256, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>

Am I loading the website in the wrong way? Does anyone have an idea? Thanks for your help.
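One way to narrow this down, offered as a minimal sketch rather than a confirmed fix: WinError 10060 is a connection timeout raised inside `sock.connect`, so the request fails before Beautiful Soup is involved at all. The hypothetical helper `try_open` below sets an explicit timeout and returns the failure reason instead of raising. Its name, the 10-second timeout, and the `Mozilla/5.0` User-Agent string are assumptions for illustration and do not come from the original post.

```python
import socket
import urllib.error
import urllib.request


def try_open(url, timeout=10, user_agent="Mozilla/5.0"):
    """Return (body_bytes, None) on success or (None, reason) on failure."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read(), None
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 404.
        return None, "HTTP {}: {}".format(err.code, err.reason)
    except urllib.error.URLError as err:
        # No usable connection: DNS failure, refused or timed-out connect, etc.
        return None, "connection problem: {}".format(err.reason)
    except socket.timeout:
        # The connection was made, but reading the response took too long.
        return None, "timed out after {} seconds".format(timeout)


# Example call (not executed here because it needs network access):
# body, problem = try_open("http://berlin.startups-list.com/startups/mobile")
```

If this reports a connection problem for the Berlin URL while the Kickstarter URL succeeds from the same machine, the cause is more likely the network path (for example a proxy or firewall that treats plain HTTP differently from HTTPS) or the site itself than the way the page is loaded in the code.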

Scraping a website with Python 3.5 and Beautiful Soup. Cannot access the website

0 answers: