I want to scrape the website http://berlin.startups-list.com/startups/mobile. I need a list of the hrefs on that page. I am using Python 3.5 and Beautiful Soup.
I have already scraped the website https://www.kickstarter.com with this code:
#Loading Libraries
import urllib
import urllib.request
from bs4 import BeautifulSoup
#define URL for scraping
theurl1 = "http://berlin.startups-list.com/startups/mobile"
thepage1 = urllib.request.urlopen(theurl1)
#Cooking the Soup
soup1 = BeautifulSoup(thepage1,"html.parser")
#-------------------------------------------------------------------------------------------------------------------
#Scraping
#Scraping "Link" (href)
href_Kunst = [i.a['href'] for i in soup1.find_all('div', attrs={'class' : 'project-thumbnail'})]
print(href_Kunst)
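This code works!
But I cannot access http://berlin.startups-list.com/startups/mobile. Even without the scraping part of the code, I cannot open the website with urllib and Beautiful Soup.
The first part of the code gives me the following traceback: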
Traceback (most recent call last):
File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1254, in do_open
h.request(req.get_method(), req.selector, req.data, headers)
File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1106, in request
self._send_request(method, url, body, headers)
File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1151, in _send_request
self.endheaders(body)
File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1102, in endheaders
self._send_output(message_body)
File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 934, in _send_output
self.send(msg)
File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 877, in send
self.connect()
File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 849, in connect
(self.host,self.port), self.timeout, self.source_address)
File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\socket.py", line 711, in create_connection
raise err
File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\socket.py", line 702, in create_connection
sock.connect(sa)
TimeoutError: [WinError 10060] Ein Verbindungsversuch ist fehlgeschlagen, da die Gegenstelle nach einer bestimmten Zeitspanne nicht richtig reagiert hat, oder die hergestellte Verbindung war fehlerhaft, da der verbundene Host nicht reagiert hat
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\A80881\workspace\Startup List\Berlin_Mobile\__init__.py", line 16, in <module>
thepage1 = urllib.request.urlopen(theurl1)
File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 163, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 466, in open
response = self._open(req, data)
File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 484, in _open
'_open', req)
File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 444, in _call_chain
result = func(*args)
File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1282, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "C:\Users\A80881\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1256, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [WinError 10060] Ein Verbindungsversuch ist fehlgeschlagen, da die Gegenstelle nach einer bestimmten Zeitspanne nicht richtig reagiert hat, oder die hergestellte Verbindung war fehlerhaft, da der verbundene Host nicht reagiert hat>
Am I loading the website the wrong way? Does anyone have an idea? Thanks for your help.
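For reference, the traceback ends in a connection timeout raised inside urlopen, so the failure happens before Beautiful Soup ever sees any HTML. Below is a minimal sketch of how the loading step could be isolated, assuming the goal is only to find out whether the host can be reached at all. The names fetch_html and opener, the 10-second timeout, and the User-Agent value are illustrative assumptions and not part of my original code.

```python
import urllib.error
import urllib.request


def fetch_html(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return the response body as bytes, or None if the page cannot be opened.

    `opener` is a hypothetical hook: it defaults to urllib.request.urlopen
    and only exists so the function can be exercised without network access.
    """
    # Assumption: a browser-like User-Agent; some servers treat the default
    # Python client differently, but this may not matter for this site.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        # An explicit timeout makes the call fail after a known delay
        # instead of relying on the operating system default.
        with opener(request, timeout=timeout) as response:
            return response.read()
    except OSError as err:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        print("Could not open", url, "->", err)
        return None
```

Calling fetch_html(theurl1) and checking for None would show whether the problem is in reaching the host (for example a proxy or firewall between my machine and the site) rather than in the parsing code. If bytes come back, they can be passed to BeautifulSoup(..., "html.parser") as before.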