I'm trying to scrape a page through its source view, for example: view-source:https://www.youtube.com/watch?v=t3-zAlsCJ4c&t=1607s
I can't fetch it with this code:
res = requests.get('view-source:https://www.youtube.com/watch?v=t3-zAlsCJ4c&t=1607s')
It fails with the following error:
Traceback (most recent call last):
  File "C:\Users\hdtra\Desktop\In processing\Facebook_spider.py", line 31, in <module>
    res = requests.get('view-source:https://www.facebook.com/pg/vuonraunhatrang/about/?ref=page_internal')
  File "C:\Program Files\Python36\lib\site-packages\requests\api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\requests\sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Program Files\Python36\lib\site-packages\requests\sessions.py", line 612, in send
    adapter = self.get_adapter(url=request.url)
  File "C:\Program Files\Python36\lib\site-packages\requests\sessions.py", line 703, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for 'view-source:https://www.facebook.com/pg/vuonraunhatrang/about/?ref=page_internal'
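The error comes from how the URL parses: everything before the first colon is treated as the scheme, which here is `view-source`, and a default requests `Session` only mounts connection adapters for the `http://` and `https://` prefixes. A small stdlib sketch (using `urllib.parse.urlsplit` purely to illustrate the parse; requests itself matches adapters by URL prefix) shows what the scheme ends up being:

```python
from urllib.parse import urlsplit

# The URL from the traceback, exactly as it was passed to requests.get()
url = "view-source:https://www.facebook.com/pg/vuonraunhatrang/about/?ref=page_internal"

# urlsplit takes everything before the first ':' as the scheme,
# so the real https:// URL ends up in the path component
parts = urlsplit(url)
print(parts.scheme)  # view-source
print(parts.path)
```

Since no adapter is registered for a `view-source:` prefix, requests raises `InvalidSchema` before any network request is made.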
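How can I scrape this view-source link?
Scraping the normal page with .get() doesn't work for my project.
I don't get enough information from the normal page, but with the view-source window it works fine.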
Answer (score: 0)
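You can use BeautifulSoup. Note that view-source: is a browser feature, not a URL scheme, so neither requests nor urllib can open it. Request the plain https:// URL instead: that returns the raw HTML the server sends, which is what the view-source window displays (differences in cookies and request headers aside). Then parse it: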
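from urllib.request import urlopen  # Python 3: urlopen lives in urllib.request
from bs4 import BeautifulSoup

# Use the plain URL; the browser-only "view-source:" prefix must be dropped
url_to_scrape = "https://www.youtube.com/watch?v=t3-zAlsCJ4c&t=1607s"
r = urlopen(url_to_scrape).read()
soup = BeautifulSoup(r, "html.parser")  # name a parser explicitly to avoid bs4's warning
print(soup.prettify())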
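If your input URLs may already carry the `view-source:` prefix (as in the question), a minimal sketch is to strip it before fetching. The helper names `strip_view_source` and `fetch_page_source` are hypothetical, and the browser-like `User-Agent` header is an assumption (some sites serve different HTML to clients that don't send one); only the Python standard library is used:

```python
from urllib.request import Request, urlopen

VIEW_SOURCE_PREFIX = "view-source:"


def strip_view_source(url: str) -> str:
    """Remove a leading browser-only 'view-source:' prefix, if present (hypothetical helper)."""
    if url.lower().startswith(VIEW_SOURCE_PREFIX):
        return url[len(VIEW_SOURCE_PREFIX):]
    return url


def fetch_page_source(url: str, timeout: float = 10.0) -> str:
    """Download the raw HTML for url, roughly what the view-source window shows (hypothetical helper)."""
    # Assumption: a browser-like User-Agent; adjust or drop it for your target site
    req = Request(strip_view_source(url), headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        # Decode using the charset the server declares, falling back to UTF-8
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage: `html = fetch_page_source("view-source:https://www.youtube.com/watch?v=t3-zAlsCJ4c&t=1607s")`, then `BeautifulSoup(html, "html.parser")` as in the answer. Keep in mind that for pages like YouTube or Facebook, much of the visible content is filled in by JavaScript, so it may appear only inside `<script>` data in this raw HTML rather than as ordinary elements.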