How to use Python urllib.request for web scraping (2018)

Date: 2018-05-04 16:52:37

Tags: python web-scraping beautifulsoup

I wrote a simple script, following a video tutorial:

import bs4 as bs
import urllib.request

source = urllib.request.urlopen('https://pythonprogramming.net/parsememcparseface/').read()

soup = bs.BeautifulSoup(source, 'lxml')

print(source)

When I run the program, it fails with this error:

Traceback (most recent call last):
  File "/Users/UntouchedDruid4/Projects/Web_Scrapper/app.py", line 4, in <module>
    source = urllib.request.urlopen('https://pythonprogramming.net/parsememcparseface/').read()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1320, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)>

I don't know what this means. Please help.

1 answer:

Answer 0: (score: 0)

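Use urllib2 or requests to fetch the page, then scrape it with re.search or BeautifulSoup, whichever you prefer. (urllib2 exists only in Python 2; in Python 3 the equivalent module is urllib.request.)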

import urllib2
from bs4 import BeautifulSoup
import re

read = urllib2.urlopen('https://pythonprogramming.net/parsememcparseface/').read()
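Since the traceback in the question comes from Python 3.6, a Python 3 version of the fetch step may be more useful there. The following is a minimal sketch, assuming the CERTIFICATE_VERIFY_FAILED error is caused by Python not finding a CA bundle. That is common with the python.org macOS installers, which ship an "Install Certificates.command" script for this purpose. The helper names `build_context` and `fetch` and the optional `cafile` argument are illustrative and not part of the original answer.

```python
import ssl
import urllib.request


def build_context(cafile=None):
    """Return an SSL context that still verifies certificates.

    cafile is an optional path to a CA bundle. Assumption: you have one
    available, for example the path reported by the third-party certifi
    package. With cafile=None the default trust store is used.
    """
    return ssl.create_default_context(cafile=cafile)


def fetch(url, cafile=None, timeout=10):
    """Download url and return the raw response body as bytes."""
    context = build_context(cafile)
    with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
        return resp.read()


# Usage (needs network access):
#   page = fetch('https://pythonprogramming.net/parsememcparseface/')
#   print(page[:200])
```

If verification still fails on a python.org macOS build, the usual options are to run the bundled Install Certificates.command or to pass a CA bundle path as `cafile`. Turning verification off is best avoided.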

Using re.search

f = re.search(r'<title>(.*)</title>', read)
title = f.group(1)
print " Title Of the Site Is : " + title 
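In Python 3, `read()` returns bytes, so a str pattern passed to re.search raises a TypeError, and print needs parentheses. Here is a minimal Python 3 sketch of the same idea, assuming the page is UTF-8 encoded. The function name `extract_title` and the sample HTML are illustrative only.

```python
import re


def extract_title(raw_html, encoding='utf-8'):
    """Return the text inside the first <title> tag, or None if absent."""
    # Decode bytes to str so that a str pattern can be used.
    text = raw_html.decode(encoding, errors='replace')
    # Non-greedy match; DOTALL lets the title span several lines.
    match = re.search(r'<title[^>]*>(.*?)</title>', text,
                      re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


# Hypothetical sample input standing in for the downloaded page.
sample = b'<html><head><title>Example Page</title></head><body></body></html>'
print("Title of the site is: " + str(extract_title(sample)))
```

A regular expression is fine for a single well-formed tag like this, but a real HTML parser is usually more robust for anything nested.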

Using BeautifulSoup

soup = BeautifulSoup(read, 'html.parser')
print soup.title  # example: prints the page's <title> tag

This is just an example that extracts the title.
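For readers on Python 3, the BeautifulSoup lookup can be sketched as follows, assuming the bs4 package is installed. The function name `title_of` and the sample HTML are illustrative and stand in for the downloaded page.

```python
from bs4 import BeautifulSoup


def title_of(raw_html):
    """Return the page title text, or None when there is no <title> tag."""
    soup = BeautifulSoup(raw_html, 'html.parser')
    # soup.title is the first <title> tag, or None if the page has none.
    return soup.title.get_text(strip=True) if soup.title else None


# Hypothetical sample input; in practice this would be the fetched bytes.
sample = b'<html><head><title>Example Page</title></head><body><p>Hi</p></body></html>'
print(title_of(sample))
```

The 'html.parser' backend ships with Python, so unlike 'lxml' it needs no extra install beyond bs4 itself.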