When running this code in PyDev:
import re
import urllib.request
from bs4 import BeautifulSoup
url = "http://talkingpointsmemo.com/news/trump-response-melania-speech-plagiarism"
# Send a browser-like User-Agent instead of urllib's default one
req = urllib.request.Request(url, headers={'User-Agent': 'Chrome/51.0.2704.103'})
html = urllib.request.urlopen(req).read()  # raw bytes, not yet decoded
I get this warning: WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
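That message is logged by Beautiful Soup's encoding detection when it is given bytes and none of the encodings it tries decodes them cleanly, so it falls back to substituting U+FFFD. One way around it is to work out the encoding yourself and hand Beautiful Soup a str. The sketch below is a stdlib-only illustration under that assumption; guess_encoding, decode_html and fetch_html are hypothetical helper names, and the order "HTTP header, then meta tag, then UTF-8" is a simplification of how browsers pick an encoding.

```python
import re
import urllib.request
from typing import Optional

# Finds the charset in either <meta charset="..."> or the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)


def guess_encoding(raw: bytes, header_charset: Optional[str]) -> str:
    """Prefer the HTTP header charset, then a <meta> charset, then UTF-8."""
    if header_charset:
        return header_charset
    match = META_CHARSET.search(raw[:4096])  # <meta> normally sits near the top of <head>
    if match:
        return match.group(1).decode("ascii")
    return "utf-8"


def decode_html(raw: bytes, header_charset: Optional[str] = None) -> str:
    """Decode page bytes to str using the guessed encoding."""
    encoding = guess_encoding(raw, header_charset)
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Unknown codec name, or bytes that don't match the declared charset:
        # fall back to UTF-8 and substitute U+FFFD for undecodable bytes.
        return raw.decode("utf-8", errors="replace")


def fetch_html(url: str) -> str:
    """Download url and return its text (needs network access)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Chrome/51.0.2704.103"})
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
        charset = resp.headers.get_content_charset()  # None if the header has no charset
    return decode_html(raw, charset)
```

Usage would then be soup = BeautifulSoup(fetch_html(url), "html.parser"). Because the markup is already a str, Beautiful Soup does not run its own byte-level encoding detection on it.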
This doesn't work:
reqSoup = BeautifulSoup('http://acl.ldc.upenn.edu/P/P96/P96-1004.pdf')
Source: Warning: Some characters could not be decoded, and were replaced by REPLACEMENT CHARACTER
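BeautifulSoup never downloads anything: its first argument is the markup itself, so BeautifulSoup('http://acl.ldc.upenn.edu/P/P96/P96-1004.pdf') parses the URL string as a tiny one-line document. That link also points at a PDF, and PDF bytes are not HTML, so feeding them to an HTML parser is likely to produce exactly this kind of decoding warning. The stdlib-only sketch below checks the server's Content-Type before handing a body to an HTML parser; media_type, is_html and fetch_if_html are hypothetical helper names, and extracting text from a PDF would need a PDF library instead.

```python
import urllib.request
from email.message import Message
from typing import Optional

HTML_TYPES = ("text/html", "application/xhtml+xml")


def media_type(content_type_header: str) -> str:
    """Reduce a Content-Type header to its lowercase media type.

    'text/html; charset=UTF-8' -> 'text/html'. A missing or malformed
    header comes back as 'text/plain' (the email package's default).
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


def is_html(content_type_header: str) -> bool:
    return media_type(content_type_header) in HTML_TYPES


def fetch_if_html(url: str) -> Optional[bytes]:
    """Download url, but only return the body if the server says it is HTML."""
    req = urllib.request.Request(url, headers={"User-Agent": "Chrome/51.0.2704.103"})
    with urllib.request.urlopen(req) as resp:
        if not is_html(resp.headers.get("Content-Type", "")):
            return None  # e.g. application/pdf: not something an HTML parser can use
        return resp.read()
```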
What is html_path in this line?
soup = BeautifulSoup(open(html_path, 'rb'), "html.parser", from_encoding="iso-8859-1")
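In a line like that, html_path is just a variable holding the filesystem path of an HTML file already saved on disk; open() reads local files and does not accept a URL. Beautiful Soup only uses from_encoding when it receives bytes. If the file is opened in text mode ('r'), Python decodes it first with the platform default encoding, so either that decode fails or from_encoding is ignored; opening it with 'rb' leaves the choice to from_encoding. The stdlib-only sketch below assumes you first save the downloaded bytes to a hypothetical page.html (in a temporary directory here) and then read them back in binary mode with an explicit codec; save_html and load_html are hypothetical names. ISO-8859-1 maps every byte to a character, so decoding never fails and never produces U+FFFD, although it will show wrong characters if the file is really UTF-8.

```python
import os
import tempfile


def save_html(raw: bytes, html_path: str) -> None:
    """Write the downloaded bytes to disk unchanged (no decoding yet)."""
    with open(html_path, "wb") as f:
        f.write(raw)


def load_html(html_path: str, encoding: str = "iso-8859-1") -> str:
    """Read the file in binary mode and decode it with an explicit codec."""
    # 'rb' returns bytes, so the codec is chosen here rather than by the
    # platform default that text mode ('r') would apply.
    with open(html_path, "rb") as f:
        return f.read().decode(encoding)


if __name__ == "__main__":
    raw = "<p>caf\u00e9</p>".encode("iso-8859-1")  # stand-in for urlopen(...).read()
    with tempfile.TemporaryDirectory() as tmp:
        html_path = os.path.join(tmp, "page.html")  # hypothetical local file
        save_html(raw, html_path)
        assert load_html(html_path) == "<p>caf\u00e9</p>"
```

With bs4 installed, the binary-mode equivalent of the question's line is BeautifulSoup(open(html_path, 'rb'), "html.parser", from_encoding="iso-8859-1").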