How to get rid of "WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER." in BeautifulSoup with Python 3

Time: 2016-07-19 18:10:43

Tags: python-2.7 python-3.x beautifulsoup

Using PyDev, when I run this code:

import re
import urllib.request 
from bs4 import BeautifulSoup

url = "http://talkingpointsmemo.com/news/trump-response-melania-speech-plagiarism"
req = urllib.request.Request(url, headers={'User-Agent': 'Chrome/51.0.2704.103'})
html = urllib.request.urlopen(req).read()

I get this warning: WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
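
The warning is logged by Beautiful Soup's encoding detection when it is given bytes that it cannot decode cleanly. One possible workaround, sketched here, is to decode the bytes yourself and pass a str to BeautifulSoup, so it no longer has to guess. The helpers `decode_html` and `fetch_soup` are hypothetical names, and the fallback order (declared charset, then UTF-8, then windows-1252, then latin-1) is an assumption. A wrong guess gives mis-rendered characters rather than replacement characters, so this is not a guaranteed fix.

```python
import urllib.request


def decode_html(raw, declared_charset=None):
    """Decode raw bytes to str, trying the declared charset first.

    The candidate order is an assumption, not something the page guarantees.
    """
    for encoding in (declared_charset, "utf-8", "windows-1252"):
        if not encoding:
            continue
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding name: try the next candidate.
            continue
    # latin-1 maps every byte to a code point, so this last step cannot fail.
    return raw.decode("latin-1")


def fetch_soup(url):
    """Fetch a page and parse it from an already-decoded str."""
    # Imported here so decode_html can be used without bs4 installed.
    from bs4 import BeautifulSoup

    req = urllib.request.Request(url, headers={"User-Agent": "Chrome/51.0.2704.103"})
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
        # Charset from the Content-Type header, or None if it is not declared.
        charset = resp.headers.get_content_charset()
    return BeautifulSoup(decode_html(raw, charset), "html.parser")
```

With this sketch, `fetch_soup(url)` would replace the `urlopen(req).read()` step above. Because BeautifulSoup receives text instead of bytes, its own decoding step is skipped.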

This does not work:

reqSoup = BeautifulSoup('http://acl.ldc.upenn.edu/P/P96/P96-1004.pdf')

Source: Warning: Some characters could not be decoded, and were replaced by REPLACEMENT CHARACTER

What is html_path in this line?

soup = BeautifulSoup(open(html_path, 'r'),"html.parser",from_encoding="iso-8859-1")
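
A minimal sketch of how that line might be used, assuming `html_path` is simply the path to an HTML file saved on disk (the linked answer may have meant something else). The file is opened in binary mode because `from_encoding` only applies to bytes. If the file is opened in text mode with `'r'`, Python decodes it first and Beautiful Soup ignores `from_encoding`. The helper name `soup_from_file` is hypothetical.

```python
def soup_from_file(html_path, encoding="iso-8859-1"):
    """Parse a local HTML file, telling Beautiful Soup which encoding to try first."""
    # Imported here so the module can be loaded without bs4 installed.
    from bs4 import BeautifulSoup

    # Binary mode: Beautiful Soup receives raw bytes and applies from_encoding.
    with open(html_path, "rb") as fh:
        return BeautifulSoup(fh, "html.parser", from_encoding=encoding)
```

For example, `soup_from_file("page.html")` would parse a file named `page.html` in the current directory. Both the file name and the default encoding are placeholders.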

Source: WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER. With Requests and Beastuifulsoup

0 answers:

No answers yet