Question

I am using the requests module in Python to request HTML data from a URL.
Here is my code:

import requests

source = requests.get('http://coreyms.com')

print(source.text)

When I run it in Atom, it gives me an error:

File "/Users/isaacrichardson/Desktop/Python/Web Scraping/wiki.py", line 7, in <module>
    print(source.text)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 34807: ordinal not in range(128)

But when I run it in a Treehouse workspace, it works fine and shows me the HTML data.
Is something wrong with Atom or with my code?

Answer 1

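The requests library is either not installed correctly for Atom or is not available to it. Installing it correctly should fix the problem.

If that doesn't work, I would try using the BeautifulSoup module: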

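import requests
from bs4 import BeautifulSoup

source = requests.get('http://coreyms.com')
doc = BeautifulSoup(source.text, "html.parser")  # parse the HTML
print(doc.text)  # text content with the tags stripped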

Answer 2

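requests guesses the encoding when you access the .text attribute of the response object. If you know the response's encoding in advance, you should set it explicitly on the response before accessing .text: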

import requests

source = requests.get('http://coreyms.com')
source.encoding = 'utf-8'  # or whatever the encoding is
print(source.text)

Alternatively, you can use .content to access the binary response content and decode it yourself.
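A minimal sketch of the .content approach, assuming Python 3 and that the page is UTF-8 (the helper names decode_html and fetch_html are hypothetical, not part of requests):

```python
def decode_html(raw_bytes, encoding="utf-8"):
    """Decode raw response bytes; undecodable bytes become U+FFFD instead of raising."""
    return raw_bytes.decode(encoding, errors="replace")


def fetch_html(url, encoding="utf-8"):
    """Fetch a page and decode its body ourselves rather than relying on .text."""
    # Imported inside the function so decode_html can be tried without a network call.
    import requests

    response = requests.get(url)
    return decode_html(response.content, encoding)  # .content is bytes
```

Calling fetch_html('http://coreyms.com') would return the decoded page as a str. The encoding is an assumption here; if the server uses something else, pass that instead.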

You may want to check whether the encoding really is being guessed differently in your IDE by simply printing source.encoding.
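The traceback in the question names the 'ascii' codec, so it may also be worth comparing what the output stream accepts, not only what requests guessed. A rough sketch of that check, assuming Python 3 (output_encoding and to_printable are hypothetical helper names):

```python
import sys


def output_encoding(stream=None):
    """Return the encoding of a text stream (default: sys.stdout), or 'ascii' if unknown."""
    stream = sys.stdout if stream is None else stream
    return getattr(stream, "encoding", None) or "ascii"


def to_printable(text, encoding):
    """Replace characters the target encoding cannot represent with '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


# Hypothetical usage with the response object from the code above:
#   print(source.encoding)      # what requests guessed for the page
#   print(output_encoding())    # what the console or IDE accepts
#   print(to_printable(source.text, output_encoding()))
```

If the two environments report different values for either encoding, that would point to the environment rather than the code.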

Atom gives an error when requesting data from a website
