Question

>>> import urllib.request
>>> infile = urllib.request.urlopen("http://www.yahoo.com")

Decoded:

>>> infile.read(100).decode()

'<!DOCTYPE html>\n<html lang="en-US" class="dev-desktop uni-purple-border  bkt901 https  uni-dark-purp'

Not decoded:

>>> infile.read(100)

b'le" style="">\n<!-- m2 template  -->\n<head>\n    <meta http-equiv="Content-Type" content="text/html; c'

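It looks like the only difference is the b' prefix on the output, which I assume means bytes. Other than that, the output looks exactly the same.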

Answer 1

No, the output is not the same; one is a Unicode string (str), the other is an undecoded bytes value.

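For ASCII text the two look the same, but as soon as you load any web page that uses characters outside the ASCII character set, the difference becomes much clearer.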

Take UTF-8 encoded data, for example:

>>> '–'
'–'
>>> '–'.encode('utf8')
b'\xe2\x80\x93'

This is a single U+2013 EN DASH character. The bytes representation shows the 3 bytes that UTF-8 uses to encode that code point.
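To make the distinction more concrete, here is a minimal sketch that goes beyond how the two values print. The sample string and the names `text`, `data`, and `ascii_text` are illustrative assumptions, not part of the question; the point is only that a str and its encoded bytes differ in length and in what indexing returns once non-ASCII characters are involved.

```python
# Illustrative values only; `text`, `data`, and `ascii_text` are hypothetical names.
text = "caf\u00e9 \u2013 bar"   # str: a sequence of Unicode code points
data = text.encode("utf-8")     # bytes: the UTF-8 encoded form of that text

# A str is measured in code points, a bytes object in bytes.
print(type(text).__name__, len(text))
print(type(data).__name__, len(data))
print(data)

# Decoding with the matching codec restores the original text.
assert data.decode("utf-8") == text

# Indexing differs too: str gives a 1-character str, bytes gives an int.
print(repr(text[0]), repr(data[0]))

# Pure-ASCII text encodes to bytes that look the same apart from the b prefix.
ascii_text = "<!DOCTYPE html>"
print(ascii_text.encode("utf-8"))
```

This is why the first 100 bytes of an English-language page look identical either way: every character there happens to fit in one byte.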

You really want to read up on Unicode versus encoded data here; I recommend: