Question

On Python 3.2 I get the following error when I try to fetch HTML from a remote site. The same code works fine on Python 2.7.

[Screenshot of the error message; image not available]

Code:

from random import randint
from time import sleep

import requests


def connectAmazon():
    # Sleep for x microseconds
    usleep = lambda x: sleep(x/1000000.0)
    factor = 400
    shouldRetry = True
    retries = 0
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.102 Safari/537.36'}
    attempt = 0
    while shouldRetry:
        random = randint(2, 9)
        attempt += 1
        print ("Attempt#", attempt)
        #print (attempt)
        url = "http://www.amazon.com/gp/offer-listing/B009OZUPUC/sr=/qid=/ref=olp_prime_new?ie=UTF8&colid=&coliid=&condition=new&me=&qid=&seller=&shipPromoFilter=1&sort=sip&sr"
        html = requests.get(url)
        status = html.status_code
        if status == 200:
            shouldRetry = False
            print ("Success. Check HTML Below")
            print(html.text) #The Buggy Line
            break
        elif status == 503:
            retries += 1
            delay = random * (pow(retries, 4)*100)
            print ("Delay(ms) = ", delay)
            #print (delay)
            usleep(delay)
            shouldRetry = True


connectAmazon()

How can I fix this on Python 3.2, or Python 3.x in general?

Answer 1

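OK, the Windows command line has serious problems with encodings*. The encoding error happens because, on output, print encodes html.text to the console's encoding (you can find out which one it is by running the chcp command). Most likely html.text contains at least one character that cannot be encoded in the console's encoding.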
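The failure can be reproduced without a Windows console. The sketch below is illustrative only: it assumes cp437 as an example console code page (chcp may report something else on your machine) and uses a made-up sample string containing the euro sign, which cp437 cannot represent. The helper name can_encode is hypothetical.

```python
import sys

# Hypothetical page text; U+20AC (euro sign) does not exist in cp437.
sample = "Price: \u20ac 10"


def can_encode(text, encoding):
    """Return True if every character of text can be encoded with encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character that failed;
        # ascii() keeps this diagnostic itself printable on any console.
        print("Cannot encode", ascii(text[exc.start]), "with", encoding)
        return False


# The encoding print() uses for this stream (platform dependent).
print("stdout encoding:", getattr(sys.stdout, "encoding", None))

ok_cp437 = can_encode(sample, "cp437")  # fails, like print(html.text) in cmd
ok_utf8 = can_encode(sample, "utf-8")   # succeeds
```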

My solution for Python 3 is to force the output encoding. Sadly, in Python 3 this is more awkward than I would like. You need to replace the line print(html.text) with:

import sys
sys.stdout.buffer.write(html.text.encode('utf8'))
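A slightly more defensive version of the same idea, as a sketch: flushing the text layer before writing raw bytes keeps earlier print() output (such as "Success. Check HTML Below") from showing up after the HTML. The helper names print_utf8 and to_console_safe are hypothetical, and the stream parameter exists only so the code can be tried against an in-memory stream. to_console_safe is a different technique: it keeps the console's encoding and replaces characters it cannot represent with ?, losing them instead of emitting UTF-8.

```python
import io
import sys


def print_utf8(text, stream=None):
    """Write text plus a newline to a text stream's binary buffer as UTF-8."""
    stream = sys.stdout if stream is None else stream
    # Push any pending text output first so the byte output stays in order.
    stream.flush()
    stream.buffer.write(text.encode("utf8") + b"\n")
    stream.buffer.flush()


def to_console_safe(text, encoding):
    """Return text with characters the encoding cannot represent replaced by '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


# Quick check against an in-memory "console" instead of the real one.
demo_raw = io.BytesIO()
demo = io.TextIOWrapper(demo_raw, encoding="cp437", newline="\n")
demo.write("Success. Check HTML Below\n")
print_utf8("Price: \u20ac 10", demo)
```

In the question's loop, print(html.text) would become print_utf8(html.text).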

Of course, that line does not work in Python 2. In Python 2 you can encode the output before printing it, so print(html.text) can be replaced with:

print html.text.encode('utf8')

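Important note: in Python 2, print is a keyword (a statement), not a function. So calling print('hi') works because print is printing the expression inside the parentheses. When you do print('hi', 2), you get the tuple ('hi', 2) as output. That is not exactly what you want. It only works by lucky coincidence :D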
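The difference can be shown from Python 3, where the parentheses of the Python 2 statement simply become part of the expression being printed. This sketch only mimics what the Python 2 statements would output; it is not Python 2 code.

```python
# Python 2:  print('hi')     -> the parentheses just group a string
single = ('hi')
# Python 2:  print('hi', 2)  -> the comma turns the expression into a tuple
pair = ('hi', 2)

print(single)  # same text Python 2 prints for print('hi'):    hi
print(pair)    # same text Python 2 prints for print('hi', 2): ('hi', 2)
```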

Hope this helps!

*This is due to the lack of support for code page 65001. Windows has an odd UTF-8 code page that is not exactly the same as UTF-8, and Python cannot use it.
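Whether a given Python installation knows a codec name can be checked with codecs.lookup, which raises LookupError for unknown names. The helper codec_available below is a hypothetical sketch; the answer concerns Python 3.2, and later releases changed how cp65001 is handled, so the result for that name depends on the Python version and platform.

```python
import codecs


def codec_available(name):
    """Return True if this Python has a codec registered under name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


for name in ("utf-8", "cp437", "cp65001"):
    print(name, codec_available(name))
```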

Encoding error when fetching HTML
