Question

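I'm trying to use Scrapy to print page titles from some page source to the command prompt. If any character is non-standard (i.e. has an accent, umlaut, etc.), it causes an error. I have tried the following: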

myheader = titles.extract()[0]
myheader = str(myheader)
print '********** Page Title:', myheader.decode('utf-8'), '**********'

and...

myheader = titles.extract()[0]
myheader = str(myheader)
print '********** Page Title:', myheader.decode(), '**********'

and...

myheader = titles.extract()[0]
myheader = str(myheader)
print '********** Page Title:', myheader.encode('utf-8'), '**********'

An example of what I'm trying to print that causes this error is:

<meta name="title" content="Mustang Cup Liga Postobón Clausura tables, rankings & standings | WhoScored.com">

The error occurs at position 23 of the text 'Mustang Cup Liga Postobón Clausura tables, rankings & standings | WhoScored.com', which is the letter 'o' with an accent above it.

The actual error is:

exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128)

Can anyone suggest why none of my approaches above work?

Thanks

Answer 1

So the correct way to do it is:

myheader = titles.extract()[0]
print u'********** Page Title: {} **********'.format(myheader).encode('utf-8')
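To see where the byte 0xc3 at position 23 comes from, here is a minimal sketch. It assumes the extracted title is Unicode text (which, as far as I know, is what Scrapy's extract() returns); the variable names title, raw, message and encoded_line are made up for illustration and stand in for titles.extract()[0] and the printed line. It uses only the first part of the title from the question.

```python
# -*- coding: utf-8 -*-
# Sketch only: `title` is a stand-in for titles.extract()[0].
title = u'Mustang Cup Liga Postob\xf3n Clausura'  # \xf3 is the accented 'o'

# Encoding turns the text into UTF-8 bytes; the accented 'o' at
# index 23 becomes the two bytes 0xc3 0xb3.
raw = title.encode('utf-8')
assert raw[23:25] == b'\xc3\xb3'

# Decoding those bytes as ASCII fails at the first non-ASCII byte.
try:
    raw.decode('ascii')
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# Formatting as Unicode first and encoding once at the end avoids
# any ASCII decoding step.
line = u'********** Page Title: {} **********'.format(title)
encoded_line = line.encode('utf-8')
```

The caught message names the same byte and position as the error in the question, which suggests that somewhere in the original attempts UTF-8 bytes were being decoded with the ASCII codec. Keeping the value as Unicode until the final encode('utf-8') sidesteps that.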