Question

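I'm trying to read RSS feeds encoded in UTF-8, but when I try to print them, parts of some entry titles and descriptions raise a `UnicodeEncodeError`. I want to pull the feed into Python, reformat the data, and display it on another site that I manage.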

I originally tried the feedparser library, but it tried to convert everything to ASCII, so I built a very basic parser of my own:

import urllib.request

# CGI header and the start of the HTML page
print("Content-Type: text/html; charset=UTF-8\n")
print("<html lang=\"en-US\">")
print("<head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">")

# desired_blog_addr holds the feed URL (defined elsewhere)
fp = urllib.request.urlopen(desired_blog_addr)
mybytes = fp.read()
fp.close()
mystring = mybytes.decode('utf-8')  # bytes -> str; this step succeeds

# Crude parsing: one chunk per <item>; the first chunk is the channel header
items = mystring.split('<item>')

del items[0]

for entry in items:
    # Everything before the first </title> is the item's title
    title, entry = entry.split('</title>', 1)
    title = title.replace('<title>', '')
    try:
        print(title)  # the UnicodeEncodeError is raised here
    except UnicodeEncodeError:
        print("<!--UnicodeEncodeError ! in title-->")
        print("<textarea cols=\"30\" rows=\"3\">", title.encode('utf-8', 'replace'), "</textarea>")
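Splitting on `<item>` and `</title>` leaves entities such as `&amp;` undecoded and keeps any `<![CDATA[...]]>` wrapper in the title. As an alternative that the question itself did not use, the standard library's `xml.etree.ElementTree` can pull out the titles. This is a minimal sketch assuming a plain RSS 2.0 feed with no namespace on `<item>`; the helper `item_titles` and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET


def item_titles(feed_bytes):
    """Return the <title> text of every <item> in an RSS 2.0 document.

    Passing raw bytes lets the parser honour the encoding declared in the
    XML prolog instead of decoding by hand.
    """
    root = ET.fromstring(feed_bytes)
    # RSS 2.0 layout: <rss><channel><item><title>...</title></item>...</channel></rss>
    return [item.findtext("title", default="") for item in root.iter("item")]


# Hypothetical sample feed standing in for the bytes returned by urlopen().read()
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<rss><channel><title>Blog</title>"
    "<item><title>\u2018Bumblebee\u2019</title></item>"
    "<item><title>Second &amp; last</title></item>"
    "</channel></rss>"
).encode("utf-8")

titles = item_titles(sample)  # the channel's own <title> is not included
```

With the sample above, `titles` holds the two item titles with the curly quotes intact and `&amp;` decoded to `&`. Printing them still depends on the output encoding, so this does not by itself remove the error.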

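I expected this to print every title in the source RSS feed, but every so often I get a `UnicodeEncodeError`. With `title.encode('utf-8', 'replace')` I can print the title text into an HTML textarea, but I'd like to avoid titles that come out looking like this: `\xe2\x80\x98Bumblebee\xe2\x80\x99`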
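The escapes in the textarea are the printed form of a `bytes` object. `title.encode(...)` returns bytes, and `print` shows bytes as `b'...'` with each non-ASCII byte written as `\xNN`. The error itself most likely comes from `print` encoding the `str` with `sys.stdout`'s encoding, which in some CGI environments is ASCII. That is an assumption worth checking via `sys.stdout.encoding`. The sketch below reproduces both effects and shows one workaround, writing UTF-8 bytes straight to the binary layer under stdout; the helper `emit` and the sample title are hypothetical.

```python
import io
import sys

title = "\u2018Bumblebee\u2019"  # hypothetical title with curly quotes

# 1. What print() effectively does when stdout's encoding is ASCII
try:
    title.encode("ascii")
except UnicodeEncodeError as exc:
    print("ASCII stdout would fail:", exc.reason)

# 2. Why the textarea shows \xe2\x80\x98: printing bytes shows this repr
print(repr(title.encode("utf-8")))


# 3. Workaround: encode to UTF-8 yourself and write to the binary stream
def emit(text, out=None):
    """Write text plus a newline to a binary stream as UTF-8."""
    if out is None:
        out = sys.stdout.buffer  # binary stream underneath sys.stdout
    out.write(text.encode("utf-8") + b"\n")
    out.flush()
```

On Python 3.7+, `sys.stdout.reconfigure(encoding="utf-8")` near the top of the script has a similar effect for every later `print` call, provided `sys.stdout` is a regular text stream. Either way the output matches the `charset=UTF-8` already declared in the page header.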

Thanks in advance!

A friend suggested I try the following, and it seems to work:

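    except UnicodeEncodeError:
        # Replace each non-ASCII character with an HTML numeric reference
        title = title.encode('ascii', 'xmlcharrefreplace')
        # The bytes are pure ASCII, so decoding them as UTF-8 always succeeds
        title = title.decode('utf-8')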
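The friend's fix works because the `'xmlcharrefreplace'` error handler turns every character outside ASCII into an HTML numeric character reference, which a browser renders as the original character. The result is pure ASCII, so decoding it back to `str` cannot fail and neither can printing it. A short illustration with a hypothetical title:

```python
title = "\u2018Bumblebee\u2019"  # hypothetical title with curly quotes

# Non-ASCII characters become &#NNNN; references (U+2018 -> 8216, U+2019 -> 8217)
as_bytes = title.encode("ascii", "xmlcharrefreplace")
safe = as_bytes.decode("ascii")  # pure ASCII, so this never raises

print(safe)  # &#8216;Bumblebee&#8217;
```

This only suits HTML output. A plain-text consumer of the same string would see the `&#8216;` references literally.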

UnicodeEncodeError after decoding urlrequest bytes

0 answers: