Question

我知道这个问题之前已经被问过无数次了，但是我似乎无法使任何解决方案有效。我尝试使用codecs模块，即io模块。似乎没有任何作用。

我正在从网络上抓取一些东西，然后将每个项目的详细信息记录到文本文件中，但是脚本在第一次遇到Unicode字符时便会中断。

AHIMSA Centro de Sanación Pránica, Pranic Healing

此外，我不确定Unicode字符何时何地弹出，这增加了额外的复杂性，因此我需要一个总体解决方案，而且我不确定如何处理潜在的非ASCII字符

我不确定在生产环境中是否要使用Python 3.6.5，因此该解决方案必须与2.7一起使用。

我在这里可以做什么？我该如何处理？

# -*- coding: utf-8 -*-
...
with open('test.txt', 'w') as f:
f.write(str(len(discoverable_cards)) + '\n\n')
    for cnt in range(0, len(discoverable_cards)):
        t = get_time()
        f.write('[ {} ] {}\n'.format(t, discoverable_cards[cnt]))
        f.write('[ {} ] {}\n'.format(t, cnt + 1))
        f.write('[ {} ] {}\n'.format(t, product_type[cnt].text))
        f.write('[ {} ] {}\n'.format(t, titles[cnt].text))
...

任何帮助将不胜感激！

Answer 1

鉴于您使用的是python2.7，在将它们传递给write之前，您可能希望使用与Unicode兼容的字符集（例如“ utf8”）对所有字符串进行显式编码，可以通过简单的编码来实现方法：

def safe_encode(str_or_unicode):
    # future py3 compatibility: define unicode, if needed:
    try:
       unicode
    except NameError:
       unicode = str
    if isinstance(str_or_unicode, unicode):
        return str_or_unicode.encode("utf8")
    return str_or_unicode

然后您将像这样使用它：

f.write('[ {} ] {}\n'.format(safe_encode(t), safe_encode(discoverable_cards[cnt])))

处理Unicode字符

1 个答案: