Question

I'm trying to do some simple parsing of a file and I'm getting an error caused by special characters:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

infile = 'finance.txt'
input = open(infile)
for line in input:
  if line.startswith(u'▼'):

I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1718: ordinal not in range(128)

Is there a solution?

Answer 1

You need to specify the encoding. For example, if the file is UTF-8:

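import io

infile = 'finance.txt'

# io.open() accepts an encoding and yields unicode lines on both Python 2 and 3
with io.open(infile, encoding='utf-8') as fobj:
    for line in fobj:
        if line.startswith(u'▼'):
            pass  # process the matching line here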

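This works on both Python 2 and 3. By default, Python 2 opens files without assuming any encoding, so reading the content returns byte strings. When such a byte string is compared with a unicode string like u'▼', Python 2 implicitly decodes it with the ascii codec, which fails on any non-ASCII byte. In Python 3, the default encoding is whatever locale.getpreferredencoding(False) returns, which in many cases is utf-8. The standard open() in Python 2 does not allow you to specify an encoding. Using io.open() makes the code future-proof, because you do not need to change it when you switch to Python 3.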

In Python 3:

>>> io.open is open
True
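
To make the cause of the error concrete, here is a minimal sketch that reproduces the decoding failure without any file. The sample bytes and the variable names are made up for illustration; the byte 0xc2 simply stands in for the first byte of a two-byte UTF-8 sequence, like the one reported in the error message.

```python
# -*- coding: utf-8 -*-
import locale

# Hypothetical sample: u'\xa310' (a pound sign followed by "10") encoded
# as UTF-8. The pound sign becomes the two bytes 0xc2 0xa3.
raw = b'\xc2\xa310'

# Decoding with the ascii codec fails on the first non-ASCII byte.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError as exc:
    ascii_ok = False
    message = str(exc)

# Decoding with the encoding the data was actually written in succeeds.
text = raw.decode('utf-8')

# The encoding Python 3's open() falls back to when none is given.
default_encoding = locale.getpreferredencoding(False)

print(ascii_ok)
print(message)
print(default_encoding)
```

The value printed for default_encoding depends on the platform and locale, which is why passing encoding explicitly is the safer choice.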

Answer 2

Open the file with the correct encoding. For example, if your file is UTF-8 encoded, with Python 3:

with open('finance.txt', encoding='utf8') as f:
    for line in f:  # iterate over the opened file object
        if line.startswith(u'▼'):
            pass  # whatever

With Python 2, you can use io.open() (which also works in Python 3):

import io

with io.open('finance.txt', encoding='utf8') as f:
    for line in f:  # iterate over the opened file object
        if line.startswith(u'▼'):
            pass  # whatever
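
As a self-contained check of the approach above, the following sketch writes a small UTF-8 file to a temporary location and reads it back with io.open(). The helper name count_marked_lines, the sample text, and the temporary file are assumptions made for illustration; they are not part of the original question.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def count_marked_lines(path, marker=u'\u25bc', encoding='utf-8'):
    """Count the lines in `path` that start with `marker` (U+25BC by default)."""
    count = 0
    # io.open() decodes each line to unicode on both Python 2 and 3.
    with io.open(path, encoding=encoding) as f:
        for line in f:
            if line.startswith(marker):
                count += 1
    return count


# Create a temporary sample file containing non-ASCII characters.
fd, sample_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    with io.open(sample_path, 'w', encoding='utf-8') as out:
        out.write(u'\u25bc Revenue\nCosts \u00a3100\n\u25bc Profit\n')
    marked = count_marked_lines(sample_path)
finally:
    os.remove(sample_path)

print(marked)  # prints 2: two of the three sample lines start with the marker
```

Writing the marker as the escape u'\u25bc' keeps the source file pure ASCII, so the example does not depend on the source encoding declaration.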
