Question

I am reading a text file. This worked fine under Python 2, but I decided to run my code under Python 3.

My code for reading the text file is:

neg_words = []
with open('negative-words.txt', 'r') as f:
    for word in f:
        neg_words.append(word)

When I run this code on Python 3, I get the following error:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-14-1e2ff142b4c1> in <module>()
      3 pos_words = []
      4 with open('negative-words.txt', 'r') as f:
----> 5     for word in f:
      6         neg_words.append(word)
      7 with open('positive-words.txt', 'r') as f:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/codecs.py in decode(self, input, final)
    319         # decode input (taking the buffer into account)
    320         data = self.buffer + input
--> 321         (result, consumed) = self._buffer_decode(data, self.errors, final)
    322         # keep undecoded input until the next call
    323         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 3988: invalid continuation byte

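It looks to me as if the file contains some form of text that Python 2 decodes without any problem and Python 3 cannot.

Can someone explain this error and the difference between Python 2 and Python 3 here? Why does it appear in one version and not the other? How can I prevent it?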

Answer 1

Your file is not UTF-8 encoded. Figure out which encoding it uses and state it explicitly when opening the file:

with open('negative-words.txt', 'r', encoding="<correct codec>") as f:

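In Python 2, str is a byte string that holds encoded data, not Unicode text, so open() hands back the raw bytes without decoding them and no error can occur. You would hit the same problem in Python 2 if you used import io and then io.open(), or if you tried to decode the data you read with word.decode('utf8').

You probably want to read up on Unicode and Python. I strongly recommend Ned Batchelder's Pragmatic Unicode.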
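To make the point concrete, here is a minimal sketch that reproduces the error and the fix on a throwaway file. The file name sample-words.txt, its contents, and the Latin-1 encoding are assumptions chosen for illustration; the asker's file may well use a different codec.

```python
import os
import tempfile

# Hypothetical sample: a small word list saved as Latin-1, not UTF-8.
path = os.path.join(tempfile.mkdtemp(), 'sample-words.txt')
with open(path, 'wb') as f:
    f.write('naïve\nfaçade\n'.encode('latin-1'))

# Decoding as UTF-8 fails: in Latin-1, 'ï' is the single byte 0xef,
# which UTF-8 treats as the start of a multi-byte sequence.
try:
    with open(path, 'r', encoding='utf-8') as f:
        f.read()
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Naming the codec the file was actually written with decodes cleanly.
with open(path, 'r', encoding='latin-1') as f:
    words = [line.rstrip('\n') for line in f]
```

Note that Latin-1 maps every possible byte to some character, so it never raises; a clean decode only proves the codec is correct if the resulting text looks right.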

Answer 2

Alternatively, we can simply read the file in binary mode:

neg_words = []
with open('negative-words.txt', 'rb') as f:
    for word in f:
        # each word is a bytes object, not str
        neg_words.append(word)

'r' opens the file for reading (default)

'b' binary mode
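Keep in mind that binary mode only postpones the decoding question: each line comes back as bytes, much like a Python 2 str. A small sketch of what that looks like, again on a made-up file and assuming Latin-1 for the later decode step:

```python
import os
import tempfile

# Hypothetical sample file containing a byte (0xef) that is invalid UTF-8 here.
path = os.path.join(tempfile.mkdtemp(), 'sample-words.txt')
with open(path, 'wb') as f:
    f.write(b'na\xefve\nbad\n')

# Binary mode never decodes, so no UnicodeDecodeError can be raised.
neg_words = []
with open(path, 'rb') as f:
    for raw in f:
        neg_words.append(raw)

# The items are bytes, and bytes never compare equal to str in Python 3.
matches_str = neg_words[1] == 'bad\n'

# To work with text, the bytes still have to be decoded with a codec
# (latin-1 is an assumption for this sample).
decoded = [raw.decode('latin-1').strip() for raw in neg_words]
```

If the words are later compared with str values, for example tokens from a tweet, they need to be decoded first or the comparisons will silently fail.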
