UnicodeDecodeError when using gzip and pickle

Time: 2017-12-04 07:26:10

Tags: python python-3.x gzip pickle

I'm working through deep-learning exercise code that uses the MNIST data, on Python 3.4.

The original code is:

import gzip  # needed for gzip.open() below
import _pickle as cPickle
def load_data():
    f = gzip.open('../data/mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = cPickle.load(f)
    f.close()
    return (training_data, validation_data, test_data)
def load_data_wrapper():
    tr_d, va_d, te_d = load_data()
    ....

However, this raises a UnicodeDecodeError. Following advice I found online, I changed cPickle.load(f) to pickle.load(f, encoding='latin1').

The same error still occurs when I run this in the shell:

>>> training_data, validation_data, test_data = \
... mnist_loader.load_data_wrapper() \
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\E\Deep Learning Tutorial\neural-networks-and-deep-learning-master\src\mnist_loader.py", line 68, in load_data_wrapper
    tr_d, va_d, te_d = load_data()
  File "C:\E\Deep Learning Tutorial\neural-networks-and-deep-learning-master\src\mnist_loader.py", line 43, in load_data

The traceback points to this line:

f = gzip.open('../data/mnist.pkl.gz', 'rb')

It is the same error as before, only now reported on a different line:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)

How can I fix this?

1 Answer:

Answer 0: (score: 2)

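First, I was able to reproduce the problem using the mnist.pkl.gz data file extracted from the https://github.com/mnielsen/neural-networks-and-deep-learning/archive/master.zip archive, which I downloaded. The pickle.load(f) call raised the following exception: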

UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)

However, the error went away when I added the encoding='bytes' argument to the pickle.load() call, as I suggested in a comment under your question.
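As a minimal sketch of why the encoding argument matters: Python 3's pickle.load() decodes 8-bit strings written by Python 2 as ASCII unless told otherwise. The example below does not use the MNIST file. It assumes a hand-written protocol-0 pickle of the one-byte Python 2 string '\x90' as a stand-in, and describe_load is a hypothetical helper name used only for illustration.

```python
import pickle

# Hand-written protocol-0 pickle of the one-byte Python 2 str '\x90'.
# This is an assumed stand-in for the 8-bit strings stored in mnist.pkl.gz.
PY2_PICKLE = b"S'\\x90'\np0\n."


def describe_load(**kwargs):
    """Unpickle PY2_PICKLE and return repr(value), or a short error summary."""
    try:
        return repr(pickle.loads(PY2_PICKLE, **kwargs))
    except UnicodeDecodeError as exc:
        return "UnicodeDecodeError: " + exc.reason


if __name__ == "__main__":
    print(describe_load())                   # default encoding is 'ASCII'
    print(describe_load(encoding="latin1"))  # every byte maps to one code point
    print(describe_load(encoding="bytes"))   # 8-bit strings are kept as bytes
```

With the default encoding the load fails on the non-ASCII byte, while 'latin1' yields a str and 'bytes' yields a bytes object, so either value avoids the decoding error.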

The other change I made was replacing import _pickle as cPickle with import pickle, but I don't think that matters (see What difference between pickle and _pickle in python 3?).

Another difference that might matter, however, is that I'm using Python 3.6.3 on Windows.

import gzip
import pickle

def load_data():
    f = gzip.open('mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = \
        pickle.load(f, encoding='bytes')  # Note encoding argument value.
    f.close()
    return (training_data, validation_data, test_data)

def load_data_wrapper():
    tr_d, va_d, te_d = load_data()
    print('gzipped pickled data loaded successfully')

load_data_wrapper()

As an aside, the load_data() function could be written more concisely like this:

def load_data():
    with gzip.open('mnist.pkl.gz', 'rb') as f:
        training_data, validation_data, test_data = \
            pickle.load(f, encoding='bytes')
    return training_data, validation_data, test_data
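
To try the with-statement pattern without the MNIST download, here is a self-contained round trip through a temporary gzipped pickle. It is a sketch only: save_data, round_trip, the path parameter, and the sample.pkl.gz file name are hypothetical and not part of the original code.

```python
import gzip
import os
import pickle
import tempfile


def save_data(path, data):
    """Pickle data into a gzip-compressed file (protocol 2, as Python 2 could write)."""
    with gzip.open(path, "wb") as f:
        pickle.dump(data, f, protocol=2)


def load_data(path):
    """Load a gzipped pickle; the file is closed even if unpickling fails."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f, encoding="bytes")


def round_trip(data):
    """Write data to a temporary .pkl.gz file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.pkl.gz")
        save_data(path, data)
        return load_data(path)


if __name__ == "__main__":
    training, validation, test = round_trip(([1, 2], [3], [4]))
    print(training, validation, test)
```

The with block closes the file on every exit path, which is why the explicit f.close() call is no longer needed.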