UnicodeDecodeError when concatenating

Time: 2018-11-28 12:05:16

Tags: pandas parsing

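I am trying to train my model. I have a CSV file that was generated earlier and a .gz file. When I read the CSV I get the error below, and I am not sure what is causing it.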

Traceback (most recent call last):
  File "Model.py", line 87, in <module>
    data = pd.concat([pd.read_csv(log)])
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 539, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 767, in pandas._libs.parsers.TextReader._get_header
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

My code:

for foo in range(0,1):
    # Read dataframe
    #data = pd.concat([pd.read_csv(log.replace('0',str(idx),1)) for idx in range(5)])
    log = path + 'train_features/log_.csv'

    test_log = path + 'test_features/log_features.gz'
    data = pd.concat([pd.read_csv(log)])

1 answer:

Answer 0: (score: 0)

Try:

data = pd.read_csv(log, encoding = "utf-8")

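That said, I don't see why you need the for loop or pd.concat here: the loop runs only once and pd.concat receives a single DataFrame.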

If you don't know the file's encoding, try this:

import chardet

with open(log, 'rb') as f:
    result = chardet.detect(f.read())  # for a large file, detect on a sample instead, e.g. f.readline()


data = pd.read_csv(log, encoding=result['encoding'])
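One more thing worth checking: the byte 0x8b at position 1 in the traceback matches the second byte of the gzip magic number (0x1f 0x8b), so the file passed to read_csv may actually be gzip-compressed data under a .csv name rather than text in an unexpected encoding. The following is a minimal sketch for testing that, assuming the file is readable from disk. The helper names looks_like_gzip and open_text are made up for illustration and are not part of pandas.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def open_text(path, encoding="utf-8"):
    """Open `path` as text, decompressing on the fly if it holds gzip data.

    Hypothetical helper: the caller is responsible for closing the result.
    """
    if looks_like_gzip(path):
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)
```

If looks_like_gzip(log) returns True, something like data = pd.read_csv(log, compression="gzip") should read it. By default pandas infers compression from the file extension, so gzip content in a file named .csv is not decompressed automatically. Alternatively, read_csv accepts a file object, so pd.read_csv(open_text(log)) would cover both cases.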
