Error reading a csv file into a dataframe with dask and pandas

Time: 2019-05-27 07:43:45

Tags: python pandas dask

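I'm trying to read a csv file with dask (and also with pandas), but I get the error below. I tried changing the encoding, but nothing seems to work. However, when I re-save the file in Excel as "CSV UTF-8", the code starts working. I tried the same approach with pandas and it gave me the same error. I tried explicitly specifying the encoding as utf-16, but got an error telling me to use utf-16-le or utf-16-be, and when I used those I still got an error. Is something wrong with the csv file I'm using?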
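Since the file only loaded after Excel re-saved it as UTF-8, one way to narrow this down is to look at the file's first bytes for a byte-order mark (BOM) and to do the UTF-8 conversion in Python instead of Excel. A minimal sketch, assuming the file starts with a BOM if it is UTF-16; `sniff_bom` and `transcode_to_utf8` are hypothetical helper names, not part of pandas or dask:

```python
import codecs
import os
import shutil
import tempfile


def sniff_bom(path):
    """Guess an encoding from the file's byte-order mark (BOM), or return None."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # Python codec that also strips the UTF-8 BOM
    # Note: UTF-32 is not handled; its little-endian BOM also starts with FF FE.
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic "utf-16" codec reads the BOM itself and picks LE or BE,
        # so utf-16-le / utf-16-be do not have to be chosen by hand.
        return "utf-16"
    return None  # no BOM: the encoding has to be guessed some other way


def transcode_to_utf8(src, dst, src_encoding):
    """Rewrite src as UTF-8 without touching delimiters, quoting or line endings."""
    # newline="" on both sides keeps \r\n exactly as it is in the source
    with open(src, "r", encoding=src_encoding, newline="") as fin, \
            open(dst, "w", encoding="utf-8", newline="") as fout:
        shutil.copyfileobj(fin, fout)  # streams the decoded text in chunks


if __name__ == "__main__":
    # Demo on a throwaway UTF-16 file with made-up, tab-separated content
    tmp = tempfile.mkdtemp()
    src = os.path.join(tmp, "in.csv")
    with open(src, "w", encoding="utf-16", newline="") as f:
        f.write("a\tb\r\n1\t2\r\n")
    enc = sniff_bom(src)
    dst = os.path.join(tmp, "out.csv")
    transcode_to_utf8(src, dst, enc)
    with open(dst, "rb") as f:
        print(enc, f.read())
```

Unlike Excel's "CSV UTF-8" save, this changes only the encoding and keeps the original delimiter, which `sep=None` should still be able to sniff. If the source file itself is damaged (for example, it has an odd number of bytes), the decode inside `transcode_to_utf8` would raise, which at least separates a file problem from a dask/pandas problem.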

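import chardet
import dask.dataframe as dd

# Mar_N_W is the path to the CSV file (its definition is not shown)
# Detect the encoding from the raw bytes of the whole file
with open(Mar_N_W, 'rb') as f:
    result = chardet.detect(f.read())

# sep=None makes pandas sniff the delimiter (this requires the python engine)
Mar_NW = dd.read_csv(Mar_N_W, encoding=result['encoding'], sep=None)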


~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py in _next_iter_line(self, row_num)
   2693 
   2694         try:
-> 2695             return next(self.data)
   2696         except csv.Error as e:
   2697             if self.warn_bad_lines or self.error_bad_lines:

~\AppData\Local\Continuum\anaconda3\lib\codecs.py in decode(self, input, final)
    320         # decode input (taking the buffer into account)
    321         data = self.buffer + input
--> 322         (result, consumed) = self._buffer_decode(data, self.errors, final)
    323         # keep undecoded input until the next call
    324         self.buffer = data[consumed:]

~\AppData\Local\Continuum\anaconda3\lib\encodings\utf_16.py in _buffer_decode(self, input, errors, final)
     67                 raise UnicodeError("UTF-16 stream does not start with BOM")
     68             return (output, consumed)
---> 69         return self.decoder(input, self.errors, final)
     70 
     71     def reset(self):

UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x0a in position 0: truncated data
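The last line of the traceback can be reproduced without pandas or dask. In UTF-16 every character takes at least two bytes, so a single leftover 0x0a byte at the end of the decoder's input cannot be decoded. The sketch below only demonstrates the message; it assumes nothing about what produced the odd byte in this case (the file itself, or how the input was cut before decoding), which the question does not establish:

```python
import codecs

# A lone newline byte (0x0a) is half of a UTF-16 code unit, so decoding it
# as the final chunk fails with the message seen in the traceback.
message = None
try:
    b"\n".decode("utf-16-le")
except UnicodeDecodeError as e:
    message = str(e)
print(message)

# Same path as the traceback: the generic "utf-16" incremental decoder reads
# the BOM, switches to utf-16-le, then fails on a dangling byte at the end.
decoder = codecs.getincrementaldecoder("utf-16")()
text = decoder.decode(codecs.BOM_UTF16_LE + "a\n".encode("utf-16-le"))
incremental_message = None
try:
    decoder.decode(b"\n", final=True)
except UnicodeDecodeError as e:
    incremental_message = str(e)
print(repr(text), incremental_message)
```

Both printed messages read `'utf-16-le' codec can't decode byte 0x0a in position 0: truncated data`, while the complete two-byte characters before the stray byte decode to `'a\n'` without complaint.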

0 answers