Error when trying to read a csv with pandas

Time: 2017-09-15 00:47:44

Tags: python-3.x pandas csv

File "C:\Users\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1748, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas\_libs\parsers.c:10862)
  File "pandas\_libs\parsers.pyx", line 912, in pandas._libs.parsers.TextReader._read_low_memory (pandas\_libs\parsers.c:11138)
  File "pandas\_libs\parsers.pyx", line 989, in pandas._libs.parsers.TextReader._read_rows (pandas\_libs\parsers.c:12175)
  File "pandas\_libs\parsers.pyx", line 1117, in pandas._libs.parsers.TextReader._convert_column_data (pandas\_libs\parsers.c:14136)
  File "pandas\_libs\parsers.pyx", line 1169, in pandas._libs.parsers.TextReader._convert_tokens (pandas\_libs\parsers.c:14972)
  File "pandas\_libs\parsers.pyx", line 1273, in pandas._libs.parsers.TextReader._convert_with_dtype (pandas\_libs\parsers.c:17119)
  File "pandas\_libs\parsers.pyx", line 1289, in pandas._libs.parsers.TextReader._string_convert (pandas\_libs\parsers.c:17347)
  File "pandas\_libs\parsers.pyx", line 1524, in pandas._libs.parsers._string_box_utf8 (pandas\_libs\parsers.c:23041)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 43: invalid continuation byte

I am trying to read a simple csv file, but I keep getting the error shown in the traceback above.

4 answers:

Answer 0: (score: 1)

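Your parser is trying to decode the data as utf-8, but your file appears to be in a different encoding (or it may contain just a single invalid character).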
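To make the error message concrete, here is a minimal sketch using only the standard library. The sample bytes are made up for illustration and are not taken from the asker's file. In UTF-8, 0xe3 announces a three-byte sequence, so if the byte after it is not a continuation byte, decoding fails with the same message as in the traceback.

```python
# Hypothetical sample: "São Paulo" encoded as Latin-1, where ã is the single byte 0xe3.
raw = "São Paulo".encode("latin-1")


def try_decode(data, encoding):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


utf8_result = try_decode(raw, "utf-8")      # fails: 0xe3 is followed by a plain ASCII byte
latin1_result = try_decode(raw, "latin-1")  # succeeds: 0xe3 is a complete character here

print(repr(utf8_result))
print(latin1_result)
```

The same bytes are fine in one encoding and invalid in another, which is why naming the right encoding matters more than changing the parser.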

Try telling the parser to read the file as plain ASCII, possibly with a specific code page (I don't know Python, so I can't help with the details).

It looks like you need to use the encoding parameter.

Here is a list with possible encodings
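One way to narrow down the list is to try a few candidates on the raw bytes. The sketch below is only an assumption about how one might do that: the helper name `first_working_encoding` and the candidate order are hypothetical, and a successful decode does not prove the encoding is the right one (Latin-1, for instance, accepts any byte sequence).

```python
def first_working_encoding(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes `data` without error."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes, try the next one
        return name
    return None


# Hypothetical csv content, encoded as Windows-1252 for the demonstration.
sample = "café,ã\n1,2\n".encode("cp1252")
print(first_working_encoding(sample))
```

The returned name could then be passed as the encoding argument of pandas.read_csv; it is still worth checking the decoded text by eye.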

Answer 1: (score: 0)

Sorry I'm late to this. Please change your code to the following and see if it works.

import pandas
df = pandas.read_csv("trial.csv", encoding="ISO-8859-1")

Answer 2: (score: 0)

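import pandas
# "rb" is a file mode, not a read_csv argument (the second positional argument is sep)
with open("trial.csv", "rb") as f:
    df = pandas.read_csv(f)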

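If none of the suggestions above work, opening the file in binary mode ("rb") may succeed. Note that pandas still decodes the bytes it reads, so an explicit encoding may still be needed.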
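Binary mode is also handy for inspecting the bytes that upset the decoder. The following is a minimal sketch with the standard library only; the file contents are invented for the example and written to a temporary file so that the snippet is self-contained.

```python
import os
import tempfile

# Hypothetical file contents standing in for the asker's csv (Latin-1 encoded).
payload = "name,city\nAna,S\xe3o Paulo\n".encode("latin-1")

with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    with open(path, "rb") as f:  # binary mode: bytes are returned undecoded
        raw = f.read()
finally:
    os.remove(path)

# Ask the utf-8 codec where it fails, then look at the bytes around that offset.
try:
    raw.decode("utf-8")
    bad_pos = None
except UnicodeDecodeError as exc:
    bad_pos = exc.start

if bad_pos is not None:
    context = raw[max(0, bad_pos - 10):bad_pos + 10]
    print(bad_pos, hex(raw[bad_pos]), context)
```

Seeing the surrounding bytes often makes it easier to guess which encoding the file was saved in.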

Answer 3: (score: 0)

import pandas as pd
store = pd.read_csv('Super_Store.csv', encoding='windows-1252')

We just need to tell Python the actual encoding of the file. After some trial and error, I found that it is encoded in windows-1252.

This is probably because the file was saved on a Windows machine at some point, where this is the default character encoding. For details, see:
HTML Windows-1252 (ANSI) Reference
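Windows-1252 and ISO-8859-1 (Latin-1) agree on most bytes but differ in the range 0x80 to 0x9f, which is why picking one over the other can change the text you get. The snippet below is a small illustration using Python's built-in codecs; it does not involve the asker's file.

```python
euro = b"\x80"
print(euro.decode("cp1252"))         # the euro sign
print(repr(euro.decode("latin-1")))  # a C1 control character, U+0080

# Latin-1 maps every byte value to a code point, so decoding never fails ...
all_bytes = bytes(range(256))
latin1_text = all_bytes.decode("latin-1")

# ... while Python's cp1252 codec rejects the few byte values it leaves undefined.
undefined = []
for value in range(256):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(value)

print([hex(v) for v in undefined])
```

So Latin-1 always "works" in the sense of not raising, but for files saved on Windows it may silently turn characters such as the euro sign or curly quotes into control characters.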