Reading a CSV file with the correct encoding in pandas

Time: 2019-09-24 10:40:29

Tags: python-3.x pandas dataframe

I can't read a CSV file in Jupyter Notebook. Here is the GitHub link to the CSV file:

https://github.com/roshanthokchom/new-assignment/blob/master/spam.csv

import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
import urllib
pd.read_csv('spam.csv', encoding='latin-1')

ParserError: Error tokenizing data. C error: Expected 2 fields in line 13, saw 4
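To see how a line with extra commas triggers this kind of error, here is a minimal reproduction on a hypothetical in-memory sample (the data below is made up for illustration, not taken from spam.csv). The first line has two comma-separated fields, so pandas expects two columns; the third line's message contains commas of its own, so it is split into four fields.

```python
import io

import pandas as pd

# Hypothetical sample: header with 2 fields, then a row whose text contains commas
sample = "v1,v2\nham,Ok lar\nspam,Free entry, call now, win\n"

try:
    pd.read_csv(io.StringIO(sample))
except pd.errors.ParserError as exc:
    # The message has the same shape as in the question: "Expected 2 fields in line ..., saw 4"
    print(exc)
```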

1 answer:

Answer 0 (score: -1)

@Roshan, here is how you can solve the problem:

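import csv

import pandas as pd

# latin-1 maps every byte to a character, so decoding never fails;
# newline='' is what the csv module expects for files it reads
with open('spam.csv', encoding='latin-1', newline='') as f:
    csvread = csv.reader(f)
    raw_data = list(csvread)

data = []
for row in raw_data:
    if not row:  # skip blank lines
        continue
    # csv.reader split each line on commas, but the columns are tab-separated:
    # join the comma-split pieces back together, then split on tabs
    fields = ",".join(row).split("\t")
    data.append(fields)

final_data = pd.DataFrame(data)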

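Specifying the encoding is fine, but some fields in the file themselves contain commas. When you read the file with pandas in the normal way, it splits every line on ",", so those lines end up with more fields than the header. That is why you get the error.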
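If the columns really are tab-separated (which is what the split on tabs above assumes), pandas can also read the file directly by changing the separator instead of post-processing rows by hand. This is a sketch under that assumption: `read_tab_separated` is a hypothetical helper name, `header=None` assumes there is no header line, and `quoting=csv.QUOTE_NONE` keeps stray double quotes inside messages from being treated as field quoting.

```python
import csv
import io

import pandas as pd


def read_tab_separated(source):
    """Hypothetical helper: read a tab-separated file whose text may contain commas."""
    return pd.read_csv(
        source,
        sep="\t",               # assumption: columns are separated by tabs
        header=None,            # assumption: no header line
        encoding="latin-1",     # decodes any byte without raising
        quoting=csv.QUOTE_NONE, # treat double quotes as ordinary characters
    )


# Made-up bytes standing in for the file; commas inside the text are kept
sample = "ham\tCaf\xe9, ok\nspam\tFree entry, call now\n".encode("latin-1")
df = read_tab_separated(io.BytesIO(sample))
print(df.shape)  # (2, 2)
```

On the real data this would be called as `read_tab_separated('spam.csv')`; if the file turns out not to be tab-separated, the csv-module approach above is the safer fallback.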