Wrong DataFrame format when using pandas.read_csv

Time: 2017-03-17 11:38:25

Tags: python csv pandas

I am trying to open this dataset in an IPython notebook: https://www.kaggle.com/dalpozz/creditcardfraud

I tried:

data = pd.read_csv("...Desktop/creditcard.csv")

and got:

CParserError: Error tokenizing data. C error: out of memory.

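Then I tried the solution suggested by Noobie in "Error tokenizing data. C error: out of memory pandas python, large file csv".

Now the data loads. However, it now looks like a matrix in which each row holds a single unsplit string: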

entry 0,0: blank;
entry 0,1: All the headers are here;
entry 1,0: 0
entry 1,1: A whole line of unseparated data here
entry 2,0: 1
entry 2,1: A whole line of unseparated data here
...
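
A quick way to confirm this symptom is to check the shape and column names of the loaded frame. The following is only a minimal sketch: the `sample` string is hypothetical, and it assumes each line of the file is wrapped in one outer pair of quotes (with inner quotes doubled), which nothing in this post confirms.

```python
import io
import pandas as pd

# Hypothetical sample (an assumption, not the real file): every line is wrapped
# in one outer pair of quotes, so the parser sees a single field per row.
sample = '"Time,""V1"",""Amount"",""Class"""\n"0,-1.36,149.62,""0"""\n'

df = pd.read_csv(io.StringIO(sample))

# A frame with one column whose name contains commas means the rows were not split.
print(df.shape)
print(list(df.columns))
```

If the real frame shows the same pattern (one column whose name is the entire header line), the delimiter is probably fine and the quoting of the file is the more likely suspect.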

How can I format the data correctly?

My implementation:

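import pandas as pd

# Read the file in chunks of 2000 rows to limit peak memory use.
mylist = []
for chunk in pd.read_csv('.../Desktop/creditcard.csv', sep=',', chunksize=2000):
    mylist.append(chunk)

# Concatenate the chunks row-wise into a single DataFrame.
data = pd.concat(mylist, axis=0)
del mylist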

A few rows of the data:
Row 1: "Time","V1","V2","V3","V4","V5","V6","V7","V8","V9","V10","V11","V12","V13","V14","V15","V16","V17","V18","V19","V20","V21","V22","V23","V24","V25","V26","V27","V28","Amount","Class"
Row 2: 0,-1.3598071336738,-0.0727811733098497,2.53634673796914,1.37815522427443,-0.338320769942518,0.462387777762292,0.239598554061257,0.0986979012610507,0.363786969611213,0.0907941719789316,-0.551599533260813,-0.617800855762348,-0.991389847235408,-0.311169353699879,1.46817697209427,-0.470400525259478,0.207971241929242,0.0257905801985591,0.403992960255733,0.251412098239705,-0.018306777944153,0.277837575558899,-0.110473910188767,0.0669280749146731,0.128539358273528,-0.189114843888824,0.133558376740387,-0.0210530534538215,149.62,"0"
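
If the cause turns out to be that every line is wrapped in an outer pair of quotes with the inner quotes doubled (an assumption; the rows shown above do not confirm it), one possible workaround is to parse in two passes: first unwrap each line with Python's `csv` module, then hand the unwrapped text to `pd.read_csv`. The `raw` string below is a hypothetical stand-in for the file contents, not real data from the dataset.

```python
import csv
import io
import pandas as pd

# Hypothetical file contents (an assumption): each line is one quoted field.
raw = (
    '"Time,""V1"",""Amount"",""Class"""\n'
    '"0,-1.36,149.62,""0"""\n'
    '"1,1.19,2.69,""0"""\n'
)

# First pass: csv.reader strips the outer quotes and un-doubles the inner ones,
# yielding exactly one string per line.
inner_lines = [row[0] for row in csv.reader(io.StringIO(raw)) if row]

# Second pass: parse the unwrapped lines as ordinary comma-separated data.
data = pd.read_csv(io.StringIO("\n".join(inner_lines)))

print(data.columns.tolist())
print(data.shape)
```

With a real file, `io.StringIO(raw)` would be replaced by an open file handle. This sketch holds all unwrapped lines in memory at once, so for a file that already triggered an out-of-memory error it might need to be combined with chunked processing.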

0 answers:

No answers yet