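I have a CSV file containing 4 lines that all have exactly the same format. When I read it with pandas, not all of the content is read. I can't figure out why, since every line is formatted the same way. Please help. The details are listed below: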
tmp_csv_outfile:
6801 2017/09/28 18:56:51.390624 129.1972 107 XXX1 YYYY ZZZZ 908 log warn verbose 1 908 :: 235 :: [tp]0022 > f4 37 3e 00 00
6802 2017/09/28 18:56:51.390640 129.1972 108 XXX1 YYYY ZZZZ 908 log warn verbose 1 908 :: 235 :: [tp] TEST: ~Finished Testcase: TEST0471
6803 2017/09/28 18:56:51.390646 129.1973 109 XXX1 YYYY ZZZZ 908 log warn verbose 1 908 :: 235 :: [dia] trigger received - resetting session timeout 5000
6804 2017/09/28 18:56:51.390652 129.1975 110 XXX1 YYYY ZZZZ 908 log info verbose 1 908 :: 235 :: [dia][th1] Diagnosis Core responded, sending to the th1 Adapter (allConnected = 0)
import pandas as pd
df = pd.read_csv(tmp_csv_outfile, names=["Data"], header=None, sep=r'\s\s+$', engine='python')
print(df.tail(3))
Output:
Data
0 6801 2017/09/28 18:56:51.390624 129.1972 107 X...
1 6802 2017/09/28 18:56:51.390640 129.1972 108 X...
Solution: SOLVED
After a lot of digging, I found the solution: https://github.com/pandas-dev/pandas/issues/16893
After updating pandas, it started working correctly. Thanks to @jezrael for the valuable input.
Answer 0 (score: 1)
I think the problem is the separator, so change it to some value that does not occur in the data:
df = pd.read_csv(tmp_csv_outfile, names=["Data"], sep='¥', engine='python')
print (df)
Data
0 6801 2017/09/28 18:56:51.390624 129.1972 107 X...
1 6802 2017/09/28 18:56:51.390640 129.1972 108 X...
2 6803 2017/09/28 18:56:51.390646 129.1973 109 X...
3 6804 2017/09/28 18:56:51.390652 129.1975 110 X...
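The idea above depends on choosing a separator that never appears in the file. A minimal sketch of how that could be checked before calling `pd.read_csv` is shown here. The helper name `pick_unused_separator` and the candidate list are hypothetical, introduced only for illustration, and the two sample lines are copied from the question.

```python
def pick_unused_separator(lines, candidates=("|", "¥", "\x1f")):
    """Return the first candidate character that appears in none of the lines.

    Hypothetical helper: the candidate list is an assumption, not a pandas API.
    """
    for sep in candidates:
        if not any(sep in line for line in lines):
            return sep
    raise ValueError("every candidate separator occurs in the data")


# Two of the log lines from the question, used as sample input.
sample_lines = [
    "6801 2017/09/28 18:56:51.390624 129.1972 107 XXX1 YYYY ZZZZ 908 log warn "
    "verbose 1 908 :: 235 :: [tp]0022 > f4 37 3e 00 00",
    "6804 2017/09/28 18:56:51.390652 129.1975 110 XXX1 YYYY ZZZZ 908 log info "
    "verbose 1 908 :: 235 :: [dia][th1] Diagnosis Core responded, sending to "
    "the th1 Adapter (allConnected = 0)",
]

separator = pick_unused_separator(sample_lines)
print(repr(separator))  # prints '|'
```

The returned character could then be passed as `sep=` to `pd.read_csv`, so that each line ends up as a single value in the `Data` column.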
EDIT:
With your actual data, this works fine for me:
df = pd.read_csv('faulty.csv', sep='|', names=['Data'])
print (df)
Data
0 6801 2017/09/28 18:56:51.390624 129.1972 107 X...
1 6802 2017/09/28 18:56:51.390640 129.1972 108 X...
2 6803 2017/09/28 18:56:51.390646 129.1973 109 X...
3 6804 2017/09/28 18:56:51.390652 129.1975 110 X...