Python: Unable to read the first row of a csv file correctly with read_csv

Time: 2014-12-23 01:39:06

Tags: python csv

I'm trying to read a csv file with pandas, but it doesn't seem to be read correctly.

Code:

pd.read_csv(data_file_path, sep=",", index_col=0, header=0, dtype = object)

For example, my data (in the csv file) is:

12 1.43E+19 This is first line  101010  
23 1.43E+19 This is the second line 202020  
34 1.43E+19 This is the third line  303030  

I'm trying to read it with the first column as the index.

Output:

     1.43E+19 This is first line    101010  
12  
23 1.43E+19 This is the second line 202020  
34 1.43E+19 This is the third line 303030  

Output without using the first column as the index:

  12 1.43E+19 This is first line 101010  
0 23 1.43E+19 This is the second line 202020  
1 34 1.43E+19 This is the third line 303030  

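As a result, any further processing of this data ignores the first row.

1 Answer:

Answer 0: (score: 1)

I think you're confusing header=0, which means "use row 0 as the header", with header=None, which means "don't read a header from the file". With header=0, your first data row is consumed as the column labels, which is why it seems to go missing.

Compare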

>>> pd.read_csv("h.csv", header=0, index_col=0)
        1.43E+19       This is first line  101010  
12                                                 
23  1.430000e+19  This is the second line    202020
34  1.430000e+19   This is the third line    303030
>>> pd.read_csv("h.csv", header=None, index_col=0)
               1                        2       3
0                                                
12  1.430000e+19       This is first line  101010
23  1.430000e+19  This is the second line  202020
34  1.430000e+19   This is the third line  303030
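If you want to reproduce the comparison without creating h.csv, here is a minimal sketch that reads the same three rows from a string. The comma positions in CSV_TEXT are my assumption (one comma between each of the four fields), since the question shows the data without separators.

```python
import io

import pandas as pd

# Assumed reconstruction of the file: one comma between each of the four fields.
CSV_TEXT = (
    "12,1.43E+19,This is first line,101010\n"
    "23,1.43E+19,This is the second line,202020\n"
    "34,1.43E+19,This is the third line,303030\n"
)

# header=0: the first line is used as column labels, so only two data rows remain.
with_header = pd.read_csv(io.StringIO(CSV_TEXT), header=0, index_col=0)

# header=None: no line is treated as a header, so all three rows are data.
no_header = pd.read_csv(io.StringIO(CSV_TEXT), header=None, index_col=0)

print(len(with_header), list(with_header.index))
print(len(no_header), list(no_header.index))
```

The row counts differ by one, and the "missing" row shows up as the column labels of with_header.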

You can also specify the column names with names:

>>> pd.read_csv("h.csv", names=["Number", "Line", "Code"], index_col=0)
          Number                     Line    Code
12  1.430000e+19       This is first line  101010
23  1.430000e+19  This is the second line  202020
34  1.430000e+19   This is the third line  303030
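A variation on the same idea, as a sketch only: name all four columns and pick the index by name, so nothing depends on how pandas handles a names list shorter than the number of columns. The "Id" label and the comma positions in CSV_TEXT are my assumptions and do not come from the question.

```python
import io

import pandas as pd

# Assumed reconstruction of the file: one comma between each of the four fields.
CSV_TEXT = (
    "12,1.43E+19,This is first line,101010\n"
    "23,1.43E+19,This is the second line,202020\n"
    "34,1.43E+19,This is the third line,303030\n"
)

# Passing names tells pandas the file has no header line of its own.
# "Id" is a hypothetical label for the first column, which becomes the index.
df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    names=["Id", "Number", "Line", "Code"],
    index_col="Id",
)

print(df.loc[12, "Line"])
```

Selecting by label, as in df.loc[12, "Line"], then works for the first row just like for the others.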

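PS: Since you're using sep="," but the file you show doesn't contain any commas, I assume you removed them for some reason when posting the question. If so, please don't: nobody is afraid of commas, and removing them just means that anyone who wants to test your code has to guess where to put them back.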
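One way to post a sample that others can paste and run is to print it straight from the DataFrame. The sketch below applies header=None to the call from the question and keeps dtype=object, so values such as 1.43E+19 stay as the original text. The comma positions in CSV_TEXT are again my guess at the real file.

```python
import io

import pandas as pd

# Assumed reconstruction of the file: one comma between each of the four fields.
CSV_TEXT = (
    "12,1.43E+19,This is first line,101010\n"
    "23,1.43E+19,This is the second line,202020\n"
    "34,1.43E+19,This is the third line,303030\n"
)

# The call from the question with header=None instead of header=0.
# dtype=object keeps every value as the string found in the file.
df = pd.read_csv(
    io.StringIO(CSV_TEXT), sep=",", index_col=0, header=None, dtype=object
)

# With no path argument, to_csv returns the CSV text; header=False omits
# the automatic 1, 2, 3 column labels.
sample = df.to_csv(header=False)
print(sample)
```

The printed text has the commas in place, so it can be pasted into a question as is.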