I'm trying to read a CSV file with pandas, but it doesn't seem to be read correctly.
Code:
pd.read_csv(data_file_path, sep=",", index_col=0, header=0, dtype = object)
For example, my data (in the CSV file) is:
12 1.43E+19 This is first line 101010
23 1.43E+19 This is the second line 202020
34 1.43E+19 This is the third line 303030
I'm trying to read it with the first column as the index.
Output:
1.43E+19 This is first line 101010
12
23 1.43E+19 This is the second line 202020
34 1.43E+19 This is the third line 303030
Output without using the first column as the index:
12 1.43E+19 This is first line 101010
0 23 1.43E+19 This is the second line 202020
1 34 1.43E+19 This is the third line 303030
As a result, any further processing of this data ignores the first row.
Answer 0 (score: 1)
I think you're confusing header=0, which means "use row 0 as the header", with header=None, which means "don't read a header from the file".
Compare:
>>> pd.read_csv("h.csv", header=0, index_col=0)
1.43E+19 This is first line 101010
12
23 1.430000e+19 This is the second line 202020
34 1.430000e+19 This is the third line 303030
>>> pd.read_csv("h.csv", header=None, index_col=0)
1 2 3
0
12 1.430000e+19 This is first line 101010
23 1.430000e+19 This is the second line 202020
34 1.430000e+19 This is the third line 303030
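The same comparison can be reproduced without a file on disk. This is only a minimal sketch: CSV_TEXT is an assumed reconstruction of the question's data with the commas put back, and io.StringIO stands in for "h.csv".

```python
import io

import pandas as pd

# Assumption: the question's three rows with the commas restored.
CSV_TEXT = (
    "12,1.43E+19,This is first line,101010\n"
    "23,1.43E+19,This is the second line,202020\n"
    "34,1.43E+19,This is the third line,303030\n"
)

# header=0: the first row is consumed as column labels, leaving two data rows.
with_header = pd.read_csv(io.StringIO(CSV_TEXT), header=0, index_col=0)

# header=None: no row is treated as labels, so all three rows remain data.
without_header = pd.read_csv(io.StringIO(CSV_TEXT), header=None, index_col=0)

print(len(with_header))            # 2
print(len(without_header))         # 3
print(list(without_header.index))  # [12, 23, 34]
```

The row counts make the problem visible: with header=0 the row starting with 12 is no longer part of the data, which is why later processing skips it.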
You can also specify the column names with names:
>>> pd.read_csv("h.csv", names=["Number", "Line", "Code"], index_col=0)
Number Line Code
12 1.430000e+19 This is first line 101010
23 1.430000e+19 This is the second line 202020
34 1.430000e+19 This is the third line 303030
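As a variation, here is a sketch that names all four fields, including the one used as the index, so it does not rely on how pandas matches three names to four fields. The label "ID" and the CSV_TEXT contents are assumptions made for illustration, not something taken from the question.

```python
import io

import pandas as pd

# Assumption: the question's three rows with the commas restored.
CSV_TEXT = (
    "12,1.43E+19,This is first line,101010\n"
    "23,1.43E+19,This is the second line,202020\n"
    "34,1.43E+19,This is the third line,303030\n"
)

# Passing names without header makes pandas treat every row as data.
# "ID" is a hypothetical label for the first field, which becomes the index.
named = pd.read_csv(
    io.StringIO(CSV_TEXT),
    names=["ID", "Number", "Line", "Code"],
    index_col="ID",
)

print(list(named.columns))    # ['Number', 'Line', 'Code']
print(named.loc[12, "Line"])  # This is first line
```

Naming the index column also makes it possible to refer to it by label in index_col instead of by position.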
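PS: Since you're using sep="," but the file you show doesn't contain any commas, I assume you removed them for some reason when asking the question. If so, please don't: nobody is afraid of commas, and it just means that anyone who wants to test your code has to guess where to put them back.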