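I have an Excel file that I am reading into a pandas DataFrame. The header is on row 1 (Python index), and there is an empty row between the header and the data. When I specify index_col, the empty row is treated as part of the index and shows up as NaN. What is the best way to avoid this behavior?
Test file: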
idx value
a 1
Without specifying index_col:
print xs.parse(header = 1)
idx value
0 NaN NaN
1 a 1
print xs.parse(header = 1).index
Int64Index([0, 1], dtype='int64')
Specifying index_col:
print xs.parse(header = 1, index_col = 0)
value
idx
NaN NaN
a 1
print xs.parse(header = 1, index_col = 0).index
Index([nan, u'a'], dtype='object')
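The behavior can be reproduced without a workbook. The sketch below is only a stand-in: it uses pd.read_csv on an in-memory string instead of ExcelFile.parse, and the filler line, the RAW string and the load helper are made up for illustration. It assumes the sheet has one line above the header, then the header, then a row of empty cells, then the data.

```python
import io

import pandas as pd

# Hypothetical stand-in for the sheet: row 0 is filler, row 1 is the header,
# row 2 is a row of empty cells, row 3 holds the data.
RAW = "filler,filler\nidx,value\n,\na,1\n"


def load(**kwargs):
    """Parse the stand-in data with the given read_csv options."""
    return pd.read_csv(io.StringIO(RAW), **kwargs)


# Without index_col the empty row becomes a row of NaN values.
plain = load(header=1)
print(plain)

# With index_col=0 the NaN from the empty row ends up in the index.
indexed = load(header=1, index_col=0)
print(indexed.index)
```

Since this is a CSV stand-in, it only illustrates how the empty row is carried into the index; the exact reprs printed by an Excel-backed parse may differ.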
Answer 0 (score: 1)
You can skip the empty row by passing skiprows=[1]. I tested this on a dummy Excel sheet; see ExcelFile.parse:
In [44]:
xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse(skiprows=[1])
Out[44]:
idx value
0 12 NaN
1 2 NaN
2 1 NaN
Compare with:
In [45]:
xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse()
Out[45]:
idx value
0 NaN NaN
1 12 NaN
2 2 NaN
3 1 NaN
In [47]:
xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse(skiprows=[1], header=0)
Out[47]:
idx value
0 12 NaN
1 2 NaN
2 1 NaN
In [49]:
xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse(skiprows=[1], header=0, index_col=0)
Out[49]:
value
idx
12 NaN
2 NaN
1 NaN
In [50]:
xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse(header=0, index_col=0)
Out[50]:
value
idx
NaN NaN
12 NaN
2 NaN
1 NaN
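The dummy sheet above has its header on row 0, while the sheet in the question has it on row 1, so the row number to skip shifts by one. Here is a minimal sketch of that case, again using pd.read_csv on an in-memory string as a stand-in for the workbook (the RAW string and its filler line are hypothetical). It relies on skiprows referring to row numbers in the file and header being counted among the rows that remain, which matches the outputs above. I have not checked it against a real workbook; the analogous call would presumably be xs.parse(skiprows=[2], header=1, index_col=0). A second option is also shown that does not depend on knowing where the empty row is: load without index_col, drop rows where every cell is NaN, then set the index.

```python
import io

import pandas as pd

# Hypothetical stand-in for the sheet in the question: row 0 is filler,
# row 1 is the header, row 2 is a row of empty cells, row 3 holds the data.
RAW = "filler,filler\nidx,value\n,\na,1\n"

# Option 1: skip the empty row by its row number in the file (row 2 here).
# header=1 is counted among the rows left after skipping, so it still
# points at the "idx,value" line.
skipped = pd.read_csv(io.StringIO(RAW), skiprows=[2], header=1, index_col=0)
print(skipped)

# Option 2: load without index_col, drop rows that are entirely NaN,
# then promote the idx column to the index.
cleaned = (
    pd.read_csv(io.StringIO(RAW), header=1)
    .dropna(how="all")
    .set_index("idx")
)
print(cleaned)
```

Note that dropna(how="all") also removes any genuinely empty rows further down the data, which may or may not be what you want, and that the value column stays float in the second option because it held a NaN when it was parsed.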