Pandas ExcelFile.parse puts NaN in the index when index_col is specified

Time: 2015-02-26 13:49:30

Tags: pandas

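I have an Excel file that I am reading into a pandas DataFrame. The header is on row 1 (Python, i.e. 0-based, indexing), and there is an empty row between the header and the data. When I specify index_col, the empty row is treated as part of the index and shows up as NaN. What is the best way to avoid this?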

Test file:

idx value

a   1
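The behaviour can be reproduced without an Excel file. The sketch below is an assumption-laden stand-in: it uses pd.read_csv on an in-memory string instead of ExcelFile.parse (both accept header and index_col), and it puts the header on row 0 rather than row 1 for simplicity. The empty row is written as a line containing only a comma, because read_csv silently drops truly empty lines by default (skip_blank_lines=True), whereas a line of bare delimiters becomes a row of NaN, much like an empty spreadsheet row.

```python
import io

import pandas as pd

# Stand-in for the test sheet (assumption: CSV instead of .xls, header on row 0).
# The "," line is a row of empty cells; read_csv turns it into NaN, NaN.
CSV_DATA = "idx,value\n,\na,1\n"

# Without index_col: the empty row is just a NaN row in the data.
no_index = pd.read_csv(io.StringIO(CSV_DATA), header=0)

# With index_col=0: the empty row's NaN ends up as an index label.
with_index = pd.read_csv(io.StringIO(CSV_DATA), header=0, index_col=0)

print(no_index)
print(with_index.index)
```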

Without index_col:

print(xs.parse(header=1))
   idx  value
0  NaN    NaN
1    a      1

print(xs.parse(header=1).index)
Int64Index([0, 1], dtype='int64')

With index_col:

print(xs.parse(header=1, index_col=0))
     value
idx       
NaN    NaN
a        1

print(xs.parse(header=1, index_col=0).index)
Index([nan, u'a'], dtype='object')

1 Answer:

Answer 0 (score: 1)

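You can skip the empty row by passing skiprows=[1]. I tested this on a dummy Excel sheet whose header is on row 0 and whose empty row is row 1; see the ExcelFile.parse documentation.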
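A minimal sketch of the same fix that runs without book1.xls. It assumes a CSV stand-in read with pd.read_csv, whose header, skiprows and index_col parameters correspond to the ones passed to ExcelFile.parse below; the data (one "a, 1" row) is hypothetical and laid out like the dummy sheet, with the header on row 0 and the empty row on row 1. For the asker's sheet, where the header is on row 1, the row numbers would need adjusting (not tested here).

```python
import io

import pandas as pd

# Hypothetical stand-in data: header on row 0, empty row (",") on row 1.
CSV_DATA = "idx,value\n,\na,1\n"

# skiprows=[1] removes file row 1 before parsing; header=0 then still refers
# to the "idx,value" line, and index_col=0 no longer sees a NaN label.
df = pd.read_csv(io.StringIO(CSV_DATA), skiprows=[1], header=0, index_col=0)
print(df)
```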

In [44]:

xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse(skiprows=[1])

Out[44]:
   idx  value
0   12    NaN
1    2    NaN
2    1    NaN

Compare with:

In [45]:

xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse()

Out[45]:
   idx  value
0  NaN    NaN
1   12    NaN
2    2    NaN
3    1    NaN

In [47]:

xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse(skiprows=[1], header=0)

Out[47]:
   idx  value
0   12    NaN
1    2    NaN
2    1    NaN

In [49]:

xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse(skiprows=[1], header=0, index_col=0)

Out[49]:
     value
idx       
12     NaN
2      NaN
1      NaN

In [50]:

xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse(header=0, index_col=0)

Out[50]:
     value
idx       
NaN    NaN
 12    NaN
 2     NaN
 1     NaN
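If the position of the empty row is not known in advance, the NaN can instead be removed after parsing. These two options are not part of the answer above; they are a sketch using standard DataFrame methods (dropna, set_index, Index.notna) and the same hypothetical CSV stand-in, but they apply equally to the DataFrame returned by xs.parse. Option 1 drops only rows in which every cell is empty; option 2 drops any row whose index label is NaN, which would also discard a data row that merely has an empty idx cell.

```python
import io

import pandas as pd

# Hypothetical stand-in data: header, a row of empty cells, one data row.
CSV_DATA = "idx,value\n,\na,1\n"

# Option 1: parse without index_col, drop rows where every cell is NaN,
# then promote the "idx" column to the index.
df1 = (
    pd.read_csv(io.StringIO(CSV_DATA), header=0)
    .dropna(how="all")
    .set_index("idx")
)

# Option 2: parse with index_col, then keep only rows whose label is not NaN.
raw = pd.read_csv(io.StringIO(CSV_DATA), header=0, index_col=0)
df2 = raw[raw.index.notna()]

print(df1)
print(df2)
```

Option 1 is the more conservative choice, since a row is removed only when it is completely blank.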