Question

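I have an Excel file that I am reading into a pandas DataFrame. The header is on row 1 (Python index), and there is a blank row between the header and the data. When I specify index_col, the blank row is treated as part of the index and shows up as NaN. What is the best way to avoid this behavior?

Test file: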

idx value

a   1

Without specifying index_col:

print xs.parse(header = 1)
   idx  value
0  NaN    NaN
1    a      1

print xs.parse(header = 1).index
Int64Index([0, 1], dtype='int64')

Specifying index_col:

print xs.parse(header = 1, index_col = 0)
     value
idx       
NaN    NaN
a        1

print xs.parse(header = 1, index_col = 0).index
Index([nan, u'a'], dtype='object')

Answer 1

You can skip the blank row by passing skiprows=[1]. I tested this on a dummy Excel sheet; see ExcelFile.parse:

In [44]:

xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse(skiprows=[1])

Out[44]:
   idx  value
0   12    NaN
1    2    NaN
2    1    NaN

Compare with:

In [45]:

xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse()

Out[45]:
   idx  value
0  NaN    NaN
1   12    NaN
2    2    NaN
3    1    NaN

In [47]:

xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse(skiprows=[1], header=0)

Out[47]:
   idx  value
0   12    NaN
1    2    NaN
2    1    NaN

In [49]:

xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse(skiprows=[1], header=0, index_col=0)

Out[49]:
     value
idx       
12     NaN
2      NaN
1      NaN

In [50]:

xs = pd.ExcelFile(r'c:\data\book1.xls')
xs.parse(header=0, index_col=0)

Out[50]:
     value
idx       
NaN    NaN
 12    NaN
 2     NaN
 1     NaN
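
If the position of the blank row is not known in advance, one alternative (not part of the tested answer above) is to clean up after parsing instead of using skiprows. The sketch below is only an illustration: the `raw` frame is a hand-built, hypothetical stand-in for what `xs.parse(header=1)` returns for the question's test file, so no Excel file is needed to run it. Note that the first option would also drop any genuine data row whose cells are all empty.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for xs.parse(header=1) on the question's sheet:
# the blank row between header and data arrives as an all-NaN row.
raw = pd.DataFrame({"idx": [np.nan, "a"], "value": [np.nan, 1.0]})

# Option 1: drop rows in which every cell is NaN, then make "idx" the index.
cleaned = raw.dropna(how="all").set_index("idx")

# Option 2: if index_col was already applied, keep only the rows
# whose index label is not NaN.
indexed = raw.set_index("idx")  # mimics parse(header=1, index_col=0)
filtered = indexed.loc[indexed.index.notna()]

print(cleaned)
print(filtered)
```

Both options leave a single row labeled `a`. Passing skiprows is still the more direct fix when the layout of the sheet is fixed.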
