Merged cells in Excel become NaN in pandas

Time: 2019-09-02 10:24:00

Tags: python excel pandas

How can I read an Excel file with the following layout into a pandas DataFrame?

a       b   c    d       e    f
Type    1   22   Car     Yes  2019
                 Train   Yes  
Type    2   25   Car     No   2018
Notype  1        Car     Yes  2019
                 Train   

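In the first row, three columns are merged cells (each spanning two rows), while the remaining columns are separate rows.

The problem is that if I use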

import pandas as pd
data = pd.read_excel("excel.xls").fillna(method='ffill')

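then the value "25" from the third row and the "Yes" from the fourth row are copied into the NaN cells below them, which is not what I want. Instead, each merged column should repeat its exact value across both rows it spans. In this case, "a", "b", "c", and "f" are the merged columns.

So, loaded correctly, it should look like this: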
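To make the problem concrete, here is a minimal sketch. The `raw` frame is a hand-built stand-in for what `pd.read_excel` would presumably return for the sheet above (an assumption, since the actual file is not available), and `.ffill()` is used as the equivalent of `fillna(method='ffill')`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel("excel.xls"):
# merged and empty cells alike show up as NaN.
raw = pd.DataFrame({
    "a": ["Type", np.nan, "Type", "Notype", np.nan],
    "b": [1, np.nan, 2, 1, np.nan],
    "c": [22, np.nan, 25, np.nan, np.nan],
    "d": ["Car", "Train", "Car", "Car", "Train"],
    "e": ["Yes", "Yes", "No", "Yes", np.nan],
    "f": [2019, np.nan, 2018, 2019, np.nan],
})

# A blanket forward fill cannot tell a merged cell from a genuinely empty one.
filled = raw.ffill()

# Rows 3 and 4 of "c" now carry 25 from row 2, and row 4 of "e" carries "Yes".
print(filled)
```

The unwanted part is that "c" and "e" are filled in the "Notype" rows, where the cells were really empty rather than merged.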

a       b   c    d       e   f
Type    1   22   Car     Yes 2019
Type    1   22   Train   Yes 2019
Type    2   25   Car     No  2018
Notype  1   NaN  Car     Yes 2019
Notype  1   NaN  Train   NaN 2019

1 Answer:

Answer 0 (score: 2)

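If you need to forward-fill every column except those named in a list, use Index.difference to select the remaining columns and forward-fill their missing values: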

cols_excluded = ['c', 'e']
# All column labels except the excluded ones
cols = df.columns.difference(cols_excluded)

# Forward-fill only the selected columns
df[cols] = df[cols].ffill()
print(df)
        a    b     c      d    e
0    Type  1.0  22.0    Car  Yes
1    Type  1.0   NaN  Train  Yes
2    Type  2.0  25.0    Car   No
3  Notype  1.0   NaN    Car  Yes
4  Notype  1.0   NaN  Train  NaN
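As a small illustration of the selection step, the sketch below shows what `Index.difference` returns for the question's column labels. The `columns` index is built by hand here purely for demonstration.

```python
import pandas as pd

# Hand-built column index matching the question's header row.
columns = pd.Index(["a", "b", "c", "d", "e", "f"])

# Labels present in `columns` but not in the excluded list.
cols = columns.difference(["c", "e"])
print(list(cols))  # ['a', 'b', 'd', 'f']
```

Note that `difference` sorts its result when it can, so the returned labels are not guaranteed to be in the original column order. That does not matter when they are only used to select columns for assignment, as above.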

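If necessary, you can also forward-fill the missing values in the excluded columns (here cols_excluded) while leaving the trailing missing values at the end of each column untouched: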

# Keep cells that are still NaN after bfill() (trailing NaNs); forward-fill the rest
df[cols_excluded] = df[cols_excluded].where(df[cols_excluded].bfill().isna(),
                                            df[cols_excluded].ffill())
print(df)

        a    b     c      d    e
0    Type  1.0  22.0    Car  Yes
1    Type  1.0  22.0  Train  Yes
2    Type  2.0  25.0    Car   No
3  Notype  1.0   NaN    Car  Yes
4  Notype  1.0   NaN  Train  NaN
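Putting both steps together, the following is a self-contained sketch that can be run without the Excel file. The frame is a hand-built approximation of the question's sheet (an assumption) and also includes column "f", which the printed outputs above omit.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the frame returned by pd.read_excel("excel.xls").
df = pd.DataFrame({
    "a": ["Type", np.nan, "Type", "Notype", np.nan],
    "b": [1, np.nan, 2, 1, np.nan],
    "c": [22, np.nan, 25, np.nan, np.nan],
    "d": ["Car", "Train", "Car", "Car", "Train"],
    "e": ["Yes", "Yes", "No", "Yes", np.nan],
    "f": [2019, np.nan, 2018, 2019, np.nan],
})

cols_excluded = ["c", "e"]
cols = df.columns.difference(cols_excluded)

# Step 1: plain forward fill for every column not in cols_excluded.
df[cols] = df[cols].ffill()

# Step 2: in the excluded columns, forward-fill interior gaps only.
# bfill() leaves NaN exactly where no later value exists (trailing NaNs),
# so where() keeps those cells and takes the ffill() result elsewhere.
part = df[cols_excluded]
df[cols_excluded] = part.where(part.bfill().isna(), part.ffill())

print(df)
```

This relies on the empty cells of "c" and "e" sitting at the end of the column, as they do in the sample data. An empty cell followed by later values in the same column would be forward-filled like a merged one.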