Question

How can I read an Excel file with this layout into a pandas DataFrame?

a       b   c    d       e    f
Type    1   22   Car     Yes  2019
                 Train   Yes  
Type    2   25   Car     No   2018
Notype  1        Car     Yes  2019
                 Train

In the first record, three of the columns are merged cells spanning two rows, while the rest have a separate value in each row.

The problem is that if I use

data = pd.read_excel("excel.xls").fillna(method='ffill')

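then the value "25" in the third row and the "Yes" in the fourth row are copied into the NaN cells below them, which is not what I want. Only the merged columns should repeat their value, copied exactly into both of the rows they span. In this case "a", "b", "c" and "f" are the merged columns.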

So the correctly loaded result should look like this:

a       b   c    d       e   f
Type    1   22   Car     Yes 2019
Type    1   22   Train   Yes 2019
Type    2   25   Car     No  2018
Notype  1   NaN  Car     Yes 2019
Notype  1   NaN  Train   NaN 2019
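The pitfall can be reproduced without the workbook. The frame below is typed in by hand and assumes that pd.read_excel returns a merged cell's value only in its top-left cell, with the rows it covers coming back as NaN. It uses ffill(), the current spelling of fillna(method='ffill'), which recent pandas versions deprecate.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for pd.read_excel("excel.xls") (assumption):
# merged cells keep their value in the top row only, covered rows are NaN.
raw = pd.DataFrame({
    "a": ["Type", np.nan, "Type", "Notype", np.nan],
    "b": [1, np.nan, 2, 1, np.nan],
    "c": [22, np.nan, 25, np.nan, np.nan],
    "d": ["Car", "Train", "Car", "Car", "Train"],
    "e": ["Yes", "Yes", "No", "Yes", np.nan],
    "f": [2019, np.nan, 2018, 2019, np.nan],
})

# Blanket forward fill: every NaN takes the last value above it,
# including cells that were genuinely empty in the sheet.
filled = raw.ffill()
print(filled)
```

Rows 3 and 4 of c pick up 25.0 from the "Type 2" record, and row 4 of e picks up "Yes", which are exactly the unwanted fills described above.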

Answer 1

To forward fill every column except those named in a list, use Index.difference to select the remaining columns and forward fill their missing values:

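cols_excluded = ['c','e']
# all columns except the excluded ones (Index.difference returns them sorted)
cols = df.columns.difference(cols_excluded)

# forward fill only the merged columns; c and e keep their NaN values
df[cols] = df[cols].ffill()
print(df)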
        a    b     c      d    e
0    Type  1.0  22.0    Car  Yes
1    Type  1.0   NaN  Train  Yes
2    Type  2.0  25.0    Car   No
3  Notype  1.0   NaN    Car  Yes
4  Notype  1.0   NaN  Train  NaN
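For reference, a self-contained run of the steps above. The DataFrame is a hypothetical hand-built stand-in for the read_excel result (column f left out, as in the printed output); only Index.difference and ffill come from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the read_excel result, typed in by hand.
df = pd.DataFrame({
    "a": ["Type", np.nan, "Type", "Notype", np.nan],
    "b": [1, np.nan, 2, 1, np.nan],
    "c": [22, np.nan, 25, np.nan, np.nan],
    "d": ["Car", "Train", "Car", "Car", "Train"],
    "e": ["Yes", "Yes", "No", "Yes", np.nan],
})

cols_excluded = ["c", "e"]
# Every column label not in cols_excluded, returned in sorted order.
cols = df.columns.difference(cols_excluded)

# Forward fill only the selected (merged) columns.
df[cols] = df[cols].ffill()
print(df)
```

Here cols holds a, b and d, so column c still has NaN in row 1, as in the printed output.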

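If necessary, also forward fill the missing values in the excluded columns (here cols_excluded), but leave the trailing missing values at the end of each column untouched: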

# keep trailing NaNs (nothing below to backfill from), forward fill the rest
df[cols_excluded] = df[cols_excluded].where(df[cols_excluded].bfill().isna(),
                                            df[cols_excluded].ffill())
print(df)

        a    b     c      d    e
0    Type  1.0  22.0    Car  Yes
1    Type  1.0  22.0  Train  Yes
2    Type  2.0  25.0    Car   No
3  Notype  1.0   NaN    Car  Yes
4  Notype  1.0   NaN  Train  NaN
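The where/bfill mask is easiest to see on a single column. This sketch applies it to column c on its own, with values typed in by hand from the example: bfill().isna() is True only for NaNs that have no later non-missing value, so those stay NaN while every earlier gap is forward filled.

```python
import numpy as np
import pandas as pd

# Column c from the example: an inner gap at index 1, trailing gaps at 3 and 4.
c = pd.Series([22, np.nan, 25, np.nan, np.nan])

# True only where no non-missing value follows, i.e. the trailing NaNs.
trailing = c.bfill().isna()

# where(cond, other): keep c where cond is True (trailing NaNs stay NaN),
# otherwise take the forward-filled value.
result = c.where(trailing, c.ffill())
print(trailing.tolist())  # [False, False, False, True, True]
print(result.tolist())    # [22.0, 22.0, 25.0, nan, nan]
```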

Merged cells in Excel become NaN in pandas
