How can I read an Excel file with the following layout into a pandas DataFrame?
a       b  c   d      e    f
Type    1  22  Car    Yes  2019
               Train  Yes
Type    2  25  Car    No   2018
Notype  1      Car    Yes  2019
               Train
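The first three columns, and column f, are merged cells that span two rows, while the other columns have a separate value in each row.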
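The problem is that if I use

import pandas as pd
data = pd.read_excel("excel.xls").ffill()  # same as .fillna(method='ffill'), which newer pandas deprecates

then the value "25" in the third row and the "Yes" in the fourth row are copied into the NaN cells below them, which is not what I want. Instead, each merged column should repeat the exact value of the merged cell in both of its rows. In this case, "a", "b", "c" and "f" are the merged columns.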
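A minimal sketch that reproduces the problem without the Excel file. It assumes that `pd.read_excel` returns NaN for the lower cell of each two-row merged range, as the question describes, so the frame below is a hand-built stand-in for the sheet, not the real file:

```python
import numpy as np
import pandas as pd

# Hand-built stand-in (assumption) for what pd.read_excel("excel.xls") returns:
# the lower row of every merged cell comes back as NaN.
df = pd.DataFrame({
    "a": ["Type", np.nan, "Type", "Notype", np.nan],
    "b": [1, np.nan, 2, 1, np.nan],
    "c": [22, np.nan, 25, np.nan, np.nan],
    "d": ["Car", "Train", "Car", "Car", "Train"],
    "e": ["Yes", "Yes", "No", "Yes", np.nan],
    "f": [2019, np.nan, 2018, 2019, np.nan],
})

# Forward-filling every column also fills cells that were meant to stay empty.
naive = df.ffill()
print(naive)
```

In the result, both Notype rows pick up 25 in column c, and the last row picks up "Yes" in column e, which is exactly the unwanted behaviour.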
Loaded correctly, it should look like this:
a b c d e f
Type 1 22 Car Yes 2019
Type 1 22 Train Yes 2019
Type 2 25 Car No 2018
Notype 1 NaN Car Yes 2019
Notype 1 NaN Train NaN 2019
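As a different technique from forward-filling, the merge information can be read from the workbook itself with openpyxl. This distinguishes a merged-but-empty cell (column c for Notype) from a genuinely empty one (column e in the last row). This is a sketch that makes several assumptions: the file is saved as .xlsx, because openpyxl does not read legacy .xls; it is opened normally, not in read-only mode; the header is in row 1 starting at column A; and `read_with_merged_cells` is a hypothetical helper name. The demo builds the sample sheet in memory instead of loading excel.xls:

```python
import pandas as pd
from openpyxl import Workbook


def read_with_merged_cells(ws):
    """Hypothetical helper: build a DataFrame from an openpyxl worksheet,
    copying each merged range's top-left value into every cell of the range."""
    # values_only=True yields plain values; hidden cells of a merge are None.
    values = [list(row) for row in ws.iter_rows(min_row=1, min_col=1, values_only=True)]
    for rng in ws.merged_cells.ranges:
        top_left = ws.cell(row=rng.min_row, column=rng.min_col).value
        for r in range(rng.min_row, rng.max_row + 1):
            for c in range(rng.min_col, rng.max_col + 1):
                values[r - 1][c - 1] = top_left  # worksheet rows/cols are 1-based
    header, *rows = values
    return pd.DataFrame(rows, columns=header)


# In-memory copy of the sample sheet; columns A, B, C and F are merged per pair.
wb = Workbook()
ws = wb.active
for row in [
    ["a", "b", "c", "d", "e", "f"],
    ["Type", 1, 22, "Car", "Yes", 2019],
    [None, None, None, "Train", "Yes", None],
    ["Type", 2, 25, "Car", "No", 2018],
    ["Notype", 1, None, "Car", "Yes", 2019],
    [None, None, None, "Train", None, None],
]:
    ws.append(row)
for col in "ABCF":
    ws.merge_cells(f"{col}2:{col}3")
    ws.merge_cells(f"{col}5:{col}6")

df = read_with_merged_cells(ws)
print(df)
```

For a real file, pass the active sheet of `openpyxl.load_workbook("excel.xlsx")` instead of the in-memory sheet.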
Answer (score: 2)
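If you need to forward-fill every column except those named in a list, select the other columns with Index.difference and forward-fill their missing values: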
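import pandas as pd

df = pd.read_excel("excel.xls")
cols_excluded = ['c','e']  # columns that must not be forward-filled everywhere
cols = df.columns.difference(cols_excluded)  # all remaining columns
df[cols] = df[cols].ffill()
print(df)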
        a    b     c      d    e
0    Type  1.0  22.0    Car  Yes
1    Type  1.0   NaN  Train  Yes
2    Type  2.0  25.0    Car   No
3  Notype  1.0   NaN    Car  Yes
4  Notype  1.0   NaN  Train  NaN
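The first step can be checked without the Excel file. This is a sketch in which the hand-built frame below is an assumed stand-in for columns a to e, the columns the answer used, with NaN wherever the lower row of a merged cell was empty:

```python
import numpy as np
import pandas as pd

# Assumed stand-in for pd.read_excel output (columns a-e only).
df = pd.DataFrame({
    "a": ["Type", np.nan, "Type", "Notype", np.nan],
    "b": [1, np.nan, 2, 1, np.nan],
    "c": [22, np.nan, 25, np.nan, np.nan],
    "d": ["Car", "Train", "Car", "Car", "Train"],
    "e": ["Yes", "Yes", "No", "Yes", np.nan],
})

cols_excluded = ["c", "e"]
# Index.difference returns the remaining labels, sorted by default.
cols = df.columns.difference(cols_excluded)
df[cols] = df[cols].ffill()  # only a, b and d are filled
print(df)
```

After this step, `cols` contains `a`, `b` and `d`, and columns `c` and `e` still have all their original NaNs.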
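If needed, also forward-fill the missing values in the excluded columns (here cols_excluded), but leave each column's trailing missing values as NaN: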
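# bfill().isna() is True only for trailing NaNs (no later value exists); keep
# those, and take the forward-filled value everywhere else
df[cols_excluded] = df[cols_excluded].where(df[cols_excluded].bfill().isna(),
                                            df[cols_excluded].ffill())
print(df)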
        a    b     c      d    e
0    Type  1.0  22.0    Car  Yes
1    Type  1.0  22.0  Train  Yes
2    Type  2.0  25.0    Car   No
3  Notype  1.0   NaN    Car  Yes
4  Notype  1.0   NaN  Train  NaN
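The two steps can be wrapped in one function and applied to the full sample, including column f. `fill_merged_columns` is a hypothetical name, and the input is again a hand-built stand-in for what `read_excel` would return:

```python
import numpy as np
import pandas as pd


def fill_merged_columns(df, cols_excluded):
    """Hypothetical helper combining both steps of the answer.

    Columns not in cols_excluded are fully forward-filled; columns in
    cols_excluded are forward-filled only where a later non-missing value
    exists, so their trailing NaNs stay NaN.
    """
    out = df.copy()
    cols = out.columns.difference(cols_excluded)
    out[cols] = out[cols].ffill()
    sub = out[cols_excluded]
    # Keep trailing NaNs (bfill finds nothing below them); ffill the rest.
    out[cols_excluded] = sub.where(sub.bfill().isna(), sub.ffill())
    return out


df = pd.DataFrame({
    "a": ["Type", np.nan, "Type", "Notype", np.nan],
    "b": [1, np.nan, 2, 1, np.nan],
    "c": [22, np.nan, 25, np.nan, np.nan],
    "d": ["Car", "Train", "Car", "Car", "Train"],
    "e": ["Yes", "Yes", "No", "Yes", np.nan],
    "f": [2019, np.nan, 2018, 2019, np.nan],
})
result = fill_merged_columns(df, ["c", "e"])
print(result)
```

The second step relies on an assumption: a merged-but-empty cell is recognised only when it is at the bottom of its column. An empty merged cell in the middle of the data (for example, if the Notype rows were not last) would still be forward-filled.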