Pandas: merge header rows if not NaN

Time: 2020-03-27 02:38:12

Tags: python pandas dataframe pandas-groupby

I am importing an Excel worksheet into a dataframe, and the table's header is split over two rows:

Colour | NaN   | Shape | Mass | NaN
NaN    | width | NaN   | NaN  | Torque

green  | 33    | round | 2    | 6
etc

I would like to collapse the first two rows into a single header:

Colour | width | Shape | Mass | Torque

green  | 33    | round | 2    | 6
...

I tried merged_header = df.loc[0].combine_first(df.loc[1]), but I'm not sure how to get the result back into the original dataframe.

I have tried:

# drop top 2 rows
df = df.drop(df.index[[0,1]])
# then add the merged one in:
res = pd.concat([merged_header, df], axis=0)

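But that just inserts merged_header as a column. I tried other combinations of merge from this tutorial, but without luck.

merged_header.append(df) gives a similarly wrong result, and res = df.append(merged_header) is almost right, but the header ends up at the bottom: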

green  | 33    | round | 2    | 6
...
Colour | width | Shape | Mass | Torque

To provide more detail, this is what I have so far:

df = pd.read_excel(ltro19, header=None, skiprows=9)
# delete all empty columns & rows
df = df.dropna(axis = 1, how = 'all')
df = df.dropna(axis = 0, how = 'all')

I mention this in case it affects the next step.

3 Answers:

Answer 0: (score: 1)

Let's use a list comprehension to flatten the MultiIndex column headers:

df.columns = [f'{j}' if str(i)=='nan' else f'{i}' for i, j in df.columns]

Output:

['Colour', 'width', 'Shape', 'Mass', 'Torque']
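
As a minimal sketch of the same idea on a frame that can be built without the Excel file: the column labels and the single data row below are hypothetical stand-ins, and the sketch assumes the missing labels are real NaN values inside a two-level MultiIndex. When a sheet is read with header=[0, 1], blank header cells may instead come through as "Unnamed: ..." strings, in which case the NaN test would need adjusting.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level column header with NaN gaps, mirroring the question.
cols = pd.MultiIndex.from_tuples([
    ("Colour", np.nan),
    (np.nan, "width"),
    ("Shape", np.nan),
    ("Mass", np.nan),
    (np.nan, "Torque"),
])
df = pd.DataFrame([["green", 33, "round", 2, 6]], columns=cols)

# For each (top, bottom) pair, keep the top label unless it is missing.
flat = [bottom if pd.isna(top) else top for top, bottom in df.columns]
df.columns = flat

print(list(df.columns))
```

Using pd.isna here instead of comparing str(i) to 'nan' is a design choice: it avoids misreading a genuine column label that happens to be the text "nan".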

Answer 1: (score: 0)

This should work for you:

df.columns = list(df.columns.get_level_values(0))
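
Note that get_level_values(0) returns only the first header row, so any label that lives on the second row stays missing. The sketch below illustrates this on a hypothetical MultiIndex (assumed to hold real NaN values, as in the question's layout) and shows one way to fill the gaps from the second level.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level column header with NaN gaps, mirroring the question.
cols = pd.MultiIndex.from_tuples([
    ("Colour", np.nan),
    (np.nan, "width"),
    ("Shape", np.nan),
    ("Mass", np.nan),
    (np.nan, "Torque"),
])

level0 = cols.get_level_values(0)  # first header row only
level1 = cols.get_level_values(1)  # second header row only

# Level 0 alone leaves NaN wherever the label sits on the second row,
# so fill those positions from level 1 (aligned by position).
combined = pd.Series(level0).fillna(pd.Series(level1)).tolist()

print(combined)
```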

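Answer 2: (score: 0)

Probably because I don't know the terminology well, the suggestions above did not lead me directly to a working solution. It seems I am working with a dataframe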

>>> print(type(df))
<class 'pandas.core.frame.DataFrame'>

but one that, I think, has no header.

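This solution works, although it involves stepping out of the dataframe into a list and then putting that list back as the column headers. It was inspired by Merging Two Rows (one with a value, the other NaN) in Pandas:
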
df = pd.read_excel(name_of_file, header=None, skiprows=9)
# delete all empty columns & rows
df = df.dropna(axis = 1, how = 'all')
df = df.dropna(axis = 0, how = 'all')

# merge the two headers which are weirdly split over two rows
merged_header = df.loc[0].combine_first(df.loc[1])
# turn that into a list
header_list = merged_header.values.tolist()
# load that list as the new headers for the dataframe
df.columns = header_list
# drop top 2 rows (old split header)
df = df.drop(df.index[[0,1]])
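
The same steps can be tried without the Excel file. The sketch below is a minimal reconstruction under assumptions: the frame raw and its values are hypothetical stand-ins for what read_excel(..., header=None) returns, and the names raw and tidy are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sheet read with header=None:
# two partial header rows followed by the data rows.
raw = pd.DataFrame([
    ["Colour", np.nan, "Shape", "Mass", np.nan],
    [np.nan, "width", np.nan, np.nan, "Torque"],
    ["green", 33, "round", 2, 6],
    ["blue", 21, "square", 5, 9],
])

# Per column, take the label from row 0 and fall back to row 1 where row 0 is NaN.
merged_header = raw.iloc[0].combine_first(raw.iloc[1])

# Drop the two header rows, renumber the index from 0, and install the merged labels.
tidy = raw.iloc[2:].reset_index(drop=True)
tidy.columns = merged_header.tolist()

print(tidy)
```

Because the header rows were originally mixed in with the data, the columns are left with object dtype; calling tidy.infer_objects() afterwards may recover numeric dtypes for columns such as width and Mass.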