Question

I am importing an Excel worksheet into a dataframe, and the table's header is split across two rows:

Colour | NaN   | Shape | Mass | NaN
NaN    | width | NaN   | NaN  | Torque

green  | 33    | round | 2    | 6
etc

I would like to collapse the first two rows into a single header:

Colour | width | Shape | Mass | Torque

green  | 33    | round | 2    | 6
...

I tried merged_header = df.loc[0].combine_first(df.loc[1]), but I'm not sure how to get the result back into the original dataframe.

I have tried:

# drop top 2 rows
df = df.drop(df.index[[0,1]])
# then add the merged one in:
res = pd.concat([merged_header, df], axis=0)

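But this just inserts merged_header as a column. I tried other combinations of merge from this tutorial, but without any luck.

merged_header.append(df) gives a similarly wrong result, and res = df.append(merged_header) is almost right, but the header ends up at the bottom: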

green  | 33    | round | 2    | 6
...
Colour | width | Shape | Mass | Torque

To give more detail, this is what I have so far:

df = pd.read_excel(ltro19, header=None, skiprows=9)
# delete all empty columns & rows
df = df.dropna(axis = 1, how = 'all')
df = df.dropna(axis = 0, how = 'all')

I mention this in case it affects the next step.

Answer 1

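Let's use a list comprehension to flatten the MultiIndex column header (this assumes df.columns is a two-level MultiIndex, e.g. because the file was read with a two-row header):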

df.columns = [f'{j}' if str(i)=='nan' else f'{i}' for i, j in df.columns]

Output:

['Colour', 'width', 'Shape', 'Mass', 'Torque']
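To make the comprehension above easier to try out, here is a minimal sketch on an in-memory frame. The MultiIndex is built by hand to mirror the question's layout, with NaN in the blank positions; that construction is an assumption for illustration, not how the asker's file was loaded.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level header mirroring the question's layout
# (NaN marks the blank cell in each header row).
columns = pd.MultiIndex.from_arrays([
    ["Colour", np.nan, "Shape", "Mass", np.nan],
    [np.nan, "width", np.nan, np.nan, "Torque"],
])
df = pd.DataFrame([["green", 33, "round", 2, 6]], columns=columns)

# For each (top, bottom) pair, keep the top label unless it is NaN.
df.columns = [f"{j}" if str(i) == "nan" else f"{i}" for i, j in df.columns]

print(list(df.columns))
```

When a two-row header is read directly from Excel, blank header cells may come through as placeholder strings such as "Unnamed: ..." rather than NaN, so it is worth printing df.columns first and adjusting the test in the comprehension if needed.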

Answer 2

This should work for you:

df.columns = list(df.columns.get_level_values(0))
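Note that get_level_values(0) returns only the top level of a MultiIndex. The sketch below, which uses a hand-built header as a stand-in for the question's data (an assumption), shows what each level contains and one possible way to combine them when the top level has gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level header mirroring the question's layout.
columns = pd.MultiIndex.from_arrays([
    ["Colour", np.nan, "Shape", "Mass", np.nan],
    [np.nan, "width", np.nan, np.nan, "Torque"],
])

top = list(columns.get_level_values(0))     # labels from the first header row
bottom = list(columns.get_level_values(1))  # labels from the second header row

# Fall back to the second-row label wherever the first-row label is missing.
merged = [b if pd.isna(t) else t for t, b in zip(top, bottom)]

print(top)
print(merged)
```

On a header shaped like this, the first level alone leaves the width and Torque columns unlabeled, which is why the fallback step is included here.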

Answer 3

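Possibly because I don't fully understand the terminology, the suggestions above did not lead me directly to a working solution. It seems I am working with a DataFrame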

>>> print(type(df))
<class 'pandas.core.frame.DataFrame'>

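but, I think, one without a header.

The solution below works, although it involves stepping out of the dataframe into a list and then assigning that list back as the column headers. It was inspired by Merging Two Rows (one with a value, the other NaN) in Pandas.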

df = pd.read_excel(name_of_file, header=None, skiprows=9)
# delete all empty columns & rows
df = df.dropna(axis = 1, how = 'all')
df = df.dropna(axis = 0, how = 'all')

# merge the two headers which are weirdly split over two rows
merged_header = df.loc[0].combine_first(df.loc[1])
# turn that into a list
header_list = merged_header.values.tolist()
# load that list as the new headers for the dataframe
df.columns = header_list
# drop top 2 rows (old split header)
df = df.drop(df.index[[0,1]])
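The same steps can be tried without the Excel file. The sketch below uses a small hand-made frame in place of the pd.read_excel(..., header=None) result; the sample rows and the names raw and clean are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the frame returned by pd.read_excel(..., header=None):
# two partial header rows followed by one data row.
raw = pd.DataFrame([
    ["Colour", np.nan, "Shape", "Mass", np.nan],
    [np.nan, "width", np.nan, np.nan, "Torque"],
    ["green", 33, "round", 2, 6],
])

# Take each label from row 0, falling back to row 1 where row 0 is NaN.
merged_header = raw.loc[0].combine_first(raw.loc[1])

# Use the merged labels as column names, then drop the two header rows.
raw.columns = merged_header.tolist()
clean = raw.drop(index=[0, 1]).reset_index(drop=True)

print(clean)
```

If dropna(axis=0, how='all') has removed blank rows above the header, the labels 0 and 1 may no longer exist; in that case selecting by position with .iloc[0] and .iloc[1] is one way around it.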

Pandas: merge header rows if not NaN

3 answers: