I'm importing an Excel worksheet into a DataFrame, and the table's header is split across two rows:
Colour | NaN | Shape | Mass | NaN
NaN | width | NaN | NaN | Torque
green | 33 | round | 2 | 6
etc
I want to collapse the first two rows into a single header:
Colour | width | Shape | Mass | Torque
green | 33 | round | 2 | 6
...
I tried merged_header = df.loc[0].combine_first(df.loc[1])
but I'm not sure how to get the result back into the original DataFrame.
I've tried:
# drop top 2 rows
df = df.drop(df.index[[0,1]])
# then add the merged one in:
res = pd.concat([merged_header, df], axis=0)
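but this just inserts merged_header as a column. I tried other combinations of merge from this tutorial, but had no luck.
merged_header.append(df) gives a similarly wrong result, and res = df.append(merged_header) is almost right, but the header ends up at the bottom: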
green | 33 | round | 2 | 6
...
Colour | width | Shape | Mass | Torque
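For what it's worth, the concat attempt probably misbehaves because merged_header is a Series, which pd.concat stacks as a column when combined with a DataFrame along axis=0. A minimal sketch of turning it into a one-row frame first, using a small in-memory frame as a stand-in for the real worksheet (the sample values are made up to match the layout above):

```python
import numpy as np
import pandas as pd

# Stand-in for the worksheet as read with header=None: two header rows, one data row.
df = pd.DataFrame([
    ["Colour", np.nan, "Shape", "Mass", np.nan],
    [np.nan, "width", np.nan, np.nan, "Torque"],
    ["green", 33, "round", 2, 6],
])

# Fill the gaps in the first header row from the second one.
merged_header = df.loc[0].combine_first(df.loc[1])

# Drop the two original header rows.
body = df.drop(df.index[[0, 1]])

# to_frame().T turns the Series into a single-row DataFrame, so concat stacks it as a row.
res = pd.concat([merged_header.to_frame().T, body], ignore_index=True)
print(res)
```

This only puts the merged header back as the first data row; it does not make it the column labels.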
To give more detail, this is what I have so far:
df = pd.read_excel(ltro19, header=None, skiprows=9)
# delete all empty columns & rows
df = df.dropna(axis = 1, how = 'all')
df = df.dropna(axis = 0, how = 'all')
in case that affects the next step.
Answer 0: (score: 1)
Let's use a list comprehension to flatten the MultiIndex column headers:
df.columns = [f'{j}' if str(i)=='nan' else f'{i}' for i, j in df.columns]
Output:
['Colour', 'width', 'Shape', 'Mass', 'Torque']
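A minimal sketch of that comprehension on a hand-built frame, assuming the columns really are a two-level MultiIndex in which the empty header cell of each pair is NaN (the MultiIndex and the sample row below are made up for illustration; if the empty cells come through as placeholder strings instead, the 'nan' test would need adjusting):

```python
import numpy as np
import pandas as pd

# Hypothetical two-level header; NaN marks the empty cell in each pair.
columns = pd.MultiIndex.from_tuples([
    ("Colour", np.nan),
    (np.nan, "width"),
    ("Shape", np.nan),
    ("Mass", np.nan),
    (np.nan, "Torque"),
])
df = pd.DataFrame([["green", 33, "round", 2, 6]], columns=columns)

# Keep the first level unless it is NaN, in which case fall back to the second level.
df.columns = [f"{j}" if str(i) == "nan" else f"{i}" for i, j in df.columns]

flat = list(df.columns)
print(flat)
```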
Answer 1: (score: 0)
This should work for you:
df.columns = list(df.columns.get_level_values(0))
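Answer 2: (score: 0)
Perhaps because I don't know the terminology well, the suggestions above didn't lead me directly to a working solution. It seems I am working with a DataFrame: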
>>> print(type(df))
<class 'pandas.core.frame.DataFrame'>
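However, I don't think it has a header.
This solution works, although it involves stepping out of the DataFrame into a list and then loading that list back in as the column headers. Inspired by Merging Two Rows (one with a value, the other NaN) in Pandas:
import pandas as pd
df = pd.read_excel(name_of_file, header=None, skiprows=9)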
# delete all empty columns & rows
df = df.dropna(axis = 1, how = 'all')
df = df.dropna(axis = 0, how = 'all')
# merge the two headers which are weirdly split over two rows
merged_header = df.loc[0].combine_first(df.loc[1])
# turn that into a list
header_list = merged_header.values.tolist()
# load that list as the new headers for the dataframe
df.columns = header_list
# drop top 2 rows (old split header)
df = df.drop(df.index[[0,1]])
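Here is the same idea as a self-contained sketch that runs without the Excel file. The small frame is a made-up stand-in for what read_excel returns with header=None, and the reset_index call is an optional extra so the remaining rows are numbered from 0 again:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the sheet read with header=None: two header rows, one data row.
raw = pd.DataFrame([
    ["Colour", np.nan, "Shape", "Mass", np.nan],
    [np.nan, "width", np.nan, np.nan, "Torque"],
    ["green", 33, "round", 2, 6],
])

# Merge the two header rows: take row 0, fill its NaN cells from row 1.
merged_header = raw.loc[0].combine_first(raw.loc[1])

# Drop the old split header and renumber the remaining rows.
df = raw.drop(raw.index[[0, 1]]).reset_index(drop=True)

# Use the merged values as the column labels.
df.columns = merged_header.tolist()
print(df)
```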