Question

I am working with this dataset:

https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/datasets/commutingtoworkbygenderukcountryandregion

It is loaded like this:

commuting_data_xls = pd.ExcelFile(commuting_data_filename)
commuting_data_sheets = commuting_data_front['Table description '].dropna()
commuting_data_1 = pd.read_excel(commuting_data_xls, '1', header=4, usecols=range(1,13))
commuting_data_1.dropna().dropna(axis=1)

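The resulting hierarchical index only returns rows correctly where all of the index columns are specified.

How can I correct this and name the index columns?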

Answer 1

Try the following steps:

Open the file with pd.read_excel(), reading only the sheet and column range you need.

commuting_data_xls = pd.read_excel("commutingdata.xlsx", '1', header=4, usecols=range(1,13))

Reset the multi-index names.

commuting_data_xls.index.names = ['Gender', 'Work_Region', 'Region']
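
To illustrate the naming step without the spreadsheet, here is a minimal sketch on a small hand-built frame. The row labels and the "Commuters" column are made up for illustration; only the three level names come from the steps above.

```python
import pandas as pd

# Hypothetical three-level index standing in for the unnamed levels
# that come back from the sheet; the values are invented.
index = pd.MultiIndex.from_tuples([
    ("Men", "North East", "North East"),
    ("Men", "North East", "London"),
    ("Women", "London", "London"),
])
df = pd.DataFrame({"Commuters": [10, 20, 30]}, index=index)

names_before = list(df.index.names)  # the levels start out unnamed

# Assigning a list of names labels each level in order.
df.index.names = ["Gender", "Work_Region", "Region"]

# After naming, reset_index() turns the levels into ordinary columns
# that carry those names.
flat = df.reset_index()
```

Naming the levels first means the columns produced by reset_index() get readable names instead of level_0, level_1 and so on.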

Reset the index, then restrict the rows to get rid of the totals. I assume you want them gone? If not, leave out the iloc step.

commuting_data_xls = commuting_data_xls.reset_index().iloc[0:28]

Drop the "Work_Region" column, since it appears to be redundant.

commuting_data_xls = commuting_data_xls.loc[:,commuting_data_xls.columns != 'Work_Region']
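
As an aside, the same column removal can also be written with DataFrame.drop, which some readers find easier to scan. This is a sketch on an invented frame, not the real sheet:

```python
import pandas as pd

# Invented rows; only the column names mirror the steps above.
df = pd.DataFrame({
    "Gender": ["Men", "Men"],
    "Work_Region": ["North East", "North East"],
    "Region": ["North East", "London"],
    "Commuters": [10, 20],
})

# Boolean-mask form used in the answer.
masked = df.loc[:, df.columns != "Work_Region"]

# Equivalent form using drop(); returns a new frame.
dropped = df.drop(columns="Work_Region")
```

Both forms return a new frame and leave the original untouched.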

Forward-fill the "Gender" column to replace the NaN values.

commuting_data_xls['Gender'] = commuting_data_xls['Gender'].ffill()
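
The NaN values come from merged-style cells in the sheet, where a label appears only on the first row of its group. A small sketch with invented values shows what forward-filling does in that situation:

```python
import numpy as np
import pandas as pd

# Invented rows: "Gender" is present only on the first row of each group,
# as happens when a spreadsheet label spans several rows.
df = pd.DataFrame({
    "Gender": ["Men", np.nan, np.nan, "Women", np.nan],
    "Region": ["North East", "London", "Wales", "North East", "London"],
    "Commuters": [10, 20, 30, 40, 50],
})

# ffill() copies the last seen non-missing value downwards.
df["Gender"] = df["Gender"].ffill()

gender_values = df["Gender"].tolist()
```

Assigning the result back to the column avoids relying on inplace=True on a column selection, which newer pandas versions warn about.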

Set the index again as needed.

commuting_data_xls = commuting_data_xls.set_index(['Gender', 'Region'])
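
Once every row has both labels filled in, lookups on part of the index work as expected. The following sketch uses invented numbers to show a partial lookup on the first level and a full lookup on both levels:

```python
import pandas as pd

# Invented rows with both index columns fully populated.
df = pd.DataFrame({
    "Gender": ["Men", "Men", "Women", "Women"],
    "Region": ["London", "Wales", "London", "Wales"],
    "Commuters": [20, 30, 50, 60],
})

# The column names must be passed as a single list to build a MultiIndex.
indexed = df.set_index(["Gender", "Region"]).sort_index()

# Partial lookup: all rows for one value of the first level.
men_rows = indexed.loc["Men"]

# Full lookup: one row identified by both levels.
london_women = indexed.loc[("Women", "London"), "Commuters"]
```

Sorting the index is optional here, but it keeps label-based slicing on a MultiIndex predictable.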
