Question

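I have two dataframes, df1 and df2. One has a MultiIndex, e.g. ['A', 'B'], and the other has a single index ['B']. I want to merge the data from df2 into df1 on the index 'B' while keeping the MultiIndex ['A', 'B']. How can I do this?

See the example below: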

import pandas as pd

data = {
    'state': ['California', 'New York', 'Texas'],
    'capital': ['Sacramento', 'Albany', 'Austin'],
}
df_state = pd.DataFrame.from_dict(data).set_index('state')

data = {
    'state': ['California', 'California', 'New York', 'New York', 'Texas', 'Texas'],
    'year': [2000, 2010, 2000, 2010, 2000, 2010],
    'population': [33871648, 37253956, 18976457, 19378102, 20851820, 25145561],
}
df_state_year = pd.DataFrame.from_dict(data).set_index(['state', 'year'])

df_state_year.merge(df_state['capital'], on=['state'], how='left')

The result is a dataframe with a single index, 'state'. I want to keep the original MultiIndex ['state', 'year'].

Using Scott Boston's answer, I ended up with:

df_state_year.reset_index()\
             .merge(df_state['capital'], on=['state'], how='left')\
             .set_index(['state', 'year'])

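This may be a version difference, but merge seems to drop my index entirely, so resetting only 'year' makes the 'state' index disappear as well. I removed the append because I don't want the extra auto-numbered field to become part of the index.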
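For comparison, here is a minimal sketch of an alternative that avoids the reset_index/set_index round trip. It assumes a pandas version in which DataFrame.join can align a single index with a MultiIndex through a shared level name (here 'state'). The variable name result is made up for this illustration and does not come from the question.

```python
import pandas as pd

# Same frames as in the question.
df_state = pd.DataFrame({
    'state': ['California', 'New York', 'Texas'],
    'capital': ['Sacramento', 'Albany', 'Austin'],
}).set_index('state')

df_state_year = pd.DataFrame({
    'state': ['California', 'California', 'New York', 'New York', 'Texas', 'Texas'],
    'year': [2000, 2010, 2000, 2010, 2000, 2010],
    'population': [33871648, 37253956, 18976457, 19378102, 20851820, 25145561],
}).set_index(['state', 'year'])

# join aligns on the index level name the two frames share ('state')
# and keeps the index of the calling frame.
# 'result' is a hypothetical name used only in this sketch.
result = df_state_year.join(df_state['capital'], how='left')

print(result.index.names)
print(result)
```

If join behaves differently in your version, the reset_index/merge/set_index chain above remains a fallback.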

Answer 1

Based on your question, and keeping in mind points such as readability and further implementation, I would do it this way:

import pandas as pd
import numpy as np

outer_index=['California','California','New York','New York', 'Texas','Texas']

inner_index=[2000, 2010, 2000, 2010, 2000, 2010]

capital=['Sacramento', 'Sacramento','Albany','Albany', 'Austin','Austin']

population_data = {
    'population': [33871648, 37253956, 18976457, 19378102, 20851820, 25145561],
}

# Combine the three lists into (state, year, capital) tuples
index_hierarchy=list(zip(outer_index,inner_index,capital))
index_hierarchy=pd.MultiIndex.from_tuples(index_hierarchy)


records = pd.DataFrame(population_data,index=index_hierarchy)

records.index.names=['State','Year','Capital']

records

Output

Note: if you are creating the (state, capital, population) data manually, you can use the approach above.
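If the capitals should not be typed out once per row, the same three-level index can be derived from a lookup instead. The following is only a sketch under that assumption. The names capital_by_state and records_from_lookup are invented for this illustration and do not appear in the answer.

```python
import pandas as pd

# Hypothetical lookup table: one capital per state.
capital_by_state = {
    'California': 'Sacramento',
    'New York': 'Albany',
    'Texas': 'Austin',
}

states = ['California', 'California', 'New York', 'New York', 'Texas', 'Texas']
years = [2000, 2010, 2000, 2010, 2000, 2010]

df = pd.DataFrame(
    {'population': [33871648, 37253956, 18976457, 19378102, 20851820, 25145561]},
    index=pd.MultiIndex.from_arrays([states, years], names=['State', 'Year']),
)

# Look up the capital for each row from its 'State' level,
# then append it to the index as a third level.
df['Capital'] = df.index.get_level_values('State').map(capital_by_state)
records_from_lookup = df.set_index('Capital', append=True)

print(records_from_lookup)
```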
