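I have a df with a 2-level MultiIndex of ['ID', 'Date']. The df is sorted by ID, then by Date. The IDs range from 1 to 5. I am trying to drop all rows with ID == 1. This works, but df.index still shows all of the 1 values.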
print(data.head(1))
print(data.index)
data.drop(1, inplace=True)
print(data.head(1))
print(data.index)
This outputs the following:
                    Inc       Exp  Inc_Label  Exp_Label
ID Date
1  1993-12-31  0.064379  0.004731   0.083734   0.009975
   1994-12-31  0.067377  0.009975   0.084116   0.015092
   1995-12-31  0.067766  0.015092   0.087881   0.017213
MultiIndex(levels=[[1, 2, 3, 4, 5], ['1968-12-31', '1969-12-31', '1970-12-31', '1971-12-31', '1972-12-31', '1973-12-31', '1974-12-31', '1975-12-31', '1976-12-31', '1977-12-31', '1978-12-31', '1979-12-31', '1980-12-31', '1981-12-31', '1982-12-31', '1983-12-31', '1984-12-31', '1985-12-31', '1986-12-31', '1987-12-31', '1988-12-31', '1989-12-31', '1990-12-31', '1991-12-31', '1992-12-31', '1993-12-31', '1994-12-31', '1995-12-31', '1996-12-31', '1997-12-31', '1998-12-31', '1999-12-31', '2000-12-31', '2001-12-31', '2002-12-31', '2003-12-31', '2004-12-31', '2005-12-31', '2006-12-31', '2007-12-31', '2008-12-31', '2009-12-31', '2010-12-31', '2011-12-31', '2012-12-31', '2013-12-31', '2014-12-31', '2015-12-31', '2016-12-31', '2017-12-31', '2018-12-31', '2019-12-31', '2020-12-31', '2021-12-31', '2022-12-31', '2023-12-31', '2024-12-31', '2025-12-31', '2026-12-31', '2027-12-31', '2028-12-31', '2029-12-31', '2030-12-31', '2031-12-31', '2032-12-31', '2033-12-31', '2034-12-31', '2035-12-31', '2036-12-31', '2037-12-31', '2038-12-31', '2039-12-31', '2040-12-31', '2041-12-31', '2042-12-31', '2043-12-31', '2044-12-31', '2045-12-31', '2046-12-31', '2047-12-31', '2048-12-31', '2049-12-31', '2050-12-31', '2051-12-31', '2052-12-31', '2053-12-31', '2054-12-31', '2055-12-31', '2056-12-31', '2057-12-31', '2058-12-31', '2059-12-31', '2060-12-31', '2061-12-31', '2062-12-31', '2063-12-31', '2064-12-31', '2065-12-31', '2066-12-31', '2067-12-31', '2068-12-31', '2069-12-31', '2070-12-31', '2071-12-31', '2072-12-31', '2073-12-31', '2074-12-31', '2075-12-31', '2076-12-31', '2077-12-31', '2078-12-31', '2079-12-31']],
labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4], [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81]],
names=['ID', 'Date'])
                    Inc       Exp  Inc_Label  Exp_Label
ID Date
2  1973-12-31  0.056571  0.001702   0.073810   0.001010
   1974-12-31  0.057276  0.001010   0.076057   0.000000
   1975-12-31  0.059563  0.000000   0.077986   0.002915
MultiIndex(levels=[[1, 2, 3, 4, 5], ['1968-12-31', '1969-12-31', '1970-12-31', '1971-12-31', '1972-12-31', '1973-12-31', '1974-12-31', '1975-12-31', '1976-12-31', '1977-12-31', '1978-12-31', '1979-12-31', '1980-12-31', '1981-12-31', '1982-12-31', '1983-12-31', '1984-12-31', '1985-12-31', '1986-12-31', '1987-12-31', '1988-12-31', '1989-12-31', '1990-12-31', '1991-12-31', '1992-12-31', '1993-12-31', '1994-12-31', '1995-12-31', '1996-12-31', '1997-12-31', '1998-12-31', '1999-12-31', '2000-12-31', '2001-12-31', '2002-12-31', '2003-12-31', '2004-12-31', '2005-12-31', '2006-12-31', '2007-12-31', '2008-12-31', '2009-12-31', '2010-12-31', '2011-12-31', '2012-12-31', '2013-12-31', '2014-12-31', '2015-12-31', '2016-12-31', '2017-12-31', '2018-12-31', '2019-12-31', '2020-12-31', '2021-12-31', '2022-12-31', '2023-12-31', '2024-12-31', '2025-12-31', '2026-12-31', '2027-12-31', '2028-12-31', '2029-12-31', '2030-12-31', '2031-12-31', '2032-12-31', '2033-12-31', '2034-12-31', '2035-12-31', '2036-12-31', '2037-12-31', '2038-12-31', '2039-12-31', '2040-12-31', '2041-12-31', '2042-12-31', '2043-12-31', '2044-12-31', '2045-12-31', '2046-12-31', '2047-12-31', '2048-12-31', '2049-12-31', '2050-12-31', '2051-12-31', '2052-12-31', '2053-12-31', '2054-12-31', '2055-12-31', '2056-12-31', '2057-12-31', '2058-12-31', '2059-12-31', '2060-12-31', '2061-12-31', '2062-12-31', '2063-12-31', '2064-12-31', '2065-12-31', '2066-12-31', '2067-12-31', '2068-12-31', '2069-12-31', '2070-12-31', '2071-12-31', '2072-12-31', '2073-12-31', '2074-12-31', '2075-12-31', '2076-12-31', '2077-12-31', '2078-12-31', '2079-12-31']],
labels
names=['ID', 'Date'])
Later, I try to create a dict where each key is one of the IDs and each value is the corresponding sub-df of the original df:
dict = {index: df.loc[index] for index in df.index.levels[0]}
This throws:
KeyError: 'the label [1] is not in the [index]'
I don't understand what is going on: after the drop, the levels in the MultiIndex are unchanged, but the labels are different.
Answer 0 (score: 0)
You need to remove the unused levels from the index. pandas provides a method for this: pandas.MultiIndex.remove_unused_levels
data.index = data.index.remove_unused_levels()
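To make the effect concrete, here is a minimal sketch that reproduces the situation on a small made-up frame (the IDs, dates, and the `Inc` values are placeholders, not the data from the question) and then applies the fix before building the dict.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame: IDs 1-3, two dates each.
idx = pd.MultiIndex.from_product(
    [[1, 2, 3], ["1993-12-31", "1994-12-31"]], names=["ID", "Date"]
)
data = pd.DataFrame({"Inc": range(6)}, index=idx)

data = data.drop(1)  # drops every row whose first-level key is 1

# The level values still list 1, even though no remaining row uses it.
stale_ids = list(data.index.levels[0])

# Rebuild the levels so they contain only values that are actually used.
data.index = data.index.remove_unused_levels()
fresh_ids = list(data.index.levels[0])

# The dict comprehension from the question now only sees existing IDs.
sub_frames = {key: data.loc[key] for key in data.index.levels[0]}
```

Here `stale_ids` still contains 1, which is what makes `data.loc[1]` fail in the original comprehension, while `fresh_ids` only holds the IDs that remain.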
However, if you only want to create a dictionary of the unique groups, you should really just use groupby:
dct = {key: gp for key, gp in data.groupby(level=0)}
Also, avoid naming a variable dict, because doing so shadows the built-in dict function.
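As a rough illustration of the groupby route, the sketch below uses the same kind of made-up frame (placeholder IDs, dates, and `Inc` column). One thing to be aware of: each group keeps both index levels, so it is not identical to what `data.loc[key]` returns; dropping the `ID` level is one way to get a Date-indexed frame.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame, with ID 1 already dropped.
idx = pd.MultiIndex.from_product(
    [[1, 2, 3], ["1993-12-31", "1994-12-31"]], names=["ID", "Date"]
)
data = pd.DataFrame({"Inc": range(6)}, index=idx).drop(1)

# groupby only yields groups that have rows, so the stale level value 1
# never appears as a key, even without remove_unused_levels().
groups = {key: grp for key, grp in data.groupby(level="ID")}

# Each group still carries both index levels; drop "ID" to get a
# Date-indexed frame comparable to data.loc[key].
by_date = {key: grp.droplevel("ID") for key, grp in groups.items()}
```

Whether to keep or drop the `ID` level in each sub-frame depends on how the pieces are used afterwards; keeping it makes it easier to concatenate them back together.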
import pandas as pd

df1 = pd.DataFrame({'id1': [1,1,1,2,2],
                    'id2': list('ABCAB'),
                    'val': [11,12,13,14,15]})
df1 = df1.set_index(['id1', 'id2'])
df1.index
#MultiIndex(levels=[[1, 2], ['A', 'B', 'C']],
# labels=[[0, 0, 0, 1, 1], [0, 1, 2, 0, 1]],
# names=['id1', 'id2'])
df2 = df1.drop(1)
df2.index
#MultiIndex(levels=[[1, 2], ['A', 'B', 'C']],
# labels=[[1, 1], [0, 1]],
# names=['id1', 'id2'])
df2.index = df2.index.remove_unused_levels()
df2.index
#MultiIndex(levels=[[2], ['A', 'B']],
# labels=[[0, 0], [0, 1]],
# names=['id1', 'id2'])
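If you would rather leave the index untouched, another option (not part of the answer above, just a sketch on the same toy frame) is to iterate over the values that the remaining rows actually use, via `get_level_values`, instead of over `levels`.

```python
import pandas as pd

df1 = pd.DataFrame({"id1": [1, 1, 1, 2, 2],
                    "id2": list("ABCAB"),
                    "val": [11, 12, 13, 14, 15]}).set_index(["id1", "id2"])
df2 = df1.drop(1)

# levels is a lookup table and may hold stale entries;
# get_level_values reports what each remaining row actually uses.
present_ids = df2.index.get_level_values("id1").unique()

# Safe to index with .loc, because every key is present in the data.
parts = {key: df2.loc[key] for key in present_ids}
```

This avoids the KeyError for the same reason the groupby approach does: the keys come from the rows themselves rather than from the level table.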