I have a DataFrame with a MultiIndex. It looks like this:
actor         title_year          sum  count
50 Cent       2005.0       30981850.0      1
A.J. Buckley  2015.0      123070338.0      1
Aaliyah       2002.0       30307804.0      1
Aasif Mandvi  2008.0       13214030.0      1
Abbie Cornish 2009.0        4440055.0      1
Here, actor and title_year together form the MultiIndex. How can I slice out the entries whose actor spans more than n years?
Answer 0 (score: 0)
I think you need filter:
print(df)
                                  sum  count
actor         title_year
50 Cent       2005.0       30981850.0      1
              2006.0       30981850.0      1
              2007.0       30981850.0      1
A.J. Buckley  2015.0      123070338.0      1
Aaliyah       2002.0       30307804.0      1
              2002.0       30307804.0      1
              2004.0       30307804.0      1
Aasif Mandvi  2008.0       13214030.0      1
Abbie Cornish 2009.0        4440055.0      1
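If you need to filter by the number of rows per actor, keeping only actors with fewer than 3 rows (every actor whose group has length 3 or more is removed):
df1 = df.groupby(level='actor').filter(lambda x: len(x) < 3)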
print(df1)
                                  sum  count
actor         title_year
A.J. Buckley  2015.0      123070338.0      1
Aasif Mandvi  2008.0       13214030.0      1
Abbie Cornish 2009.0        4440055.0      1
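If you need to filter by the number of unique values in the title_year level instead, keeping only actors with fewer than 3 distinct years:
df2 = (df.groupby(level='actor')
         .filter(lambda x: x.index.get_level_values('title_year').nunique() < 3))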
print(df2)
                                  sum  count
actor         title_year
A.J. Buckley  2015.0      123070338.0      1
Aaliyah       2002.0       30307804.0      1
              2002.0       30307804.0      1
              2004.0       30307804.0      1
Aasif Mandvi  2008.0       13214030.0      1
Abbie Cornish 2009.0        4440055.0      1
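Both filters above count rows or distinct years, while the question asks about actors whose years span more than n. A minimal sketch of that variant follows, assuming "span" means the latest title_year minus the earliest one for each actor. The threshold n = 1, the helper name year_span, and the sample frame rebuilt from the printed output are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Rebuild the sample frame from the printed output above.
index = pd.MultiIndex.from_tuples(
    [
        ("50 Cent", 2005.0),
        ("50 Cent", 2006.0),
        ("50 Cent", 2007.0),
        ("A.J. Buckley", 2015.0),
        ("Aaliyah", 2002.0),
        ("Aaliyah", 2002.0),
        ("Aaliyah", 2004.0),
        ("Aasif Mandvi", 2008.0),
        ("Abbie Cornish", 2009.0),
    ],
    names=["actor", "title_year"],
)
df = pd.DataFrame(
    {
        "sum": [30981850.0, 30981850.0, 30981850.0, 123070338.0,
                30307804.0, 30307804.0, 30307804.0, 13214030.0, 4440055.0],
        "count": [1] * 9,
    },
    index=index,
)

n = 1  # hypothetical threshold: keep actors whose years span more than n


def year_span(group):
    """Return latest minus earliest title_year for one actor's rows."""
    years = group.index.get_level_values("title_year")
    return years.max() - years.min()


# Keep every row of each actor whose span exceeds n.
spanning = df.groupby(level="actor").filter(lambda g: year_span(g) > n)
print(spanning)
```

With this sample data and n = 1, only 50 Cent (2005 to 2007) and Aaliyah (2002 to 2004) remain. Use >= instead of > if an actor spanning exactly n years should also be kept.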