I have a DataFrame:
import pandas as pd
tuples = [('a', 1990),('a', 1994),('a',1996),('b',1992),('b',1997),('c',2001)]
index = pd.MultiIndex.from_tuples(tuples, names = ['Type', 'Year'])
vals = ['This','That','SomeName','This','SomeOtherName','SomeThirdName']
df = pd.DataFrame(vals, index=index, columns=['Whatev'])
df
Out[3]:
                  Whatev
Type Year
a    1990           This
     1994           That
     1996       SomeName
b    1992           This
     1997  SomeOtherName
c    2001  SomeThirdName
I want to add a column of ascending integers that follows 'Year' and resets for each 'Type', like this:
                  Whatev  IndexInt
Type Year
a    1990           This         1
     1994           That         2
     1996       SomeName         3
b    1992           This         1
     1997  SomeOtherName         2
c    2001  SomeThirdName         1
This is my current approach:
grouped = df.groupby(level=0)
unique_loc = []
for name, group in grouped:
    # append 1..n for each group, in group order
    unique_loc += range(1, len(group) + 1)
df['IndexInt'] = unique_loc
But this seems ugly to me, and I expect it to be slow on the roughly 50-million-row DataFrame I'm working with. Is there a simpler way?
Answer 0 (score: 2)
You can use groupby(level=0) + cumcount(). cumcount() numbers the rows within each group starting from 0, so add 1 to start at 1:
In [7]: df['IndexInt'] = df.groupby(level=0).cumcount()+1
In [8]: df
Out[8]:
                  Whatev  IndexInt
Type Year
a    1990           This         1
     1994           That         2
     1996       SomeName         3
b    1992           This         1
     1997  SomeOtherName         2
c    2001  SomeThirdName         1
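One caveat: cumcount() numbers rows in the order they currently appear within each group, so it follows 'Year' only if the frame is already sorted that way. The sketch below assumes the rows may arrive unsorted and sorts the index first. The shuffled data and the names df_unsorted and df_sorted are made up for illustration and are not from the question.

```python
import pandas as pd

# Same rows as in the question, deliberately shuffled so Year is not ascending.
tuples = [('a', 1996), ('b', 1997), ('a', 1990), ('c', 2001), ('b', 1992), ('a', 1994)]
index = pd.MultiIndex.from_tuples(tuples, names=['Type', 'Year'])
vals = ['SomeName', 'SomeOtherName', 'This', 'SomeThirdName', 'This', 'That']
df_unsorted = pd.DataFrame(vals, index=index, columns=['Whatev'])

# Sort by Type, then Year, so the counter follows ascending Year within each Type.
df_sorted = df_unsorted.sort_index(level=['Type', 'Year'])

# cumcount() is 0-based, hence the +1.
df_sorted['IndexInt'] = df_sorted.groupby(level='Type').cumcount() + 1
print(df_sorted)
```

If the data is already sorted, as in the question, the sort_index() call can be skipped, which saves a full sort on a frame with tens of millions of rows.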