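I'm trying to find a way to compute a cumulative count in pandas that accounts for ties.
Let's take hypothetical data from a track meet, where I have people, races, heats, and times.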
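Each person's place is determined as follows:
For a given race/heat combination: the fastest time places first, the next fastest places second,
and so on...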
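This would be fairly simple code, except for one thing...
If two people have the same time, they both get the same place, and the next time greater than theirs gets that place + 1.
In the table below, for the 100 Yard Dash, heat 1, RUNNER1 finished first, RUNNER2 and RUNNER3 tied for second, and RUNNER4 finished third (the next time after RUNNER2/RUNNER3).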
Basically, the logic is as follows:
If race <> race.shift() or heat <> heat.shift(), then place = 1
If race = race.shift() and heat = heat.shift() and time > time.shift(), then place = place.shift() + 1
If race = race.shift() and heat = heat.shift() and time = time.shift(), then place = place.shift()
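Spelled out row by row, the three rules can be sketched as a plain Python loop. This is only an illustration of the intended logic, not a vectorized pandas solution; the function name `assign_places` and the tuple layout are hypothetical, and the rows are assumed to be sorted by time within each race/heat run.

```python
def assign_places(rows):
    """rows: list of (race, heat, time) tuples, sorted by time within each run."""
    places = []
    prev = None
    for race, heat, time in rows:
        if prev is None or (race, heat) != prev[:2]:
            place = 1                  # new race/heat run: start again at 1
        elif time > prev[2]:
            place = places[-1] + 1     # slower than the previous runner: next place
        else:
            place = places[-1]         # same time as the previous runner: tie
        places.append(place)
        prev = (race, heat, time)
    return places


rows = [
    ("100 Yard Dash", 1, 9.87),
    ("100 Yard Dash", 1, 9.92),
    ("100 Yard Dash", 1, 9.92),
    ("100 Yard Dash", 1, 9.96),
    ("100 Yard Dash", 2, 9.88),
    ("100 Yard Dash", 2, 9.93),
]
print(assign_places(rows))  # [1, 2, 2, 3, 1, 2]
```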
The part that stumps me is how to handle ties. Otherwise I could do something like
df['Place'] = np.where(
    (df['Race'] == df['Race'].shift())
    & (df['Heat'] == df['Heat'].shift()),
    df['Place'].shift() + 1,
    1)
Thanks!
Sample data follows:
Person,Race,Heat,Time
RUNNER1,100 Yard Dash,1,9.87
RUNNER2,100 Yard Dash,1,9.92
RUNNER3,100 Yard Dash,1,9.92
RUNNER4,100 Yard Dash,1,9.96
RUNNER5,100 Yard Dash,1,9.97
RUNNER6,100 Yard Dash,1,10.01
RUNNER7,100 Yard Dash,2,9.88
RUNNER8,100 Yard Dash,2,9.93
RUNNER9,100 Yard Dash,2,9.93
RUNNER10,100 Yard Dash,2,10.03
RUNNER11,100 Yard Dash,2,10.26
RUNNER7,200 Yard Dash,1,19.63
RUNNER8,200 Yard Dash,1,19.67
RUNNER9,200 Yard Dash,1,19.72
RUNNER10,200 Yard Dash,1,19.72
RUNNER11,200 Yard Dash,1,19.86
RUNNER12,200 Yard Dash,1,19.92
What I ultimately want is:
Person,Race,Heat,Time,Place
RUNNER1,100 Yard Dash,1,9.87,1
RUNNER2,100 Yard Dash,1,9.92,2
RUNNER3,100 Yard Dash,1,9.92,2
RUNNER4,100 Yard Dash,1,9.96,3
RUNNER5,100 Yard Dash,1,9.97,4
RUNNER6,100 Yard Dash,1,10.01,5
RUNNER7,100 Yard Dash,2,9.88,1
RUNNER8,100 Yard Dash,2,9.93,2
RUNNER9,100 Yard Dash,2,9.93,2
RUNNER10,100 Yard Dash,2,10.03,3
RUNNER11,100 Yard Dash,2,10.26,4
RUNNER7,200 Yard Dash,1,19.63,1
RUNNER8,200 Yard Dash,1,19.67,2
RUNNER9,200 Yard Dash,1,19.72,3
RUNNER10,200 Yard Dash,1,19.72,3
RUNNER11,200 Yard Dash,1,19.86,4
RUNNER12,200 Yard Dash,1,19.92,5
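[edit] Now, one step further.
Let's assume that once I leave a set of unique values, the placing resets to 1 the next time that set appears.
So, for example (note that it goes "Heat 1", then "Heat 2", then back to "Heat 1"), I don't want the placing to continue from the previous "Heat 1"; I want it to reset.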
Person,Race,Heat,Time,Place
RUNNER1,100 Yard Dash,1,9.87,1
RUNNER2,100 Yard Dash,1,9.92,2
RUNNER3,100 Yard Dash,1,9.92,2
RUNNER4,100 Yard Dash,2,9.96,1
RUNNER5,100 Yard Dash,2,9.97,2
RUNNER6,100 Yard Dash,2,10.01,3
RUNNER7,100 Yard Dash,1,9.88,1
RUNNER8,100 Yard Dash,1,9.93,2
RUNNER9,100 Yard Dash,1,9.93,2
Answer 0 (score: 7)
You could use:
grouped = df.groupby(['Race','Heat'])
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0]+1)
For example:
import pandas as pd

# Sample data: the 17 rows from the question
df = pd.DataFrame({
    'Heat': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1],
    'Person': ['RUNNER1', 'RUNNER2', 'RUNNER3', 'RUNNER4', 'RUNNER5', 'RUNNER6',
               'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11',
               'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER12'],
    'Race': ['100 Yard Dash'] * 11 + ['200 Yard Dash'] * 6,
    'Time': [9.87, 9.92, 9.92, 9.96, 9.97, 10.01, 9.88, 9.93, 9.93, 10.03, 10.26,
             19.63, 19.67, 19.72, 19.72, 19.86, 19.92],
})

grouped = df.groupby(['Race', 'Heat'])
# Place: tied times share a place; the next distinct time gets that place + 1
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0] + 1)
# Rank: method='min' skips places after a tie (1, 2, 2, 4, ...)
df['Rank'] = grouped['Time'].rank(method='min')
print(df)
yields
    Heat    Person           Race   Time  Place  Rank
0      1   RUNNER1  100 Yard Dash   9.87    1.0   1.0
1      1   RUNNER2  100 Yard Dash   9.92    2.0   2.0
2      1   RUNNER3  100 Yard Dash   9.92    2.0   2.0
3      1   RUNNER4  100 Yard Dash   9.96    3.0   4.0
4      1   RUNNER5  100 Yard Dash   9.97    4.0   5.0
5      1   RUNNER6  100 Yard Dash  10.01    5.0   6.0
6      2   RUNNER7  100 Yard Dash   9.88    1.0   1.0
7      2   RUNNER8  100 Yard Dash   9.93    2.0   2.0
8      2   RUNNER9  100 Yard Dash   9.93    2.0   2.0
9      2  RUNNER10  100 Yard Dash  10.03    3.0   4.0
10     2  RUNNER11  100 Yard Dash  10.26    4.0   5.0
11     1   RUNNER7  200 Yard Dash  19.63    1.0   1.0
12     1   RUNNER8  200 Yard Dash  19.67    2.0   2.0
13     1   RUNNER9  200 Yard Dash  19.72    3.0   3.0
14     1  RUNNER10  200 Yard Dash  19.72    3.0   3.0
15     1  RUNNER11  200 Yard Dash  19.86    4.0   5.0
16     1  RUNNER12  200 Yard Dash  19.92    5.0   6.0
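To see why the factorize call produces the Place column, here is a small standalone sketch; the four times are made up for illustration. With sort=True, pd.factorize numbers the distinct values in ascending order starting from 0, so equal times share a code and adding 1 turns the codes into places.

```python
import pandas as pd

# Illustrative times, deliberately not pre-sorted
times = pd.Series([9.92, 9.87, 9.92, 9.96])

# codes[i] is the position of times[i] among the sorted distinct values
codes, uniques = pd.factorize(times, sort=True)
places = codes + 1

print(list(uniques))  # [9.87, 9.92, 9.96]
print(list(places))   # [2, 1, 2, 3]
```

Because the codes come from the sorted distinct values, the rows do not need to be sorted by time beforehand.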
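Note that pandas has a GroupBy.rank method that can compute many common forms of ranking. With method='min', as used above, it does not produce the placement you describe: in row 3, for example, Rank is 4 after the tie between the second and third runners, whereas Place is 3.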
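One caveat to the note above: rank also accepts method='dense', which assigns consecutive ranks without gaps after ties. The answer does not use it, so treat the following as an alternative sketch and check it against your installed pandas version; the six-row frame is a shortened, illustrative subset of the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'Race': ['100 Yard Dash'] * 6,
    'Heat': [1, 1, 1, 1, 2, 2],
    'Time': [9.87, 9.92, 9.92, 9.96, 9.88, 9.93],
})

# Dense ranking: ties share a rank and the next distinct time gets rank + 1.
# rank() returns floats; the cast to int assumes there are no missing times.
df['Place'] = df.groupby(['Race', 'Heat'])['Time'].rank(method='dense').astype(int)

print(df['Place'].tolist())  # [1, 2, 2, 3, 1, 2]
```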
Regarding the edit: use
(df['Heat'] != df['Heat'].shift()).cumsum()
to tell apart separate runs of the same heat number:
import pandas as pd

# Same rows as before, but every row is now a 100 Yard Dash, so heat 1 appears twice
df = pd.DataFrame({
    'Heat': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1],
    'Person': ['RUNNER1', 'RUNNER2', 'RUNNER3', 'RUNNER4', 'RUNNER5', 'RUNNER6',
               'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11',
               'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER12'],
    'Race': ['100 Yard Dash'] * 17,
    'Time': [9.87, 9.92, 9.92, 9.96, 9.97, 10.01, 9.88, 9.93, 9.93, 10.03, 10.26,
             19.63, 19.67, 19.72, 19.72, 19.86, 19.92],
})

# Number each consecutive run of equal Heat values: 1, 1, ..., 2, 2, ..., 3, ...
df['HeatGroup'] = (df['Heat'] != df['Heat'].shift()).cumsum()
grouped = df.groupby(['Race', 'HeatGroup'])
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0] + 1)
df['Rank'] = grouped['Time'].rank(method='min')
print(df)
yields
    Heat    Person           Race   Time  HeatGroup  Place  Rank
0      1   RUNNER1  100 Yard Dash   9.87          1    1.0   1.0
1      1   RUNNER2  100 Yard Dash   9.92          1    2.0   2.0
2      1   RUNNER3  100 Yard Dash   9.92          1    2.0   2.0
3      1   RUNNER4  100 Yard Dash   9.96          1    3.0   4.0
4      1   RUNNER5  100 Yard Dash   9.97          1    4.0   5.0
5      1   RUNNER6  100 Yard Dash  10.01          1    5.0   6.0
6      2   RUNNER7  100 Yard Dash   9.88          2    1.0   1.0
7      2   RUNNER8  100 Yard Dash   9.93          2    2.0   2.0
8      2   RUNNER9  100 Yard Dash   9.93          2    2.0   2.0
9      2  RUNNER10  100 Yard Dash  10.03          2    3.0   4.0
10     2  RUNNER11  100 Yard Dash  10.26          2    4.0   5.0
11     1   RUNNER7  100 Yard Dash  19.63          3    1.0   1.0
12     1   RUNNER8  100 Yard Dash  19.67          3    2.0   2.0
13     1   RUNNER9  100 Yard Dash  19.72          3    3.0   3.0
14     1  RUNNER10  100 Yard Dash  19.72          3    3.0   3.0
15     1  RUNNER11  100 Yard Dash  19.86          3    4.0   5.0
16     1  RUNNER12  100 Yard Dash  19.92          3    5.0   6.0
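The run-labelling trick keys only on Heat. As a hedged variant, the sketch below starts a new run whenever either Race or Heat changes, and applies it to the nine rows from the question's edit. The column name RunID is hypothetical, and the Race values are assumed to match the question's sample.

```python
import pandas as pd

df = pd.DataFrame({
    'Race': ['100 Yard Dash'] * 9,
    'Heat': [1, 1, 1, 2, 2, 2, 1, 1, 1],
    'Time': [9.87, 9.92, 9.92, 9.96, 9.97, 10.01, 9.88, 9.93, 9.93],
})

# True wherever Race or Heat differs from the previous row (always True on row 0,
# because shift() puts a missing value there).
changed = (df['Race'] != df['Race'].shift()) | (df['Heat'] != df['Heat'].shift())

# The running count of changes gives every consecutive run its own label.
df['RunID'] = changed.cumsum()

# Dense placing within each run, as in the answer above.
df['Place'] = df.groupby('RunID')['Time'].transform(
    lambda x: pd.factorize(x, sort=True)[0] + 1
)

print(df['RunID'].tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
print(df['Place'].tolist())  # [1, 2, 2, 1, 2, 3, 1, 2, 2]
```

The resulting places match the Place column requested in the question's edit: the second block of heat 1 restarts at 1 instead of continuing from the first block.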