I want to calculate the highest score minus the lowest score of a project in a pandas DataFrame.
My current df looks like this:
projectID supplierID score
1 1 50
1 2 60
1 3 75
I want it to look like this:
max-min => 75-50 = 25
projectID supplierID score max-min
1 1 50 25
1 2 60 25
1 3 75 25
I want to do this for each projectID.
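For reference, here is a minimal sketch that rebuilds the sample frame from the question and computes the range as one value per projectID. This is an aggregation, before any broadcasting back to the rows. The projectID 2 rows are hypothetical data added only to show the per-group behaviour.

```python
import pandas as pd

# Sample data from the question, plus hypothetical rows for projectID 2.
df = pd.DataFrame({
    "projectID": [1, 1, 1, 2, 2],
    "supplierID": [1, 2, 3, 1, 2],
    "score": [50, 60, 75, 40, 90],
})

# One row per project: its max score, its min score, and their difference.
summary = df.groupby("projectID")["score"].agg(["max", "min"])
summary["max-min"] = summary["max"] - summary["min"]
print(summary)
```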
Answer 0 (score: 4)
Use np.ptp ("peak to peak"):
import numpy as np
df['max-min'] = df.groupby('projectID').score.transform(np.ptp)
# Output of df.groupby('projectID').score.transform(np.ptp):
0 25
1 25
2 25
Name: score, dtype: int64
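np.ptp is simply the maximum minus the minimum of the values. The sketch below checks that on a plain NumPy array and then applies it per group. Converting each group with to_numpy() before calling np.ptp is our own defensive choice, so the call does not depend on how a particular NumPy/pandas version treats a Series argument. The projectID 2 rows are hypothetical.

```python
import numpy as np
import pandas as pd

# ptp ("peak to peak") equals max - min.
scores = np.array([50, 60, 75])
assert np.ptp(scores) == scores.max() - scores.min() == 25

df = pd.DataFrame({
    "projectID": [1, 1, 1, 2, 2],
    "supplierID": [1, 2, 3, 1, 2],
    "score": [50, 60, 75, 40, 90],
})

# transform broadcasts each group's scalar result back onto every row of that group.
df["max-min"] = df.groupby("projectID")["score"].transform(lambda s: np.ptp(s.to_numpy()))
print(df)
```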
Answer 1 (score: 3)
You can use transform to broadcast the result, passing a lambda function that subtracts the min from the max:
df['max-min'] = df.groupby('projectID').score.transform(lambda s: s.max() - s.min())
projectID supplierID score max-min
0 1 1 50 25
1 1 2 60 25
2 1 3 75 25
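Because transform returns a Series indexed like the original frame, the assignment still lines up when rows from different projects are interleaved. A minimal sketch with a hypothetical interleaved frame:

```python
import pandas as pd

# Hypothetical data: rows of projects 1 and 2 are deliberately interleaved.
df = pd.DataFrame({
    "projectID": [1, 2, 1, 2, 1],
    "supplierID": [1, 1, 2, 2, 3],
    "score": [50, 40, 60, 90, 75],
})

rng = df.groupby("projectID")["score"].transform(lambda s: s.max() - s.min())

# rng shares df's index, so each row receives the range of its own project.
df["max-min"] = rng
print(df)
```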
Answer 2 (score: 2)
You can use groupby to get the max and min, then join the result on "projectID" as a new column:
import pandas as pd
df = pd.DataFrame([[1, 1, 30],
[1, 2, 50],
[2, 1, 60],
[2, 2, 40],
[1, 3, 20]],
columns=["projectID", "supplierID", "score"])
df.join( df.groupby(["projectID"])["score"].max()
- df.groupby(["projectID"])["score"].min(),
on="projectID", rsuffix="_max-min")
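The join above names the new column score_max-min: the Series being joined is called score, which collides with the existing column, so rsuffix is appended. The sketch below renames the per-project Series up front and also shows Series.map as an alternative to join. Both rely on the aggregated Series being indexed by projectID.

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 30],
                   [1, 2, 50],
                   [2, 1, 60],
                   [2, 2, 40],
                   [1, 3, 20]],
                  columns=["projectID", "supplierID", "score"])

grouped = df.groupby("projectID")["score"]
# One value per projectID, indexed by projectID; renamed to avoid the column clash.
per_project = (grouped.max() - grouped.min()).rename("max-min")

# Option 1: join the per-project Series onto the projectID column.
joined = df.join(per_project, on="projectID")
# Option 2: look each row's projectID up in the per-project Series.
mapped = df.assign(**{"max-min": df["projectID"].map(per_project)})
print(joined)
```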
Answer 3 (score: 1)
You can use GroupBy + transform.
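The GroupBy transform method aligns the result of a regular GroupBy method (here max and min) with the grouper series, so each row receives its own project's value:
g = df.groupby('projectID')['score']
df['max-min'] = g.transform('max') - g.transform('min')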
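Passing the method names as strings typically lets pandas use its built-in grouped max/min instead of calling a Python function once per group. Here is a minimal sketch on hypothetical two-project data that checks the result agrees with the lambda version:

```python
import pandas as pd

df = pd.DataFrame({
    "projectID": [1, 1, 1, 2, 2],
    "supplierID": [1, 2, 3, 1, 2],
    "score": [50, 60, 75, 40, 90],
})

g = df.groupby("projectID")["score"]
# Each transform returns a Series aligned to df's rows, so they subtract element-wise.
by_name = g.transform("max") - g.transform("min")
# Same computation with a Python lambda, for comparison.
by_lambda = g.transform(lambda s: s.max() - s.min())

df["max-min"] = by_name
print(df)
```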