我有一个类似的数据框:
City %SC Team
0 London 50.5 A
1 London 40.1 B
2 London 9.4 C
3 Birmingham 31.3 B
4 Birmingham 27.1 A
5 Birmingham 23.7 D
6 Birmingham 17.9 C
7 York 40.1 A
8 York 38.8 C
9 York 21.1 B
.
.
.
我想根据前2名球队的不同来区分Clear赢,Marginal赢,Extremely Marginal赢的城市。 我尝试了以下代码:
df = pd.read_csv('file.csv')
Clear, Marginal, EMarginal = [],[],[]
for i in file['%SC']:
if i[0] - i[1] >= 10:
Clear.append('City','Team')
elif i[0] - i[1] < 10 and i[0] - i[1] >=2 :
Marginal.append('City','Team')
else:
EMarginal.append('City','Team')
预期输出:
Clear = [London , A]
Marginal = [Birmingham , B]
EMarginal = [York , A]
我的方法似乎不正确,有人可以建议我达到预期效果的方法吗?非常感谢
答案 0 :(得分:1)
如果我理解正确,您希望根据前两个团队将城市分为几类。
def classify(city):
largest = city.nlargest(2, '%SC')
diff = largest['%SC'].iloc[0] - largest['%SC'].iloc[1]
if diff >= 10:
return 'Clear'
elif diff < 10 and diff >=2 :
return 'Marginal'
return 'EMarginal'
groups = df.groupby("City").apply(classify)
# groups is the following table:
# City
# Birmingham Marginal
# London Clear
# York EMarginal
# dtype: object
如果您坚持将它们作为列表,则可以致电
groups.groupby(groups).apply(lambda g: list(g.index)).to_dict()
# Output:
# {'Clear': ['London'], 'EMarginal': ['York'], 'Marginal': ['Birmingham']}
如果您仍然坚持在每个城市都包括获胜团队,可以致电
groups.name = "Margin"
df.join(groups, on="City")\
.groupby("Margin")\
.apply(
lambda g: list(
g.nlargest(1, "%SC")
.apply(lambda row: (row["City"], row["Team"]), axis=1)
)
).to_dict()
# Output
# {'Clear': [('London', 'A')], 'EMarginal': [('York', 'A')], 'Marginal': [('Birmingham', 'B')]}