Calculation on DataFrame columns

Time: 2019-11-18 23:09:37

Tags: python pandas

I have a DataFrame like this:

  City       %SC    Team
0 London     50.5   A
1 London     40.1   B
2 London     9.4    C
3 Birmingham 31.3   B
4 Birmingham 27.1   A
5 Birmingham 23.7   D
6 Birmingham 17.9   C
7 York       40.1   A
8 York       38.8   C
9 York       21.1   B
.
.
.

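I want to classify each city as a Clear win, a Marginal win, or an Extremely Marginal win, based on the difference in %SC between its top two teams. I tried the following code: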

df = pd.read_csv('file.csv')
Clear, Marginal, EMarginal = [],[],[]
for i in file['%SC']:
    if i[0] - i[1] >= 10:
    Clear.append('City','Team')
    elif i[0] - i[1] < 10 and i[0] - i[1] >=2 :
    Marginal.append('City','Team')
    else:
    EMarginal.append('City','Team')

Expected output:

Clear = [London , A]
Marginal = [Birmingham , B]
EMarginal = [York , A]

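My approach seems to be wrong. Could someone suggest a way to get the expected result? Many thanks.

1 answer:

Answer 0 (score: 1)

If I understand correctly, you want to sort the cities into categories based on the gap between their top two teams.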

def classify(city):
    largest = city.nlargest(2, '%SC')
    diff = largest['%SC'].iloc[0] - largest['%SC'].iloc[1]
    if diff >= 10:
        return 'Clear'
    elif diff >= 2:
        return 'Marginal'
    return 'EMarginal'

groups = df.groupby("City").apply(classify)

# groups is the following table:
# City
# Birmingham     Marginal
# London            Clear
# York          EMarginal
# dtype: object
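
For reference, here is a self-contained sketch of the same classification without a Python-level function per group. It rebuilds only the ten rows shown in the question (the real file.csv presumably has more), and the names top2, diff and margin are just illustrative. One assumption to be aware of: a city with a single team gets a gap of 0 and is labelled EMarginal here, whereas classify above would raise an IndexError for it.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question (assumption:
# the real file has the same three columns).
df = pd.DataFrame({
    "City": ["London"] * 3 + ["Birmingham"] * 4 + ["York"] * 3,
    "%SC": [50.5, 40.1, 9.4, 31.3, 27.1, 23.7, 17.9, 40.1, 38.8, 21.1],
    "Team": ["A", "B", "C", "B", "A", "D", "C", "A", "C", "B"],
})

# Sort by share, highest first, then keep the top two rows of every city.
top2 = df.sort_values("%SC", ascending=False).groupby("City").head(2)

# Gap between the best and the second-best share in each city.
shares = top2.groupby("City")["%SC"]
diff = shares.max() - shares.min()

# Bin the gap: [0, 2) -> EMarginal, [2, 10) -> Marginal, [10, inf) -> Clear.
margin = pd.cut(
    diff,
    bins=[0, 2, 10, float("inf")],
    right=False,
    labels=["EMarginal", "Marginal", "Clear"],
)

print(margin)
```

The result is a categorical Series indexed by City, so it can be used in the same way as groups in the snippets that follow.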

If you really need them as lists, you can call:

groups.groupby(groups).apply(lambda g: list(g.index)).to_dict()
# Output:
# {'Clear': ['London'], 'EMarginal': ['York'], 'Marginal': ['Birmingham']}

If you also want to include the winning team for each city, you can call:

groups.name = "Margin"
df.join(groups, on="City")\
  .groupby("Margin")\
  .apply(
      lambda g: list(
          g.nlargest(1, "%SC")
           .apply(lambda row: (row["City"], row["Team"]), axis=1)
      )
  ).to_dict()
# Output
# {'Clear': [('London', 'A')], 'EMarginal': [('York', 'A')], 'Marginal': [('Birmingham', 'B')]}
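
One caveat: nlargest(1, "%SC") in the snippet above runs once per Margin group, so it keeps a single row per category. That matches the sample data, where every category holds exactly one city, but it would drop cities as soon as two of them fall into the same category. A possible variant that keeps one winner per city is sketched below. The Leeds rows are hypothetical and added only to put two cities in the Clear category; summary, classify_gap and result are illustrative names as well.

```python
import pandas as pd

# Rows from the question plus a hypothetical city (Leeds) so that two
# cities end up in the same category.
df = pd.DataFrame({
    "City": ["London"] * 3 + ["Birmingham"] * 4 + ["York"] * 3 + ["Leeds"] * 2,
    "%SC": [50.5, 40.1, 9.4, 31.3, 27.1, 23.7, 17.9, 40.1, 38.8, 21.1, 60.0, 40.0],
    "Team": ["A", "B", "C", "B", "A", "D", "C", "A", "C", "B", "A", "B"],
})


def classify_gap(diff):
    """Map the gap between the top two shares to a category label."""
    if diff >= 10:
        return "Clear"
    if diff >= 2:
        return "Marginal"
    return "EMarginal"


# Within each city, put the highest share first and keep the top two rows.
ranked = df.sort_values(["City", "%SC"], ascending=[True, False])
top2 = ranked.groupby("City").head(2).groupby("City")

# One row per city: the winning team and the gap to the runner-up.
summary = pd.DataFrame({
    "Team": top2["Team"].first(),
    "Gap": top2["%SC"].max() - top2["%SC"].min(),
})
summary["Margin"] = summary["Gap"].apply(classify_gap)

# Collect (city, winning team) pairs for every category.
result = {
    label: list(zip(rows.index, rows["Team"]))
    for label, rows in summary.groupby("Margin")
}
print(result)
```

With the hypothetical Leeds rows included, the Clear list contains both Leeds and London instead of only one of them.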