Subtract the minimum of a pandas DataFrame column, grouped by another column

Time: 2021-07-06 00:53:21

Tags: python pandas dataframe performance

I have a huge pandas DataFrame df, sorted by id and year:

id        gender        year
3         male          1983
3         male          1983
3         male          1985
3         male          1990
6         female        1991
6         female        1992
7         male          1980
...
592873    female        1989
592873    female        1996
593001    male          2001
593428    female        2007
593428    female        2009

My goal is to create another column, ca, computed as:

  • year minus the smallest year for that id

So the first six rows of df should give:

id        gender        year        ca
3         male          1983        0
3         male          1983        0
3         male          1985        2
3         male          1990        7
6         female        1991        0
6         female        1992        1

(In other words, I am looking for a Pythonic answer to this question.)


One solution I can think of is to build a list with a for loop:

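ca_list = []

for i in range(len(df)):
  # A new id starts at row 0 or whenever the id differs from the previous row.
  # The i == 0 guard avoids looking up label -1 on the first iteration.
  if i == 0 or df['id'][i] != df['id'][i-1]:
    # df is sorted by id and year, so the first year of an id is its minimum.
    num = df['year'][i]
    ca_list.append(0)
  else:
    ca_list.append(df['year'][i] - num)

df['ca'] = ca_list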

But I believe there is a more efficient way to do this. Any insight is much appreciated.

1 answer:

Answer 0 (score: 1)

Try:

df["ca"] = df.groupby("id")["year"].transform(lambda x: x - x.min())
print(df)

Prints:

   id  gender  year  ca
0   3    male  1983   0
1   3    male  1983   0
2   3    male  1985   2
3   3    male  1990   7
4   6  female  1991   0
5   6  female  1992   1
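
Since the question is tagged performance, a variant that may be worth trying on a large frame is to pass the built-in aggregation name "min" to transform and do the subtraction once over the whole column, rather than calling a Python lambda for every id. The sketch below rebuilds only the six sample rows from the question; the name ca_sorted is introduced here purely for illustration, and the "first" variant relies on the question's statement that df is already sorted by id and year.

```python
import pandas as pd

# Sample rows copied from the question (first six rows only).
df = pd.DataFrame(
    {
        "id": [3, 3, 3, 3, 6, 6],
        "gender": ["male", "male", "male", "male", "female", "female"],
        "year": [1983, 1983, 1985, 1990, 1991, 1992],
    }
)

# Broadcast each id's minimum year back onto its rows, then subtract once
# over the whole column instead of once per group.
df["ca"] = df["year"] - df.groupby("id")["year"].transform("min")

# Assumption: df is sorted by id and year, as the question states.
# Under that assumption the first year of each id is also its minimum.
# (ca_sorted is a hypothetical name used only for this comparison.)
ca_sorted = df["year"] - df.groupby("id")["year"].transform("first")

print(df)
```

Both forms give the same ca values as the lambda version on this sample. Whether they are actually faster on the real data depends on the number of groups, so it is worth timing them on a slice of the full frame before committing to one.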