I have a huge pandas DataFrame df, sorted by id and year:
id gender year
3 male 1983
3 male 1983
3 male 1985
3 male 1990
6 female 1991
6 female 1992
7 male 1980
...
592873 female 1989
592873 female 1996
593001 male 2001
593428 female 2007
593428 female 2009
My goal is to create another column, ca, computed as year minus the smallest year for that id.
So the first six rows of df should come out as:
id gender year ca
3 male 1983 0
3 male 1983 0
3 male 1985 2
3 male 1990 7
6 female 1991 0
6 female 1992 1
(In other words, I'm looking for a Pythonic answer to this question.)
One solution I can think of is to build a list with a for loop:
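ca_list = []
for i in range(len(df)):
    # First row of a new id: remember its year (the smallest, since df is sorted)
    if i == 0 or df['id'].iloc[i] != df['id'].iloc[i - 1]:
        num = df['year'].iloc[i]
        ca_list.append(0)
    else:
        ca_list.append(df['year'].iloc[i] - num)
df['ca'] = ca_list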
But I'm sure there is a more efficient way to do this. Any insight is much appreciated.
Answer 0 (score: 1)
Try:
df["ca"] = df.groupby("id")["year"].transform(lambda x: x - x.min())
print(df)
Prints:
id gender year ca
0 3 male 1983 0
1 3 male 1983 0
2 3 male 1985 2
3 3 male 1990 7
4 6 female 1991 0
5 6 female 1992 1
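A possible variation, offered as a sketch rather than a benchmarked result: passing the string "min" to transform lets pandas use its built-in grouped minimum instead of calling a Python lambda once per id, which tends to help when the frame has many groups. The sample frame below is just the six rows shown in the question, and first_year is a name introduced here for illustration.

```python
import pandas as pd

# Sample data: the first six rows from the question.
df = pd.DataFrame({
    "id": [3, 3, 3, 3, 6, 6],
    "gender": ["male", "male", "male", "male", "female", "female"],
    "year": [1983, 1983, 1985, 1990, 1991, 1992],
})

# transform("min") returns each id's smallest year, aligned row by row with df.
first_year = df.groupby("id")["year"].transform("min")

# Subtracting the aligned Series gives the years elapsed since each id's first year.
df["ca"] = df["year"] - first_year
print(df)
```

This does not rely on df being sorted, since the minimum is taken per id regardless of row order.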