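I have a table with three columns: the first is the year, the second is the pavement treatment type, and the third is the pavement score. I need to create a new column called "year diff" by subtracting the year of the most recent treatment from the year of the current score. For example, for 2014 I need to subtract 2013, because treatment 9 was done in 2013, and the result, 1, must be recorded in the corresponding cell of the 'year diff' column. Likewise, since treatment 10 was done in 2020, for 2022 I need to subtract 2020.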
Thank you all very much for your help.
Sincerely,
Wilson
Answer 0 (score: 2)
Use:
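import numpy as np
import pandas as pd

# True where a treatment was recorded
m = df['treatment'].notnull()
# Number the groups; each new group starts at a treatment row (group 0 = rows before the first treatment)
s = m.cumsum()
# Rows to leave empty: everything before the first treatment, plus the treatment rows themselves
mask = (s == 0) | m
# Subtract the first year of each group, i.e. the year of the most recent treatment
out = df['year'] - df['year'].groupby(s).transform('first')
# Put NaN in the masked rows
df['year diff'] = np.where(mask, np.nan, out)
print(df)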
year treatment score year diff
0 2010 NaN 1 NaN
1 2011 NaN 2 NaN
2 2012 NaN 3 NaN
3 2013 9.0 4 NaN
4 2014 NaN 5 1.0
5 2015 NaN 6 2.0
6 2016 NaN 7 3.0
7 2017 NaN 8 4.0
8 2018 NaN 9 5.0
9 2019 NaN 10 6.0
10 2020 10.0 11 NaN
11 2021 NaN 12 1.0
12 2022 NaN 13 2.0
13 2023 NaN 14 3.0
14 2024 NaN 15 4.0
15 2025 12.0 16 NaN
16 2026 NaN 17 1.0
17 2027 NaN 18 2.0
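The same result can also be reached without building groups: forward-fill the year of each treatment row, then subtract it from every row's year. This is a minimal sketch, assuming missing treatments are stored as NaN and the rows are sorted by year; the sample data is rebuilt from the output above, and the name last_treatment_year is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the printed output above
years = list(range(2010, 2028))
treatments = {2013: 9.0, 2020: 10.0, 2025: 12.0}
df = pd.DataFrame({
    'year': years,
    'treatment': [treatments.get(y, np.nan) for y in years],
    'score': list(range(1, 19)),
})

# Year of the most recent treatment at or before each row (NaN before the first one)
last_treatment_year = df['year'].where(df['treatment'].notna()).ffill()

# Years elapsed since that treatment; blank out the treatment rows themselves
df['year diff'] = (df['year'] - last_treatment_year).where(df['treatment'].isna())
print(df)
```

Because it subtracts actual years rather than counting rows, this version stays correct if some years are missing from the table.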
Answer 1 (score: 1)
IIUC, you can use:
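import numpy as np
import pandas as pd

# New group id at every treatment row (group 0 = rows before the first treatment)
df['identifier'] = (df['year'].diff().eq(1) & df['treatment'].notnull()).cumsum()
# Within each group: NaN on the treatment row, then 1, 2, 3, ... (group 0 stays NaN)
df['year diff'] = df.groupby('identifier')['identifier'].apply(
    lambda x: pd.Series(np.where(x != 0, pd.Series(pd.factorize(x)[0] + 1).cumsum().shift(), np.nan))
).values
print(df)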
Or, if you need the change in score since each treatment:
df['identifier']=(df['year'].diff().eq(1) &df['treatment'].notnull()).cumsum()
df['year diff']=df.groupby('identifier')['score']\
.apply(lambda x : pd.Series(np.where(x!=0,x.diff().expanding().sum(),np.nan))).reset_index(drop=True)
df.loc[df['identifier']==0,'year diff']=np.nan
print(df)
year treatment score identifier year diff
0 2010 NaN 1 0 NaN
1 2011 NaN 2 0 NaN
2 2012 NaN 3 0 NaN
3 2013 9.0 4 1 NaN
4 2014 NaN 5 1 1.0
5 2015 NaN 6 1 2.0
6 2016 NaN 7 1 3.0
7 2017 NaN 8 1 4.0
8 2018 NaN 9 1 5.0
9 2019 NaN 10 1 6.0
10 2020 10.0 11 2 NaN
11 2021 NaN 12 2 1.0
12 2022 NaN 13 2 2.0
13 2023 NaN 14 2 3.0
14 2024 NaN 15 2 4.0
15 2025 12.0 16 3 NaN
16 2026 NaN 17 3 1.0
17 2027 NaN 18 3 2.0
Answer 2 (score: 1)
If you want to do this with a for loop:
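import pandas as pd

# mydata holds the table from the question
df = pd.DataFrame(mydata)
# Row labels where a treatment was recorded (blanks may be stored as NaN or '')
mylist = df.index[df['treatment'].notna() & (df['treatment'] != '')].tolist()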
Now we subtract the year values:
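re_list = []
for index, row in df.iterrows():
    if index > min(mylist):
        # Treatments at or before this row; the most recent one is max(m)
        m = [i for i in mylist if i <= index]
        re_list.append(row['year'] - df.loc[max(m), 'year'])
    else:
        # Rows up to and including the first treatment get 0
        re_list.append(0)
df['Result'] = re_list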
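The loop above rescans the list of treatment rows for every row. pandas has pd.merge_asof for exactly this "latest earlier event" lookup, so a vectorized version is possible. This is a minimal sketch, assuming missing treatments are NaN and both tables are sorted by year (merge_asof requires sorted keys); treated and treat_year are illustrative names. Unlike the loop, rows before the first treatment come out as NaN rather than 0.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the printed output in the answers above
years = list(range(2010, 2028))
treatments = {2013: 9.0, 2020: 10.0, 2025: 12.0}
df = pd.DataFrame({
    'year': years,
    'treatment': [treatments.get(y, np.nan) for y in years],
    'score': list(range(1, 19)),
})

# One row per treatment, keeping only its year under a separate name
treated = df.loc[df['treatment'].notna(), ['year']].rename(columns={'year': 'treat_year'})

# For every row, find the latest treat_year <= year
merged = pd.merge_asof(df, treated, left_on='year', right_on='treat_year',
                       direction='backward')

# 0 on treatment rows, NaN before the first treatment
df['Result'] = (merged['year'] - merged['treat_year']).to_numpy()
print(df)
```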