I'm new to Python. I have the following df:
ClientID DOB LostDate Category ReportedDate
APJ5L9C 1975 3/13/2017 Ungrouped 3/23/2017
APJ5L9C 1993 7/25/2014 Ungrouped 3/5/2017
BKL1N9C 1981 3/22/2017 Ungrouped 3/29/2017
BKL1N9C 1981 1/31/2017 Ungrouped 3/31/2017
BMO3K9C 1982 3/15/2017 Ungrouped 3/27/2017
BOM1N9C 1981 3/16/2017 Ungrouped 3/27/2017
K9E6JSC 2000 3/15/2017 Ungrouped 4/3/2017
K9E6JSC 1994 1/14/2017 Ungrouped 3/24/2017
M12L0A93 1986 3/16/2017 Ungrouped 3/23/2017
M12L0A93 1981 1/17/2017 Ungrouped 3/29/2017
M12L0A94 1981 3/17/2017 Ungrouped 3/29/2017
MCI6A92 1993 3/24/2017 Ungrouped 3/24/2017
N9E4HSC 2000 3/30/2017 Ungrouped 4/3/2017
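The code below works fine, but I can't put it into a loop so that Category is written with an incremental ID (basically the ClientID concatenated with _1, _2, and so on). The ideal result: within any group, the first record with a difference between LostDate and ReportedDate is labelled ClientID_1, and any subsequent difference between LostDate and ReportedDate in an already-categorised group increments to the next unused ID. So if we already have ID_2, the next one becomes ID_3; if ID_5 is the last one, the next one becomes ID_6, and so on.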
# Finding the earliest lost date reported in a group
mask = df['Category'] == 'Ungrouped'
df.loc[mask, 'LostDatef'] = df.loc[mask].groupby(['ClientID', 'DOB'])['LostDate'].transform(lambda x: x.min())
df['TimeDiffinDAYS'] = (df['ReportedDate'] - df['LostDatef']).dt.days
# Iterate and group INCREMENTALLY DEFINING ClientID
for row in df['TimeDiffinDAYS']:
    if row <= 7:
        # def assessmentsort(kala):
        df.loc['Category'] = df['GHJY'].apply(lambda x: '{}'"_1".format(x))
    else:
        df.loc[df.TimeDiffinDAYS > 50, 'Category'] = df['GHJY'].apply(lambda x: '{}'.format('Ugrouped'))
print df
The result I want:
ClientID DOB LostDate Category ReportedDate
APJ5L9C 1975 3/13/2017 APJ5L9C_1 3/23/2017
APJ5L9C 1993 7/25/2014 APJ5L9C_2 3/5/2017
BKL1N9C 1981 3/22/2017 BKL1N9C_1 3/29/2017
BKL1N9C 1981 1/31/2017 BKL1N9C_2 3/31/2017
BMO3K9C 1982 3/15/2017 BMO3K9C_1 3/27/2017
BOM1N9C 1981 3/16/2017 BOM1N9C_1 3/27/2017
K9E6JSC 2000 3/15/2017 K9E6JSC_1 4/3/2017
K9E6JSC 1994 1/14/2017 K9E6JSC_2 3/24/2017
M12L0A93 1986 3/16/2017 M12L0A93_1 3/23/2017
M12L0A93 1981 1/17/2017 M12L0A93_2 3/29/2017
M12L0A94 1981 3/17/2017 M12L0A94_1 3/29/2017
MCI6A92 1993 3/24/2017 MCI6A92_1 3/24/2017
N9E4HSC 2000 3/30/2017 N9E4HSC_1 4/3/2017
Is this possible?
Answer 0 (score: 0)
You can GroupBy ClientID and use cumcount, then concatenate this value to ClientID using str.cat:
g = (df.groupby('ClientID').cumcount() + 1)
df['Category'] = df.ClientID.str.cat('_' + g.astype(str))
ClientID DOB LostDate Category ReportedDate
0 APJ5L9C 1975 3/13/2017 APJ5L9C_1 3/23/2017
1 APJ5L9C 1993 7/25/2014 APJ5L9C_2 3/5/2017
2 BKL1N9C 1981 3/22/2017 BKL1N9C_1 3/29/2017
3 BKL1N9C 1981 1/31/2017 BKL1N9C_2 3/31/2017
4 BMO3K9C 1982 3/15/2017 BMO3K9C_1 3/27/2017
5 BOM1N9C 1981 3/16/2017 BOM1N9C_1 3/27/2017
6 K9E6JSC 2000 3/15/2017 K9E6JSC_1 4/3/2017
7 K9E6JSC 1994 1/14/2017 K9E6JSC_2 3/24/2017
8 M12L0A93 1986 3/16/2017 M12L0A93_1 3/23/2017
9 M12L0A93 1981 1/17/2017 M12L0A93_2 3/29/2017
10 M12L0A94 1981 3/17/2017 M12L0A94_1 3/29/2017
11 MCI6A92 1993 3/24/2017 MCI6A92_1 3/24/2017
12 N9E4HSC 2000 3/30/2017 N9E4HSC_1 4/3/2017
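For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same approach. It assumes a small hand-typed subset of the question's rows (the dates are left as plain strings, since they are not needed for the numbering), and it relies on cumcount numbering the rows of each ClientID group in their order of appearance, starting from 0.

```python
import pandas as pd

# Assumption: a hand-typed subset of the rows shown in the question.
df = pd.DataFrame({
    "ClientID": ["APJ5L9C", "APJ5L9C", "BKL1N9C", "BKL1N9C", "BMO3K9C"],
    "DOB": [1975, 1993, 1981, 1981, 1982],
    "LostDate": ["3/13/2017", "7/25/2014", "3/22/2017", "1/31/2017", "3/15/2017"],
    "Category": ["Ungrouped"] * 5,
    "ReportedDate": ["3/23/2017", "3/5/2017", "3/29/2017", "3/31/2017", "3/27/2017"],
})

# cumcount() numbers the rows inside each ClientID group as 0, 1, 2, ...
# in order of appearance; adding 1 makes the suffixes start at _1.
g = df.groupby("ClientID").cumcount() + 1

# Build "<ClientID>_<n>" by concatenating the two string Series element-wise.
df["Category"] = df["ClientID"].str.cat("_" + g.astype(str))

print(df[["ClientID", "Category"]])
```

Because the counter follows row order, if the suffixes should follow some other order (for example by ReportedDate), the frame would probably need to be sorted on that column before calling cumcount.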