I have a dataframe that looks like this:
RefNo TopicNo BillA/c PremisesNo Date Age TopicType
1 111 1234 54698 11/12/18 APSR
2 222 5698 123654 12/12/18 KLPO
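I need to find every occurrence of each PremisesNo and compute the date difference between consecutive occurrences. The expected output is: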
RefNo TopicNo BillA/c PremisesNo Date Age TopicType Diff
1 111 1234 54698 11/12/18 APSR 1
2 222 5698 54698 12/12/18 KLPO 0
3 333 5798 54698 12/12/18 KLPO NA
I tried the following code:
df2 = []
def occurence(df1):
    for ind, row in df2.iterrows():
        if ind in df['Premises Number'].unique():
            df2.append(df1['Premises Number'])
    return df2
occurence(df1)
But it does not produce the desired result. Any suggestions would be appreciated.
Answer 0 (score: 0)
You can group by PremisesNo and call diff on the Date Age column:
df['Diff'] = df.groupby('PremisesNo')['Date Age'].diff(-1).abs().dt.days
Using the sample dataframe:
TopicNo BillA/c PremisesNo Date Age TopicType
RefNo
1 111 1234 54698 2018-12-11 APSR
2 222 5698 54698 2018-12-12 KLPO
3 333 5798 54698 2018-12-12 KLPO
First convert the Date Age column to datetime, then apply the operation above:
df['Date Age'] = pd.to_datetime(df['Date Age'], format = '%d/%m/%y')
df['Diff'] = df.groupby('PremisesNo')['Date Age'].diff(-1).abs().dt.days
TopicNo BillA/c PremisesNo Date Age TopicType Diff
RefNo
1 111 1234 54698 2018-12-11 APSR 1.0
2 222 5698 54698 2018-12-12 KLPO 0.0
3 333 5798 54698 2018-12-12 KLPO NaN
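For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The dataframe is typed in by hand from the expected-output table in the question, so it assumes all three rows share PremisesNo 54698 and that the dates are day/month/year strings.

```python
import pandas as pd

# Sample data typed in from the question (an assumption about the real input).
df = pd.DataFrame(
    {
        "RefNo": [1, 2, 3],
        "TopicNo": [111, 222, 333],
        "BillA/c": [1234, 5698, 5798],
        "PremisesNo": [54698, 54698, 54698],
        "Date Age": ["11/12/18", "12/12/18", "12/12/18"],
        "TopicType": ["APSR", "KLPO", "KLPO"],
    }
).set_index("RefNo")

# Parse day/month/two-digit-year strings into datetimes.
df["Date Age"] = pd.to_datetime(df["Date Age"], format="%d/%m/%y")

# diff(-1) computes current row minus the next row within each group,
# so the last row of every group has no partner and ends up as NaN.
df["Diff"] = df.groupby("PremisesNo")["Date Age"].diff(-1).abs().dt.days

print(df)
```

The Diff column comes out as float rather than int because it has to hold the NaN on the last row of the group.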
Answer 1 (score: 0)
To add to @nixon's answer, try the following.
Convert Date Age to a pandas datetime:
df['Date Age'] = pd.to_datetime(df['Date Age'], format='%d/%m/%y')  # dates are day/month/year
df['Diff'] = df[['PremisesNo','Date Age']].groupby('PremisesNo')['Date Age'].diff()
Then set Diff to None on every row where PremisesNo differs from the previous row:
df.loc[df.PremisesNo != df.PremisesNo.shift(),'Diff'] = None
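One thing worth checking before relying on that last step: the grouped diff and the shift-based mask only agree when rows with the same PremisesNo sit next to each other. The sketch below uses made-up data (the values are hypothetical, not from the question) in which PremisesNo 1 appears, is interrupted by PremisesNo 2, and then appears again.

```python
import pandas as pd

# Hypothetical data: premises 1 shows up again after premises 2.
df = pd.DataFrame(
    {
        "PremisesNo": [1, 1, 2, 1],
        "Date Age": pd.to_datetime(
            ["01/12/18", "03/12/18", "04/12/18", "10/12/18"], format="%d/%m/%y"
        ),
    }
)

# Current row minus the previous row of the same PremisesNo.
df["Diff"] = df.groupby("PremisesNo")["Date Age"].diff()
grouped_days = df["Diff"].dt.days.copy()

# Blank out rows whose PremisesNo differs from the row directly above.
starts_new_run = df["PremisesNo"] != df["PremisesNo"].shift()
df.loc[starts_new_run, "Diff"] = pd.NaT
masked_days = df["Diff"].dt.days

print(pd.DataFrame({"grouped": grouped_days, "masked": masked_days}))
```

On the last row the grouped diff alone reports 7 days (10 Dec minus 3 Dec), while the masked version is empty because the row above belongs to a different premises. If the data is sorted by PremisesNo first, the mask should only touch rows that the grouped diff already left empty.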