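I am trying to find how much time each id spent in each of several statuses. Only the end time of each status is given, so the time spent in a status is its end time minus the end time of the previous status: the second end time minus the first gives the time spent in the second status, and so on. The same status may occur more than once during an id's process.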
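A worked example of this rule for a single id shows how a repeated status accumulates. The statuses and timestamps below are made up for illustration (hypothetical, not taken from Input_Data.csv). Note that the first status never gets a duration, because there is no earlier end time to subtract.

```python
from datetime import datetime, timedelta

# Hypothetical status sequence for ONE id, in process order, with each status's end time
history = [
    ("Draft", datetime(2020, 1, 1, 9, 0)),
    ("Pending Review", datetime(2020, 1, 1, 10, 0)),
    ("Returned", datetime(2020, 1, 1, 12, 0)),
    ("Pending Review", datetime(2020, 1, 1, 13, 0)),
    ("Submitted", datetime(2020, 1, 1, 13, 45)),
]

time_in_status = {}
# Status k lasted from the end of status k-1 until its own end time.
# The first status has no known start time, so the loop starts at k = 1.
for k in range(1, len(history)):
    status, end = history[k]
    _, prev_end = history[k - 1]
    # A repeated status adds to its running total
    time_in_status[status] = time_in_status.get(status, timedelta()) + (end - prev_end)

print(time_in_status)
```

Here "Pending Review" occurs twice (1 hour each time), so its total is 2 hours.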
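The code below parses the CSV input and computes these differences. I want to assign each computed time difference to its status, adding to the running total when a status repeats, and then do the same for every id. I am considering a nested dictionary whose outer key is the id and whose inner key: value pairs are status: total time difference. However, I am not sure how to structure the logic.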
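import pandas as pd

fileName = "Input_Data.csv"
df = pd.read_csv(fileName, delimiter=',')
end_times = pd.to_datetime(df['end_time'])
# Create the column before writing into it (timedelta dtype, initially NaT)
df['timediff'] = pd.Series(pd.NaT, index=df.index, dtype='timedelta64[ns]')

current_id = df['id'].loc[0]  # get first id (named current_id to avoid shadowing the built-in id)
i = 1
while i < df.shape[0]:
    if current_id == df['id'].loc[i]:
        # Same id: time in this status = this end time - previous end time
        diff = end_times.loc[i] - end_times.loc[i - 1]
        df.loc[i, 'timediff'] = diff  # a single .loc call avoids chained assignment
        print('id', current_id, 'status', df['status'].loc[i], 'time diff', diff)
    else:
        # New id: its first status has no previous end time, so just switch ids
        current_id = df['id'].loc[i]
    i += 1

# Nested dictionary (not implemented yet)
# uniqueid = df['id'].unique()
# status = ["Returned", "Draft", "Pending Review", "Submitted", "PR Placed"]
# totals = {}  # {id: {status: total time difference}}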
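One way to fill in the nested dictionary without a manual while loop is to let pandas compute the per-id differences with `groupby("id")["end_time"].diff()` and then sum per `(id, status)`. This is a swapped-in technique, not the loop from the question. It is a sketch under a few assumptions: the columns are named `id`, `status` and `end_time` as in the code above, sorting by end time within each id reproduces the process order, and the inline CSV rows are made up to stand in for Input_Data.csv.

```python
import io
import pandas as pd

# Hypothetical CSV with the columns used in the question: id, status, end_time
csv_text = """id,status,end_time
1,Draft,2020-01-01 09:00
1,Pending Review,2020-01-01 10:00
1,Returned,2020-01-01 12:00
1,Pending Review,2020-01-01 13:00
2,Draft,2020-01-02 08:00
2,Submitted,2020-01-02 08:45
"""
df = pd.read_csv(io.StringIO(csv_text))
df["end_time"] = pd.to_datetime(df["end_time"])

# Assumes end-time order within an id is the process order
df = df.sort_values(["id", "end_time"])

# Time in each status = this end time minus the previous end time of the SAME id.
# diff() is computed per group, so each id's first row gets NaT (start time unknown).
df["timediff"] = df.groupby("id")["end_time"].diff()

# Drop the NaT rows, then add up repeated statuses per id
totals = df.dropna(subset=["timediff"]).groupby(["id", "status"])["timediff"].sum()

# Nested dict: {id: {status: total Timedelta}}
nested = {
    key: grp.droplevel("id").to_dict()
    for key, grp in totals.groupby(level="id")
}
print(nested)
```

Because the difference is taken within each id group, an id's first row is never subtracted from the previous id's last row, which is the case the `else` branch in the loop has to handle by hand.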