I am a beginner in Python, so my question may turn out to be simple. I appreciate any support or any clue that helps me solve the problem.
Problem:
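There are about 10 different states. An order moves through these states, and a timestamp is recorded when each state ends. For example, here are four states, A, B, C and D:
A 10 AM
B 1 PM
C 4 PM
D 5 PM
Time spent in B = 1 PM - 10 AM = 3 hours.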
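A minimal sketch of this rule in plain Python, assuming each timestamp marks the end of its state, so the time spent in a state is its own timestamp minus the previous state's timestamp. The calendar date 2018-01-01 is an arbitrary placeholder; only the times come from the example above.

```python
from datetime import datetime

# Timestamps from the A-D example; the date 2018-01-01 is a placeholder.
events = [
    ("A", datetime(2018, 1, 1, 10, 0)),
    ("B", datetime(2018, 1, 1, 13, 0)),
    ("C", datetime(2018, 1, 1, 16, 0)),
    ("D", datetime(2018, 1, 1, 17, 0)),
]

# Time spent in a state = its timestamp minus the previous state's timestamp.
# The first state (A) has no predecessor, so no duration is computed for it.
durations = {}
for (_, prev_time), (state, time) in zip(events, events[1:]):
    durations[state] = time - prev_time

for state, delta in durations.items():
    print(state, delta.total_seconds() / 3600, "hours")
```

This prints 3.0 hours for B and C and 1.0 hours for D.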
Please also take a look at how I have set up the input and output in the file "Sampleinputoutput": drive.google.com/open?id=15lBHI-TA0zLWfYcjb54hJkxjn0xv-XWx
Sometimes the same state occurs more than once, so I need a variable that accumulates the time difference for each individual state.
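One way to handle repeated states is to keep a running total per state name in a dictionary and add each visit's duration to what has already been accumulated. The state sequence below is a hypothetical example, and the sketch again assumes that the time spent in a state is its timestamp minus the previous one.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical sequence in which "Edited" occurs twice.
events = [
    ("Draft", datetime(2018, 5, 24, 9, 0)),
    ("Edited", datetime(2018, 5, 24, 10, 0)),
    ("Submitted", datetime(2018, 5, 24, 12, 0)),
    ("Edited", datetime(2018, 5, 24, 12, 30)),
]

# defaultdict(timedelta) starts every unseen state at a zero-length timedelta,
# so a repeated visit simply adds to the existing total.
total_time = defaultdict(timedelta)
for (_, prev_time), (state, time) in zip(events, events[1:]):
    total_time[state] += time - prev_time

for state, delta in total_time.items():
    print(state, delta)
```

Here "Edited" collects 1 hour from its first visit and 30 minutes from its second, for a total of 1:30:00.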
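Below are the raw CSV data and my code so far. The calculation needs to run over many orders, but for simplicity I am only providing data for one order for now.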
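For several orders, a common pandas pattern is to sort by order and timestamp, compute the difference to the previous row within each order with groupby(...).diff(), and then sum those differences per order and state. The two-order DataFrame below is hypothetical; only the column names Order, States and modified_at follow the sample data.

```python
import pandas as pd

# Hypothetical data for two orders, using the column names from the question.
df = pd.DataFrame({
    "Order": [1, 1, 1, 2, 2],
    "States": ["Draft", "Edited", "Submitted", "Draft", "Submitted"],
    "modified_at": pd.to_datetime([
        "2018-05-24T09:00:00", "2018-05-24T10:00:00", "2018-05-24T12:00:00",
        "2018-06-01T08:00:00", "2018-06-01T08:45:00",
    ]),
})

# Sort so that each order's state changes are in chronological order.
df = df.sort_values(["Order", "modified_at"]).reset_index(drop=True)

# Time in state = this row's timestamp minus the previous row's,
# computed separately per order (the first row of each order gets NaT).
df["time_in_state"] = df.groupby("Order")["modified_at"].diff()

# Add up repeated visits to the same state within an order.
totals = df.groupby(["Order", "States"])["time_in_state"].sum()
print(totals)
```

The first state of each order (Draft here) has no predecessor, so its summed NaT values come out as a zero duration.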
Sample data:
Order States modified_at
1 Resolved 2018-06-18T15:05:52.2460000
1 Edited 2018-05-24T21:44:07.9030000
1 Pending PO Creation 2018-06-06T19:52:51.5990000
1 Assigned 2018-05-24T17:46:03.2090000
1 Edited 2018-06-04T15:02:57.5130000
1 Draft 2018-05-24T17:45:07.9960000
1 PO Placed 2018-06-06T20:49:37.6540000
1 Edited 2018-06-04T11:18:13.9830000
1 Edited 2018-05-24T17:45:39.4680000
1 Pending Approval 2018-05-24T21:48:23.9180000
1 Edited 2018-06-06T21:00:19.6350000
1 Submitted 2018-05-24T21:44:37.8830000
1 Edited 2018-05-30T11:19:36.5460000
1 Edited 2018-05-25T11:16:07.9690000
1 Edited 2018-05-24T21:43:35.0770000
1 Assigned 2018-06-07T18:39:00.2580000
1 Pending Review 2018-05-24T17:45:10.5980000
1 Pending PO Submission 2018-06-06T14:16:26.6580000
Code I tried:
import pandas as pd
from dateutil.relativedelta import relativedelta

fileName = "SamplePR.csv"
df = pd.read_csv(fileName, delimiter=',')
# Parse the ISO-8601 strings into pandas Timestamps
df['modified_at'] = pd.to_datetime(df.modified_at)
# Sort the state changes chronologically and renumber the rows 0..n-1
df = df.sort_values(by='modified_at')
df = df.reset_index(drop=True)
df1 = df[:-1]  # every row except the last
df2 = df[1:]   # every row except the first
dfm1 = df1['modified_at'].reset_index(drop=True)
dfm2 = df2['modified_at'].reset_index(drop=True)
for i in range(len(df) - 1):
    # dfm1[i] and dfm2[i] are consecutive timestamps. They are already
    # datetime objects, so no strptime() round-trip is needed (str() would
    # include microseconds, which '%Y-%m-%d %H:%M:%S' cannot parse).
    start = dfm1[i].to_pydatetime()
    ends = dfm2[i].to_pydatetime()
    diff = relativedelta(ends, start)
    print(diff)
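The loop over two shifted copies can also be replaced by Series.diff(), which subtracts the previous row from each row once the data is sorted. The sketch below uses four rows from the sample data as an in-memory DataFrame instead of reading SamplePR.csv, and keeps the results as pandas Timedelta values rather than relativedelta objects.

```python
import pandas as pd

# Four rows from the sample data, standing in for SamplePR.csv.
df = pd.DataFrame({
    "Order": [1, 1, 1, 1],
    "States": ["Assigned", "Draft", "Edited", "Pending Review"],
    "modified_at": [
        "2018-05-24T17:46:03.2090000",
        "2018-05-24T17:45:07.9960000",
        "2018-05-24T17:45:39.4680000",
        "2018-05-24T17:45:10.5980000",
    ],
})
df["modified_at"] = pd.to_datetime(df["modified_at"])
df = df.sort_values(by="modified_at").reset_index(drop=True)

# diff() subtracts the previous row's timestamp from each row;
# the first row has no predecessor, so it gets NaT.
df["time_in_state"] = df["modified_at"].diff()
print(df[["States", "time_in_state"]])
```

After sorting, Pending Review gets 2.602 s, Edited 28.870 s and Assigned 23.741 s, while Draft, the first row, gets NaT.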
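So far I have tried sorting the rows by time and then calculating the difference between each pair of consecutive states.
If anyone can help with the logic or point me in the right direction, I would be grateful.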