Azure ML

Time: 2018-05-04 01:32:30

Tags: python pandas azure

I am trying to find the difference, in days, between two dates. I am trying the following code:

d1=pd.to_datetime(dataframe1['Order Date'])
d=str(d1)
dates=datetime.strptime(d,'%m-%d-%Y')
d2=pd.to_datetime(dataframe1['Dispatch Date'])
dd=str(d2)
dates1=datetime.strptime(dd,'%m-%d-%Y')
dataframe1['Months_difference']=dates1-dates

But it shows the following error:

  

ValueError: time data '0 2017-02-13\n1 2017-02-24\n2 2017-03-02\n3 2017-03-06\n4 2017-03-06\n5 2017-03-06\n6 2017-03-11\n7 2017-03-23\n8 2017-03-23\n9 2017-03-24\n10 2017-04-07\n11 2017-04-07\n12 2017-04-07\n13 2017-04-07\n14 2017-04-07\n ... \n855 2018-02-02\n856 2018-02-02\n857 2018-02-02\n858 2018-02-02\n859 2018-02-02\n860 2018-02-01\n861 2018-02-06\n862 2018-03-15\n863 2018-03-21\n864 2018-03-21\n865 2018-04-05\n866 2018-04-\n\nName: Order Date, Length: 870, dtype: datetime64[ns]' does not match format '%m-%d-%Y'

Process returned with non-zero exit code 1

How can I solve this?

1 Answer:

Answer 0: (score: 0)

IIUC, you can do everything within pandas, without using the datetime module. I am assuming your starting dataframe looks like:

>>> dataframe1
  Dispatch Date  Order Date
0    2017-03-02  2017-02-13
1    2017-03-06  2017-02-24

In that case, you can do this:

# set columns to datetime:
dataframe1['Order Date'] = pd.to_datetime(dataframe1['Order Date'])
dataframe1['Dispatch Date'] = pd.to_datetime(dataframe1['Dispatch Date'])
# Make a new column for the difference in days
dataframe1['day_diff'] =  dataframe1['Dispatch Date'] - dataframe1['Order Date']

Which outputs:

>>> dataframe1
  Dispatch Date Order Date day_diff
0    2017-03-02 2017-02-13  17 days
1    2017-03-06 2017-02-24  10 days

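Explanation: subtracting two datetime columns in pandas yields Timedelta objects (as shown in the new column day_diff). If you want the result as an integer number of days, just add .dt.days to the last command: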

dataframe1['day_diff'] =  (dataframe1['Dispatch Date'] - dataframe1['Order Date']).dt.days
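
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The two-row frame is made up to mirror the sample data above (the real data has 870 rows), and the name as_text is only for illustration. It also shows why the original code failed: calling str() on a whole column gives the printed form of the entire Series, not a single date, so strptime cannot parse it.

```python
import pandas as pd

# Hypothetical two-row frame mirroring the sample data in the answer
dataframe1 = pd.DataFrame({
    "Order Date": ["2017-02-13", "2017-02-24"],
    "Dispatch Date": ["2017-03-02", "2017-03-06"],
})

# Convert both columns to datetime64
for col in ["Order Date", "Dispatch Date"]:
    dataframe1[col] = pd.to_datetime(dataframe1[col])

# str() of a Series is its multi-line printout (index, values, Name, dtype),
# which is the text that strptime rejected in the question
as_text = str(dataframe1["Order Date"])
print("\n" in as_text)  # True

# Subtract column-wise, then keep only the whole-day component as integers
dataframe1["day_diff"] = (
    dataframe1["Dispatch Date"] - dataframe1["Order Date"]
).dt.days

print(dataframe1["day_diff"].tolist())  # [17, 10]
```

The subtraction works on the whole column at once, so no loop over rows and no datetime.strptime call is needed.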