How can I use pandas to split the Date column and find the delay in days?

Time: 2017-11-03 06:37:15

Tags: python pandas

Here is the code:

Cell 1:

%matplotlib notebook
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

purchase_data = pd.read_csv('Lokad_PurchaseOrders.csv')
purchase_data
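A possible shortcut, sketched under assumptions: if both columns hold day-month-year strings such as "03-11-2017" (the order the variable names in d() suggest), read_csv can parse them into datetime64 columns while loading, so no manual splitting is needed. The CSV text below is a made-up stand-in for Lokad_PurchaseOrders.csv; the real file's format is not shown in the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for Lokad_PurchaseOrders.csv; the column names come
# from the question, the values and the day-month-year format are assumed.
csv_text = """Date,DeliveryDate
03-11-2017,10-11-2017
01-11-2017,02-11-2017
"""

# parse_dates converts the listed columns to datetime64 while reading;
# dayfirst=True makes pandas read "03-11-2017" as 3 November 2017.
purchase_data = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Date", "DeliveryDate"],
    dayfirst=True,
)

print(purchase_data.dtypes)
```

With the real file, replace io.StringIO(csv_text) by the file name and check dtypes afterwards; if a column stays object, its strings are not in the assumed format.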

Cell 2:

import datetime
import math
from datetime import date

start_date = purchase_data.Date
end_date = purchase_data.DeliveryDate
def d(s):
  [day, month, year] = map(int, s.split('-'))
  return date(day, month, year)

def delay(end, start):
  return (d(end) - d(start)).days  

delay(end_date, start_date)

Here is the error message:

AttributeError                            Traceback (most recent call last)
<ipython-input-19-ce1db3e6387f> in <module>()
     13 
     14 
---> 15 delay(end_date, start_date)

<ipython-input-19-ce1db3e6387f> in delay(end, start)
     10 
     11 def delay(end, start):
---> 12     return (d(end) - d(start)).days
     13 
     14 

<ipython-input-19-ce1db3e6387f> in d(s)
      6 end_date = purchase_data.DeliveryDate
      7 def d(s):
----> 8     [day, month, year] = map(int, s.split('-'))
      9     return date(day, month, year)
     10 

F:\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   2968             if name in self._info_axis:
   2969                 return self[name]
-> 2970             return object.__getattribute__(self, name)
   2971 
   2972     def __setattr__(self, name, value):

AttributeError: 'Series' object has no attribute 'split'

Can anyone help?

1 answer:

Answer 0 (score: 0)

First, as the error shows, you should make sure that end_date and start_date are strings rather than Series objects. You can check this with the type() function.
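A minimal sketch of that check, using a small made-up DataFrame with the question's column names: purchase_data.Date is a whole Series, while each element of it is a plain string. Only the element has a split method, which is why d(s) raised AttributeError when it received the Series.

```python
import pandas as pd

# Toy frame with the question's column names; the values are made up.
purchase_data = pd.DataFrame(
    {"Date": ["03-11-2017", "01-11-2017"],
     "DeliveryDate": ["10-11-2017", "02-11-2017"]}
)

start_date = purchase_data.Date
print(type(start_date))     # the whole column: a pandas Series
print(type(start_date[0]))  # a single element: a str

# The Series itself has no split method; a single element does.
print(hasattr(start_date, "split"))
print(start_date[0].split("-"))
```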

To split data stored in a Series, you can refer to this answer.
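One common way to split every element of a string Series at once is the .str accessor, sketched here on made-up day-month-year strings (the real column's format is an assumption). With expand=True, str.split returns one column per piece instead of a Series of lists.

```python
import pandas as pd

# Made-up day-month-year strings standing in for purchase_data.Date.
dates = pd.Series(["03-11-2017", "01-11-2017"], name="Date")

# .str applies str.split to every element; expand=True spreads the
# pieces over separate columns, which are then converted to integers.
parts = dates.str.split("-", expand=True).astype(int)
parts.columns = ["day", "month", "year"]
print(parts)
```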

Second, I think that in this function:

def delay(end, start):
  return (d(end) - d(start)).days  

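(d(end) - d(start)) only has a days attribute if d() returns datetime.date objects; the difference of two dates is a datetime.timedelta, which does have days. Also note that date() takes its arguments as date(year, month, day), so if the strings really are in day-month-year order, as the variable names in d() suggest, date(day, month, year) passes them in the wrong order.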
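A sketch of two ways to get the delay per row, assuming day-month-year strings and using a made-up DataFrame with the question's column names. Option 1 keeps the question's d() and delay() (with the date() argument order fixed) and calls them on one pair of strings at a time; option 2 converts whole columns with pd.to_datetime and subtracts them, which avoids the Python-level loop.

```python
from datetime import date

import pandas as pd


def d(s):
    # Assumes s is a single string like "03-11-2017" (day-month-year).
    day, month, year = map(int, s.split("-"))
    # datetime.date expects its arguments as (year, month, day).
    return date(year, month, day)


def delay(end, start):
    # date - date gives a datetime.timedelta, which has a .days attribute.
    return (d(end) - d(start)).days


# Toy data with the question's column names; the values are made up.
purchase_data = pd.DataFrame(
    {"Date": ["03-11-2017", "01-11-2017"],
     "DeliveryDate": ["10-11-2017", "02-11-2017"]}
)

# Option 1: call delay() on one pair of strings per row.
purchase_data["Delay"] = [
    delay(end, start)
    for end, start in zip(purchase_data.DeliveryDate, purchase_data.Date)
]

# Option 2: parse whole columns to datetime64 and subtract them at once;
# the result is a timedelta Series whose .dt.days gives whole days.
start = pd.to_datetime(purchase_data.Date, format="%d-%m-%Y")
end = pd.to_datetime(purchase_data.DeliveryDate, format="%d-%m-%Y")
purchase_data["Delay2"] = (end - start).dt.days

print(purchase_data)
```

If the real file uses another layout (for example year-month-day), only the split order in d() and the format string passed to pd.to_datetime need to change.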

Hope this helps.