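How do I create a new column from the time difference between two columns in Pandas?

Time: 2015-11-29 16:06:20

Tags: python-2.7 pandas dataframe

I have a DataFrame with Time_x and Time_y columns in the following format: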

# 2015-10-01 23:59:59.997
%Y-%m-%d %H:%M:%S.%f

This does not work:

df['TimeDiff'] = datetime.strptime(df['Time_x'], '%Y-%m-%d %H:%M:%S.%f') - \
                 datetime.strptime(df['Time_y'], '%Y-%m-%d %H:%M:%S.%f')

Nor does this attempt to return the difference:

# Defining a function to call with Pandas to apply()
def time_difference(a):
    Time_x, Time_y = a
    c = datetime.strptime(Time_x, '%Y-%m-%d %H:%M:%S.%f') - datetime.strptime(Time_y, '%Y-%m-%d %H:%M:%S.%f')

    if c.days < 1:
        if c.minute <= 15:
            return c.minute
        else:
            return c.days
    else:
        None

# Creating a new column using my function.
# Error: “Too many values to unpack” Exception
df['TimeDiff'] = df[['Time_x', 'Time_y']].apply(time_difference)

So how can I do this?

1 Answer:

Answer 0 (score: 1)

IIUC, you are reading the data from a CSV file:

time_x,time_y
2015-10-01 23:59:59.997,2015-10-01 23:58:59.997
2015-10-01 23:57:59.997,2015-10-01 23:59:59.997

I would read the file and parse the dates while loading:

df = pd.read_csv('yourfile.csv', parse_dates=['time_x','time_y'])
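If the timestamps are already in a DataFrame as strings, as the question suggests, the same conversion can probably be done with `pd.to_datetime`. The following is a minimal sketch, assuming the lowercase column names used in this answer and two made-up rows matching the sample CSV:

```python
import pandas as pd

# Hypothetical sample data; the columns hold plain strings at this point
df = pd.DataFrame({
    "time_x": ["2015-10-01 23:59:59.997", "2015-10-01 23:57:59.997"],
    "time_y": ["2015-10-01 23:58:59.997", "2015-10-01 23:59:59.997"],
})

# Same format string as in the question
fmt = "%Y-%m-%d %H:%M:%S.%f"

# Convert each string column to datetime64 in one vectorized call
for col in ["time_x", "time_y"]:
    df[col] = pd.to_datetime(df[col], format=fmt)
```

Unlike `datetime.strptime`, which accepts a single string, `pd.to_datetime` accepts a whole column.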

After that you can subtract the columns directly:

df['TimeDiff'] = (df['time_x'] - df['time_y']).dt.seconds

This returns:

                   time_x                  time_y  TimeDiff
0 2015-10-01 23:59:59.997 2015-10-01 23:58:59.997        60
1 2015-10-01 23:57:59.997 2015-10-01 23:59:59.997     86280
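The 86280 in the second row deserves a note: `time_x` is 120 seconds earlier than `time_y` there, and a negative timedelta is stored as -1 day plus 86280 seconds, so `dt.seconds` reports only the seconds component. If a signed difference is wanted, `dt.total_seconds()` is likely the better fit. A small sketch, with the sample rows rebuilt by hand and `TimeDiffMin` as a made-up column name:

```python
import pandas as pd

# Rebuild the two sample rows from the CSV above (already parsed as datetimes)
df = pd.DataFrame({
    "time_x": pd.to_datetime(["2015-10-01 23:59:59.997", "2015-10-01 23:57:59.997"]),
    "time_y": pd.to_datetime(["2015-10-01 23:58:59.997", "2015-10-01 23:59:59.997"]),
})

delta = df["time_x"] - df["time_y"]   # timedelta64 column

# Signed difference in seconds (float), negative when time_x is earlier
df["TimeDiff"] = delta.dt.total_seconds()

# Other units follow by division; TimeDiffMin is a hypothetical column name
df["TimeDiffMin"] = delta.dt.total_seconds() / 60
```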

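This way you can choose the time unit you need. Note that the `.dt` accessor of a timedelta column provides `dt.days`, `dt.seconds`, `dt.microseconds` and `dt.total_seconds()`; `dt.hour` and `dt.minute` exist only on datetime columns, so for hours or minutes divide `dt.total_seconds()` by 3600 or 60.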
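As for the "Too many values to unpack" error in the question: by default `DataFrame.apply` passes each column to the function, so `time_difference` receives an entire column and cannot unpack it into two names. Passing `axis=1` hands it one row at a time. Separately, a `timedelta` object has no `minute` attribute, so `c.minute` would fail even after that fix. Here is a hedged sketch of a row-wise version that returns the difference in minutes; the sample rows are made up, and the vectorized subtraction above is usually the simpler route:

```python
from datetime import datetime

import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S.%f"

def time_difference(row):
    # With axis=1, `row` is a single row, so it unpacks into exactly two values
    time_x, time_y = row
    delta = datetime.strptime(time_x, FMT) - datetime.strptime(time_y, FMT)
    # timedelta has no .minute attribute; derive minutes from total_seconds()
    return delta.total_seconds() / 60

# Hypothetical sample data with the column names from the question
df = pd.DataFrame({
    "Time_x": ["2015-10-01 23:59:59.997", "2015-10-01 23:57:59.997"],
    "Time_y": ["2015-10-01 23:58:59.997", "2015-10-01 23:59:59.997"],
})

# axis=1 applies the function to each row instead of each column
df["TimeDiff"] = df[["Time_x", "Time_y"]].apply(time_difference, axis=1)
```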