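I have a Pandas DataFrame that stores people's travel dates. I'd like to add a column showing the length of each stay. To do this, each string needs to be parsed, converted to a datetime, and subtracted. Pandas seems to treat the datetime conversion as applying to the whole Series rather than to the individual strings, because I get TypeError: must be string, not Series. I'd like a non-looping option because the actual dataset is quite large, but I need a bit of help.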
import pandas as pd
from datetime import datetime
df = pd.DataFrame(data=[['Bob', '12 Mar 2015 - 31 Mar 2015'], ['Jessica', '27 Mar 2015 - 31 Mar 2015']], columns=['Names', 'Day of Visit'])
df['Length of Stay'] = (datetime.strptime(df['Day of Visit'][:11], '%d %b %Y') - datetime.strptime(df['Day of Visit'][-11:], '%d %b %Y')).days + 1
print(df)
Desired output:
     Names               Day of Visit  Length of Stay
0      Bob  12 Mar 2015 - 31 Mar 2015              20
1  Jessica  27 Mar 2015 - 31 Mar 2015               5
Answer 0 (score: 4)
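Use Series.str.extract to split the Day of Visit column into two separate columns. Then use pd.to_datetime to parse the columns as dates. The Length of Stay can then be computed by subtracting the date columns and adding 1: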
import pandas as pd
df = pd.DataFrame(data=[['Bob', '12 Mar 2015 - 31 Mar 2015'], ['Jessica', '27 Mar 2015 - 31 Mar 2015']], columns=['Names', 'Day of Visit'])
tmp = df['Day of Visit'].str.extract(r'([^-]+)-(.*)', expand=True).apply(pd.to_datetime)
df['Length of Stay'] = (tmp[1] - tmp[0]).dt.days + 1
print(df)
yields
     Names               Day of Visit  Length of Stay
0      Bob  12 Mar 2015 - 31 Mar 2015              20
1  Jessica  27 Mar 2015 - 31 Mar 2015               5
The regex pattern ([^-]+)-(.*) means
(            # start group #1
  [          # begin character class
    ^-       # any character except a literal minus sign `-`
  ]          # end character class
  +          # match 1-or-more characters from the character class
)            # end group #1
-            # match a literal minus sign
(            # start group #2
  .*         # match 0-or-more of any character
)            # end group #2
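To see the breakdown above in action, here is a minimal sketch that applies the same pattern to one sample string with Python's standard re module. The variable names are only illustrative. One thing it makes visible is that the spaces around the minus sign stay attached to the captured groups.

```python
import re

# Same pattern as in the answer; the sample string comes from the question's data.
pattern = re.compile(r'([^-]+)-(.*)')
match = pattern.match('12 Mar 2015 - 31 Mar 2015')

# Group 1 is everything before the first '-', group 2 is everything after it.
start_text, end_text = match.groups()
print(repr(start_text))  # '12 Mar 2015 '
print(repr(end_text))    # ' 31 Mar 2015'
```

The surrounding whitespace is why the captured text is handed to a date parser rather than compared as-is.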
.str.extract returns a DataFrame with the matching text from groups #1 and #2 in its columns.
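As a rough illustration of that DataFrame, the sketch below uses a slightly different pattern from the one in this answer. The named groups (start, end) and the \s* around the minus sign are my own assumptions, added so the columns get readable names and no stray spaces. The explicit format string is likewise an assumption based on the sample data.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['Bob', '12 Mar 2015 - 31 Mar 2015'],
          ['Jessica', '27 Mar 2015 - 31 Mar 2015']],
    columns=['Names', 'Day of Visit'],
)

# Hypothetical variant of the pattern: named groups become column names,
# and \s* consumes the spaces on either side of the '-'.
parts = df['Day of Visit'].str.extract(
    r'(?P<start>[^-]+?)\s*-\s*(?P<end>.*)', expand=True
)
print(parts)

# An explicit format avoids relying on pandas to guess the date layout.
start = pd.to_datetime(parts['start'], format='%d %b %Y')
end = pd.to_datetime(parts['end'], format='%d %b %Y')
df['Length of Stay'] = (end - start).dt.days + 1
print(df)
```

If the real data contains other date layouts, the format string would need adjusting, or it can be dropped to let pd.to_datetime infer the layout as in the code above.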
Answer 1 (score: 1)
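from datetime import datetime

def length_of_stay(x):
    # Split "12 Mar 2015 - 31 Mar 2015" into its two dates and parse each one
    start, end = [datetime.strptime(d, '%d %b %Y') for d in x.split(' - ')]
    # Count both the arrival and the departure day, as in the desired output
    return (end - start).days + 1

df['Length of Stay'] = df['Day of Visit'].apply(length_of_stay)
print(df)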
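For comparison with the row-by-row apply, the slicing idea from the question can also be kept vectorized. The sketch below assumes every date is exactly 11 characters wide (two-digit day, three-letter month, four-digit year), which holds for the sample data but may not for the real dataset. The TypeError in the question arises because df['Day of Visit'][:11] slices rows of the Series, so datetime.strptime is handed a Series. The .str accessor slices each string instead.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['Bob', '12 Mar 2015 - 31 Mar 2015'],
          ['Jessica', '27 Mar 2015 - 31 Mar 2015']],
    columns=['Names', 'Day of Visit'],
)

visits = df['Day of Visit']

# visits[:11] would select the first 11 rows; visits.str[:11] takes the
# first 11 characters of every string (assumes fixed-width dates).
start = pd.to_datetime(visits.str[:11], format='%d %b %Y')
end = pd.to_datetime(visits.str[-11:], format='%d %b %Y')

# Subtract start from end (the question had the operands reversed),
# then add 1 so both the arrival and departure days are counted.
df['Length of Stay'] = (end - start).dt.days + 1
print(df)
```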