I have a column named start_date in my Pandas DataFrame, stored as strings in this format:
start_date
'20120212'
'20120514'
'20121124'
'20120604'
To extract the month, year, and day into separate columns, this is what I am currently doing. Is there a better way to do the same thing?
df['start_month']=df['start_date'].apply(lambda x:str(x)[4:6])
df['start_year']=df['start_date'].apply(lambda x:str(x)[0:4])
df['start_day']=df['start_date'].apply(lambda x:str(x)[6:8])
Answer 0 (score: 3)
Use to_datetime, then extract the year, month, and day:
a = pd.to_datetime(df['start_date'], format='%Y%m%d')
df['start_month'] = a.dt.month
df['start_year'] = a.dt.year
df['start_day'] = a.dt.day
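As a self-contained sketch of the to_datetime approach, the snippet below rebuilds the four sample rows from the question and derives the three columns. The variable name `parsed` is an illustrative choice, not something from the original answer, and the sketch assumes every value is a valid `YYYYMMDD` string.

```python
import pandas as pd

# Sample data copied from the question; values are strings, not integers.
df = pd.DataFrame({"start_date": ["20120212", "20120514", "20121124", "20120604"]})

# Parse once with an explicit format, then read the parts via the .dt accessor.
# `parsed` is a hypothetical name used only for this illustration.
parsed = pd.to_datetime(df["start_date"], format="%Y%m%d")

df["start_year"] = parsed.dt.year
df["start_month"] = parsed.dt.month
df["start_day"] = parsed.dt.day

print(df)
```

Parsing into a datetime Series first means the year, month, and day come out as integers, whereas the slicing in the question produces strings such as '02'.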
Or slice with str[] and cast to int:
df['start_date'] = df['start_date'].astype(str)
df['start_month'] = df['start_date'].str[4:6].astype(int)
df['start_year'] = df['start_date'].str[:4].astype(int)
df['start_day'] = df['start_date'].str[6:8].astype(int)
print (df)
  start_date  start_month  start_year  start_day
0   20120212            2        2012         12
1   20120514            5        2012         14
2   20121124           11        2012         24
3   20120604            6        2012          4
Comparing solutions:
df = pd.concat([df]*10000).reset_index(drop=True)
[40000 rows x 1 columns]
def orig(df):
    df['start_month'] = df['start_date'].apply(lambda x: str(x)[4:6]).astype(int)
    df['start_year'] = df['start_date'].apply(lambda x: str(x)[0:4]).astype(int)
    df['start_day'] = df['start_date'].apply(lambda x: str(x)[6:8]).astype(int)
    return df

def a(df):
    a = pd.to_datetime(df['start_date'], format='%Y%m%d')
    df['start_month'] = a.dt.month
    df['start_year'] = a.dt.year
    df['start_day'] = a.dt.day
    return df

def b(df):
    df['start_month'] = df['start_date'].str[4:6].astype(int)
    df['start_year'] = df['start_date'].str[:4].astype(int)
    df['start_day'] = df['start_date'].str[6:8].astype(int)
    return df
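The timing results are not included here, so the following is only a sketch of how the three functions could be compared using the standard-library `timeit` module. The function names `via_apply`, `via_datetime`, and `via_str_slice` are renamed stand-ins for `orig`, `a`, and `b` above, and the repeat count of 3 is an arbitrary assumption. No timings are shown because they depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

# Rebuild the 40000-row frame used in the comparison (4 sample rows x 10000).
base = pd.DataFrame({"start_date": ["20120212", "20120514", "20121124", "20120604"]})
big = pd.concat([base] * 10000).reset_index(drop=True)


def via_apply(df):
    # Row-by-row Python slicing, as in the question, then cast to int.
    df["start_month"] = df["start_date"].apply(lambda x: str(x)[4:6]).astype(int)
    df["start_year"] = df["start_date"].apply(lambda x: str(x)[0:4]).astype(int)
    df["start_day"] = df["start_date"].apply(lambda x: str(x)[6:8]).astype(int)
    return df


def via_datetime(df):
    # Parse once, then read the components from the .dt accessor.
    parsed = pd.to_datetime(df["start_date"], format="%Y%m%d")
    df["start_month"] = parsed.dt.month
    df["start_year"] = parsed.dt.year
    df["start_day"] = parsed.dt.day
    return df


def via_str_slice(df):
    # Slice through the .str accessor, then cast to int.
    df["start_month"] = df["start_date"].str[4:6].astype(int)
    df["start_year"] = df["start_date"].str[:4].astype(int)
    df["start_day"] = df["start_date"].str[6:8].astype(int)
    return df


results = {}
runs = 3  # arbitrary repeat count for this sketch
for func in (via_apply, via_datetime, via_str_slice):
    # Keep one result per approach so the outputs can be checked for agreement.
    results[func.__name__] = func(big.copy())
    # Time each call on a fresh copy so no approach sees pre-filled columns.
    seconds = timeit.timeit(lambda: func(big.copy()), number=runs)
    print(f"{func.__name__}: {seconds / runs:.4f} s per run")
```

Each call receives a copy of the frame, so the measured time includes the cost of the copy; that cost is the same for all three approaches and does not change their relative order.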