Converting date formats in pandas

Posted: 2020-06-17 20:26:58

Tags: python pandas date-conversion

I have a dataframe:

print(df_test)

               Name Birth Date
0     Anna B Wilson   JUL 1861
1  Victor C Burnett   NOV 1847
2     Ausia Burnett   JUN 1898
3    Alfred Burnett   MAR 1896
4     Viola Burnett   AUG 1894

I want the output to be:

               Name Birth Date
0     Anna B Wilson     7-1861
1  Victor C Burnett    11-1847
2     Ausia Burnett     6-1898
3    Alfred Burnett     3-1896
4     Viola Burnett     8-1894

Is there a concise way to do this without writing a separate regex for every month, i.e. instead of:

df_test = df_test.replace(to_replace=r'(MAR)\s(\d{4})', value=r'3-\2', regex=True)
df_test = df_test.replace(to_replace=r'(JUN)\s(\d{4})', value=r'6-\2', regex=True)
df_test = df_test.replace(to_replace=r'(JUL)\s(\d{4})', value=r'7-\2', regex=True)
df_test = df_test.replace(to_replace=r'(AUG)\s(\d{4})', value=r'8-\2', regex=True)
df_test = df_test.replace(to_replace=r'(NOV)\s(\d{4})', value=r'11-\2', regex=True)
print(df_test)
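For comparison, here is a minimal sketch of collapsing the per-month replacements into a single pattern. It assumes the dates are upper-case English month abbreviations followed by a four-digit year; the `MONTHS`/`MONTH_NUM` lookup table is a hypothetical helper, and passing a callable as the replacement to `Series.str.replace` (which requires `regex=True`) is my choice of technique, not something from the question.

```python
import pandas as pd

# Hypothetical lookup table: English month abbreviations -> month numbers
MONTHS = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
          'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']
MONTH_NUM = {abbr: i for i, abbr in enumerate(MONTHS, start=1)}

df_test = pd.DataFrame({
    'Name': ['Anna B Wilson', 'Victor C Burnett', 'Ausia Burnett',
             'Alfred Burnett', 'Viola Burnett'],
    'Birth Date': ['JUL 1861', 'NOV 1847', 'JUN 1898', 'MAR 1896', 'AUG 1894'],
})

# One pattern covers all twelve months; the callable converts the captured
# abbreviation to its number and keeps the captured year unchanged.
pattern = r'^(' + '|'.join(MONTHS) + r')\s(\d{4})$'
df_test['Birth Date'] = df_test['Birth Date'].str.replace(
    pattern,
    lambda m: f"{MONTH_NUM[m.group(1)]}-{m.group(2)}",
    regex=True,
)
print(df_test)
```

Because the pattern only lists the twelve known abbreviations, any value that does not match is left untouched rather than raising a lookup error.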

Edit: So, there's a fly in the ointment. The date data is not all in the same format. For example, rows 5-8 contain exceptions:

                       Name    Birth Date
0             Anna B Wilson      JUL 1861
1          Victor C Burnett      NOV 1847
2             Ausia Burnett      JUN 1898
3            Alfred Burnett      MAR 1896
4             Viola Burnett      AUG 1894
5             Marinda Lynde          1843
6              Iola Staffen  Jan Abt 1880
7  Maryella Dolores Staffin   30 AUG 1913
8   Norman Lawrence Schmitt   22 JUN 1945
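One possible way to cope with these mixed formats, as a sketch under stated assumptions: pull out the first four-digit year and, if present, an English month abbreviation (case-insensitively), ignoring everything else such as a leading day or a qualifier like "Abt". The rules chosen here are assumptions, not from the question: year-only entries become just "YYYY", and values with no recognizable year become `None` so they can be reviewed by hand. The `to_month_year` helper is hypothetical.

```python
import re
import pandas as pd

# Hypothetical lookup table: English month abbreviations -> month numbers
MONTHS = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
          'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']
MONTH_NUM = {abbr: i for i, abbr in enumerate(MONTHS, start=1)}
MONTH_RE = re.compile(r'\b(' + '|'.join(MONTHS) + r')\b')
YEAR_RE = re.compile(r'\b(\d{4})\b')


def to_month_year(raw):
    """Return 'M-YYYY' if a month abbreviation is present, 'YYYY' if only a
    year is present, and None if no four-digit year can be found."""
    text = str(raw).upper()          # 'Jan Abt 1880' -> 'JAN ABT 1880'
    year = YEAR_RE.search(text)
    if year is None:
        return None                  # assumption: flag for manual review
    month = MONTH_RE.search(text)
    if month is None:
        return year.group(1)         # assumption: keep year-only dates as-is
    return f"{MONTH_NUM[month.group(1)]}-{year.group(1)}"


df_test = pd.DataFrame({
    'Name': ['Anna B Wilson', 'Victor C Burnett', 'Ausia Burnett',
             'Alfred Burnett', 'Viola Burnett', 'Marinda Lynde',
             'Iola Staffen', 'Maryella Dolores Staffin',
             'Norman Lawrence Schmitt'],
    'Birth Date': ['JUL 1861', 'NOV 1847', 'JUN 1898', 'MAR 1896', 'AUG 1894',
                   '1843', 'Jan Abt 1880', '30 AUG 1913', '22 JUN 1945'],
})

df_test['Birth Date'] = df_test['Birth Date'].map(to_month_year)
print(df_test)
```

A per-value function is slower than vectorized string methods, but with rules this irregular it keeps each decision easy to read and adjust.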

1 answer:

Answer 0 (score: 0)

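You don't actually need regular expressions. You can parse the column with pd.to_datetime() and then use .dt.strftime() to specify the output format you want, for example: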

import pandas as pd

test_df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E'],
                        'Birthdate': ['JUL 1861', 'NOV 1847', 'JUN 1898', 'MAR 1896', 'AUG 1894']})
# '%b %Y' = abbreviated month name + four-digit year; then format as MM-YYYY
test_df['Birthdate'] = pd.to_datetime(test_df['Birthdate'], format='%b %Y').dt.strftime('%m-%Y')

Output:

  Name Birthdate
0    A   07-1861
1    B   11-1847
2    C   06-1898
3    D   03-1896
4    E   08-1894
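Note that '%m' zero-pads the month ("07-1861"), while the question asked for "7-1861". A small sketch of one way to get the unpadded form, assuming the same sample data: build the string from the integer `.dt.month` and `.dt.year` parts instead of relying on a platform-specific strftime directive to drop the leading zero.

```python
import pandas as pd

test_df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E'],
                        'Birthdate': ['JUL 1861', 'NOV 1847', 'JUN 1898', 'MAR 1896', 'AUG 1894']})

dates = pd.to_datetime(test_df['Birthdate'], format='%b %Y')
# Integer month -> string gives '7' rather than '07'
test_df['Birthdate'] = dates.dt.month.astype(str) + '-' + dates.dt.year.astype(str)
print(test_df)
```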