I have a DataFrame:
print(df_test)
               Name  Birth Date
0     Anna B Wilson    JUL 1861
1  Victor C Burnett    NOV 1847
2     Ausia Burnett    JUN 1898
3    Alfred Burnett    MAR 1896
4     Viola Burnett    AUG 1894
I would like the output to be:
               Name  Birth Date
0     Anna B Wilson      7-1861
1  Victor C Burnett     11-1847
2     Ausia Burnett      6-1898
3    Alfred Burnett      3-1896
4     Viola Burnett      8-1894
Is there a concise way to do this without writing a separate regular expression for each month, i.e.
df_test = df_test.replace(to_replace=r'(MAR)\s(\d{4})', value=r'3-\2', regex=True)
df_test = df_test.replace(to_replace=r'(JUN)\s(\d{4})', value=r'6-\2', regex=True)
df_test = df_test.replace(to_replace=r'(JUL)\s(\d{4})', value=r'7-\2', regex=True)
df_test = df_test.replace(to_replace=r'(AUG)\s(\d{4})', value=r'8-\2', regex=True)
df_test = df_test.replace(to_replace=r'(NOV)\s(\d{4})', value=r'11-\2', regex=True)
print(df_test)
?
Edit: So, there is a fly in the ointment. The date data is not all in the same format. For example, rows 5-8 below are anomalies:
                       Name    Birth Date
0             Anna B Wilson      JUL 1861
1          Victor C Burnett      NOV 1847
2             Ausia Burnett      JUN 1898
3            Alfred Burnett      MAR 1896
4             Viola Burnett      AUG 1894
5             Marinda Lynde          1843
6              Iola Staffen  Jan Abt 1880
7  Maryella Dolores Staffin   30 AUG 1913
8   Norman Lawrence Schmitt   22 JUN 1945
Answer 0 (score: 0)
You don't actually need a regex here. You can use pd.to_datetime() followed by strftime() to specify the desired format, for example:
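import pandas as pd

test_df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E'],
                        'Birthdate': ['JUL 1861', 'NOV 1847', 'JUN 1898', 'MAR 1896', 'AUG 1894']})
# Parse "MON YYYY" with an explicit format (infer_datetime_format is deprecated since pandas 2.0), then format as month-year.
test_df['Birthdate'] = pd.to_datetime(test_df['Birthdate'], format='%b %Y').dt.strftime('%m-%Y')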
Output (note that %m zero-pads the month, so this gives 07-1861 rather than the 7-1861 asked for):
   Name  Birthdate
0     A    07-1861
1     B    11-1847
2     C    06-1898
3     D    03-1896
4     E    08-1894
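The to_datetime() approach does not cover two things raised in the question: the unpadded month (7-1861 rather than 07-1861) and the irregular rows 5-8 from the edit. As a rough sketch only, a single regular expression plus a month lookup table might handle both. The helper name normalize_birth_date is hypothetical, and the sketch assumes that a leading day and an "Abt" qualifier should simply be dropped, that the date in row 6 reads "Jan Abt 1880", and that anything it cannot parse (such as a bare year) should be left unchanged.

```python
import re

# Month abbreviation -> month number (JAN -> 1, ..., DEC -> 12).
MONTHS = {
    abbr: number
    for number, abbr in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}

# Optional day, 3-letter month, optional "Abt" qualifier, 4-digit year.
PATTERN = re.compile(
    r"^\s*(?:(?P<day>\d{1,2})\s+)?"
    r"(?P<month>[A-Za-z]{3})\s+"
    r"(?:abt\s+)?"
    r"(?P<year>\d{4})\s*$",
    re.IGNORECASE,
)


def normalize_birth_date(text):
    """Return 'M-YYYY' when a month and year are found; otherwise return the input unchanged.

    Hypothetical helper: it discards the day and the "Abt" qualifier (an assumption).
    """
    if not isinstance(text, str):
        return text  # e.g. NaN in a pandas column
    match = PATTERN.match(text)
    if match is None:
        return text  # no month present, e.g. a bare year
    month = MONTHS.get(match.group("month").upper())
    if month is None:
        return text  # three letters that are not a month abbreviation
    return f"{month}-{match.group('year')}"
```

With pandas, this could be applied to the column from the question as df_test['Birth Date'] = df_test['Birth Date'].map(normalize_birth_date). Whether dropping the day and the "Abt" marker is acceptable depends on how the data will be used afterwards.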