In Python 3 with pandas, I have a DataFrame with a column of strings that represent dates, the "DataFim" column:
df_lotacoes.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52725 entries, 0 to 52724
Data columns (total 5 columns):
DataFim 48854 non-null object
DataInicio 52725 non-null object
IdUA 52725 non-null object
NomeFuncionario 52725 non-null object
NomeUA 52725 non-null object
dtypes: object(5)
memory usage: 1.0+ MB
print(df_lotacoes['DataFim'])
DataFim
0 2018-11-05T00:00:00-02:00
1 2008-08-28T00:00:00-03:00
2 2002-08-08T00:00:00-03:00
3 2007-03-14T00:00:00-03:00
4 2005-05-06T00:00:00-03:00
I tried to convert it to a date, but the column still has dtype object:
df_lotacoes['DataFim'] = pd.to_datetime(df_lotacoes['DataFim'])
DataFim
0 2018-11-05 00:00:00-02:00
1 2008-08-28 00:00:00-03:00
2 2002-08-08 00:00:00-03:00
3 2007-03-14 00:00:00-03:00
4 2005-05-06 00:00:00-03:00
DataFim 48854 non-null object
I only need the year, month, and day. I want to ignore the time and UTC offset.
Does anyone know how I can convert this format?
Answer 0 (score: 2)
Use str.extract to pull out the date part (everything before the "T") and convert it to datetime:
df['DataFim'] = pd.to_datetime(df['DataFim'].str.extract('(.*)T')[0], format = '%Y-%m-%d')
DataFim
0 2018-11-05
1 2008-08-28
2 2002-08-08
3 2007-03-14
4 2005-05-06
Option 2: you can also use str.split and keep the first piece:
df['DataFim'] = pd.to_datetime(df['DataFim'].str.split('T').str[0], format = '%Y-%m-%d')
Or, for fun, use a regular expression with str.replace to strip everything from the "T" onward:
df['DataFim'] = pd.to_datetime(df['DataFim'].str.replace('T.*', '', regex = True), format = '%Y-%m-%d')
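A likely reason the original pd.to_datetime call left the column as object is that the values carry mixed UTC offsets (-02:00 and -03:00), which pandas cannot store in a single timezone-aware datetime column without further arguments. All three options above avoid this by dropping the offset before parsing. As a minimal sketch of the same idea, assuming every non-null value starts with a 10-character YYYY-MM-DD date, a plain string slice also works. The small DataFrame below is made-up sample data based on the rows shown in the question, with one missing value added to mirror the nulls reported by info():

```python
import pandas as pd

# Hypothetical sample mirroring the question's data, including a missing value.
df = pd.DataFrame({
    "DataFim": [
        "2018-11-05T00:00:00-02:00",
        "2008-08-28T00:00:00-03:00",
        None,
    ]
})

# Keep only the first 10 characters (YYYY-MM-DD), then parse them as dates.
# Missing values pass through the slice and become NaT after parsing.
df["DataFim"] = pd.to_datetime(df["DataFim"].str[:10], format="%Y-%m-%d")

# Confirm the column is now a datetime dtype rather than object.
print(df["DataFim"].dtype)
print(df)
```

Note that this keeps the local calendar date exactly as written in each string; it does not convert the timestamps to a common timezone first.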