pandas to_datetime function does not change the dtype

Date: 2019-08-29 10:23:42

Tags: python pandas

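I have been working with Python recently and ran into a problem I cannot seem to solve. I am using a pandas DataFrame, and when I use the to_datetime function to change a column's dtype from 'object' to 'datetime64', the column does not end up with the desired 'datetime64' dtype.

So far I have only tried the to_datetime function, and it does not seem to solve the problem. I am looking for a way to make to_datetime, or any other code, change the column's dtype from 'object' to 'datetime64'.

Here is some information about the dataset: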

df.head()
Formatted Date                      Summary  Precip Type Temperature (C)   Apparent Temperature (C)   Humidity   Wind Speed (km/h)   Wind Bearing (degrees)  Visibility (km)  Loud Cover Pressure (millibars)   Daily Summary
0   2006-04-01 00:00:00.000 +0200   Partly Cloudy   rain    9.472222    7.388889    0.89    14.1197     251.0   15.8263     0.0     1015.13     Partly cloudy throughout the day.
1   2006-04-01 01:00:00.000 +0200   Partly Cloudy   rain    9.355556    7.227778    0.86    14.2646     259.0   15.8263     0.0     1015.63     Partly cloudy throughout the day.
2   2006-04-01 02:00:00.000 +0200   Mostly Cloudy   rain    9.377778    9.377778    0.89    3.9284  204.0   14.9569     0.0     1015.94     Partly cloudy throughout the day.
3   2006-04-01 03:00:00.000 +0200   Partly Cloudy   rain    8.288889    5.944444    0.83    14.1036     269.0   15.8263     0.0     1016.41     Partly cloudy throughout the day.
4   2006-04-01 04:00:00.000 +0200   Mostly Cloudy   rain    8.755556    6.977778    0.83    11.0446     259.0   15.8263     0.0     1016.51     Partly cloudy throughout the day.

Here are the dtypes before using the to_datetime function:

df.dtypes
Formatted Date               object
Summary                      object
Precip Type                  object
Temperature (C)             float64
Apparent Temperature (C)    float64
Humidity                    float64
Wind Speed (km/h)           float64
Wind Bearing (degrees)      float64
Visibility (km)             float64
Loud Cover                  float64
Pressure (millibars)        float64
Daily Summary                object
dtype: object

And here they are after using the to_datetime function:

df['Date'] = pd.to_datetime(df['Formatted Date'])
df.dtypes

Formatted Date               object
Summary                      object
Precip Type                  object
Temperature (C)             float64
Apparent Temperature (C)    float64
Humidity                    float64
Wind Speed (km/h)           float64
Wind Bearing (degrees)      float64
Visibility (km)             float64
Loud Cover                  float64
Pressure (millibars)        float64
Daily Summary                object
Date                         object
dtype: object

Can you tell me what I am doing wrong? Thanks in advance!

3 answers:

Answer 0 (score: 2)

Problem

You want to change the dtype of the values from object to datetime64:

df = pd.DataFrame(data={'col':["2006-04-01 00:00:00.000 +0200"]})
df.dtypes

Output:

col    object
dtype: object

Solution

To change the type, you need to apply pd.to_datetime:

df['col'] = df['col'].apply(pd.to_datetime)
df.dtypes

Output:

col    datetime64[ns, pytz.FixedOffset(120)]
dtype: object

If this does not work, your Formatted Date column may contain inconsistent date formats or NaN values.
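One way to check for that is sketched below. It assumes a small made-up Series (the values and the names `raw`, `parsed`, and `bad` are illustrative, not taken from the dataset) and uses errors='coerce' so that unparseable entries become NaT instead of raising, which makes them easy to list.

```python
import pandas as pd

# Hypothetical sample: one valid timestamp, one malformed string, one missing value.
raw = pd.Series(["2006-04-01 00:00:00.000 +0200", "not a date", None])

# errors='coerce' turns anything that cannot be parsed into NaT;
# utc=True keeps the result timezone-aware even if offsets differ.
parsed = pd.to_datetime(raw, errors="coerce", utc=True)

# Entries that were present in the input but failed to parse.
bad = raw[parsed.isna() & raw.notna()]
print(bad)
```

On the real data, the same pattern could be applied to df['Formatted Date'] to see which rows, if any, fail to parse.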

Real data

Using the dataset (https://www.kaggle.com/budincsevity/szeged-weather/):

import pandas as pd

# load dataset
df = pd.read_csv('weatherHistory.csv')
df.dtypes
Formatted Date               object
Summary                      object
Precip Type                  object
Temperature (C)             float64
Apparent Temperature (C)    float64
Humidity                    float64
Wind Speed (km/h)           float64
Wind Bearing (degrees)      float64
Visibility (km)             float64
Loud Cover                  float64
Pressure (millibars)        float64
Daily Summary                object
dtype: object
df['Date'] = df['Formatted Date'].apply(pd.to_datetime)
df.dtypes
Formatted Date                      object
Summary                             object
Precip Type                         object
Temperature (C)                    float64
Apparent Temperature (C)           float64
Humidity                           float64
Wind Speed (km/h)                  float64
Wind Bearing (degrees)             float64
Visibility (km)                    float64
Loud Cover                         float64
Pressure (millibars)               float64
Daily Summary                       object
Date                        datetime64[ns]
dtype: object

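Answer 1 (score: 1)

I have had trouble addressing pandas elements by column label. I made a simplified version of your DataFrame and was able to change the dtype by selecting the data by index position instead.

Try changing your: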

 pd.to_datetime(df['Formatted Date'])

To:

  pd.to_datetime(df.iloc[0])

This worked for me:

  data=['2006-04-01 00:00:00.000 +0200']

  df = pd.DataFrame(data)

  df2 = pd.to_datetime(df.iloc[0])

  print(df2.dtypes)

The output is:

  datetime64[ns, pytz.FixedOffset(120)]

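I downloaded the same data you are using, and I think this is one possible solution for your dataset; just extend your original code to handle the date format: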

  df['Date'] = pd.to_datetime(df['Formatted Date'], format = '%Y-%m-%d %H:%M:%S.%f %p', errors= 'coerce')

As you can see, the 'Date' column now has the correct data type:

Formatted Date                      object
Summary                             object
Precip Type                         object
Temperature (C)                    float64
Apparent Temperature (C)           float64
Humidity                           float64
Wind Speed (km/h)                  float64
Wind Bearing (degrees)             float64
Visibility (km)                    float64
Loud Cover                         float64
Pressure (millibars)               float64
Daily Summary                       object
Date                        datetime64[ns]
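
One caveat about the format string above: in strptime-style formats %p is the AM/PM marker, whereas a trailing offset such as +0200 is matched by %z. Combined with errors='coerce', a format that does not match turns values into NaT without raising, and a column of NaT still reports a datetime64 dtype, so it is worth checking the values and not only the dtype. The sketch below uses a one-row made-up Series (the names `sample`, `with_p`, and `with_z` are illustrative) to compare the two directives; utc=True is an added assumption so that rows with different offsets end up in a single timezone-aware column.

```python
import pandas as pd

sample = pd.Series(["2006-04-01 00:00:00.000 +0200"])

# %p expects AM/PM, so "+0200" does not match and coerce yields NaT.
with_p = pd.to_datetime(
    sample, format="%Y-%m-%d %H:%M:%S.%f %p", errors="coerce"
)

# %z parses the UTC offset; utc=True normalizes the result to UTC.
with_z = pd.to_datetime(
    sample, format="%Y-%m-%d %H:%M:%S.%f %z", errors="coerce", utc=True
)

print(with_p.isna().all())  # whether every value failed to parse
print(with_z.dtype)
```

A quick df['Date'].isna().sum() after the conversion shows how many rows were coerced to NaT.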

Answer 2 (score: 1)

For pandas>=0.24, you need to add the parameter utc=True:

import pandas as pd

# load dataset
df = pd.read_csv('weatherHistory.csv')

df['Date'] = df['Formatted Date'].apply(pd.to_datetime, utc=True)
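
A likely reason the plain call returned object is that the column mixes UTC offsets (for example +0200 in summer and +0100 in winter); that is an assumption here, since only +0200 rows are shown in the question. A minimal sketch on made-up rows shows how one might check the offsets and then convert the whole column in one vectorized call with utc=True, without apply. The sample values and the names `sample`, `offsets`, and `converted` are illustrative.

```python
import pandas as pd

# Hypothetical rows with two different UTC offsets (summer and winter time).
sample = pd.DataFrame({
    "Formatted Date": [
        "2006-04-01 00:00:00.000 +0200",
        "2006-01-01 00:00:00.000 +0100",
    ]
})

# The last five characters hold the offset; more than one distinct value
# means the column cannot share a single fixed-offset dtype.
offsets = sample["Formatted Date"].str[-5:].unique()
print(offsets)

# utc=True converts every row to UTC, giving one timezone-aware dtype.
converted = pd.to_datetime(sample["Formatted Date"], utc=True)
print(converted.dtype)
```

If local wall-clock times are needed afterwards, the UTC column can be converted with .dt.tz_convert and a named time zone.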