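I have been working with Python recently and ran into a problem I can't seem to solve. I am using a pandas DataFrame, and when I use the to_datetime function to change a column's dtype from 'object' to 'datetime64', the dtype does not change to 'datetime64' as expected.
So far I have only tried the to_datetime function, which does not seem to solve the problem. I am looking for a way to make to_datetime, or any other code, change the column's dtype from 'object' to 'datetime64'.
Here is some information about the dataset: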
df.head()
   Formatted Date                 Summary        Precip Type  Temperature (C)  Apparent Temperature (C)  Humidity  Wind Speed (km/h)  Wind Bearing (degrees)  Visibility (km)  Loud Cover  Pressure (millibars)  Daily Summary
0  2006-04-01 00:00:00.000 +0200  Partly Cloudy  rain         9.472222         7.388889                  0.89      14.1197            251.0                   15.8263          0.0         1015.13               Partly cloudy throughout the day.
1  2006-04-01 01:00:00.000 +0200  Partly Cloudy  rain         9.355556         7.227778                  0.86      14.2646            259.0                   15.8263          0.0         1015.63               Partly cloudy throughout the day.
2  2006-04-01 02:00:00.000 +0200  Mostly Cloudy  rain         9.377778         9.377778                  0.89      3.9284             204.0                   14.9569          0.0         1015.94               Partly cloudy throughout the day.
3  2006-04-01 03:00:00.000 +0200  Partly Cloudy  rain         8.288889         5.944444                  0.83      14.1036            269.0                   15.8263          0.0         1016.41               Partly cloudy throughout the day.
4  2006-04-01 04:00:00.000 +0200  Mostly Cloudy  rain         8.755556         6.977778                  0.83      11.0446            259.0                   15.8263          0.0         1016.51               Partly cloudy throughout the day.
Here are the dtypes before using the to_datetime function:
df.dtypes
Formatted Date object
Summary object
Precip Type object
Temperature (C) float64
Apparent Temperature (C) float64
Humidity float64
Wind Speed (km/h) float64
Wind Bearing (degrees) float64
Visibility (km) float64
Loud Cover float64
Pressure (millibars) float64
Daily Summary object
dtype: object
And here they are after using the to_datetime function:
df['Date'] = pd.to_datetime(df['Formatted Date'])
df.dtypes
Formatted Date object
Summary object
Precip Type object
Temperature (C) float64
Apparent Temperature (C) float64
Humidity float64
Wind Speed (km/h) float64
Wind Bearing (degrees) float64
Visibility (km) float64
Loud Cover float64
Pressure (millibars) float64
Daily Summary object
Date object
dtype: object
Can you tell me what I am doing wrong? Thanks in advance!
Answer 0 (score: 2)
You want to change the dtype from object to datetime64.
df = pd.DataFrame(data={'col':["2006-04-01 00:00:00.000 +0200"]})
df.dtypes
Output:
col object
dtype: object
To change the type, you need to apply pd.to_datetime.
df['col'] = df['col'].apply(pd.to_datetime)
df.dtypes
Output:
col datetime64[ns, pytz.FixedOffset(120)]
dtype: object
If this does not work, your Formatted Date column may contain inconsistent date formats or NaN values.
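One way to check for those two causes is sketched below. The sample Series is made up for illustration and stands in for df['Formatted Date']; it assumes every non-missing string ends in a five-character UTC offset such as +0200, as in the rows shown in the question.

```python
import pandas as pd

# Hypothetical sample standing in for df['Formatted Date'];
# the values are invented for illustration only.
sample = pd.Series([
    "2006-04-01 00:00:00.000 +0200",
    "2006-04-01 01:00:00.000 +0200",
    "2006-11-01 00:00:00.000 +0100",
    None,
])

# Count missing values in the column.
n_missing = int(sample.isna().sum())

# Assumption: the UTC offset is the last five characters of each string.
# More than one distinct offset means the column mixes time zones.
offset_counts = sample.dropna().str[-5:].value_counts()

print(n_missing)
print(offset_counts)
```

If more than one offset shows up, the strings do not share a single time zone, which is one plausible reason for the converted column to remain object.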
Using the dataset (https://www.kaggle.com/budincsevity/szeged-weather/):
import pandas as pd
# load dataset
df = pd.read_csv('weatherHistory.csv')
df.dtypes
Formatted Date object
Summary object
Precip Type object
Temperature (C) float64
Apparent Temperature (C) float64
Humidity float64
Wind Speed (km/h) float64
Wind Bearing (degrees) float64
Visibility (km) float64
Loud Cover float64
Pressure (millibars) float64
Daily Summary object
dtype: object
df['Date'] = df['Formatted Date'].apply(pd.to_datetime)
df.dtypes
Formatted Date object
Summary object
Precip Type object
Temperature (C) float64
Apparent Temperature (C) float64
Humidity float64
Wind Speed (km/h) float64
Wind Bearing (degrees) float64
Visibility (km) float64
Loud Cover float64
Pressure (millibars) float64
Daily Summary object
Date datetime64[ns]
dtype: object
Answer 1 (score: 1)
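I ran into trouble when handling pandas elements by column label. I built a simplified version of the data frame in which the column dtype can be changed by selecting the column by its position.
Try changing your:
pd.to_datetime(df['Formatted Date'])
to:
pd.to_datetime(df.iloc[:, 0])
This works for me:
import pandas as pd
data = ['2006-04-01 00:00:00.000 +0200']
df = pd.DataFrame(data)
# iloc[:, 0] selects the first column by position (iloc[0] would select the first row)
df2 = pd.to_datetime(df.iloc[:, 0])
print(df2.dtypes)
The output is: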
datetime64[ns, pytz.FixedOffset(120)]
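Position-based selection is easy to get wrong, so here is a small sketch of the difference between selecting a row and selecting a column with iloc. The two-column frame is a made-up stand-in for the real dataset. On a frame with a single cell, both selections hold the same value, which is why a one-row example cannot tell them apart.

```python
import pandas as pd

# Hypothetical two-column frame; values are illustrative only.
df = pd.DataFrame({
    "Formatted Date": ["2006-04-01 00:00:00.000 +0200",
                       "2006-04-01 01:00:00.000 +0200"],
    "Humidity": [0.89, 0.86],
})

first_row = df.iloc[0]      # Series holding every column of row 0
first_col = df.iloc[:, 0]   # Series holding the whole first column

# Converting the column (not the row) gives a datetime Series.
converted = pd.to_datetime(first_col)

print(first_row.index.tolist())
print(converted.dtype)
```

Both strings here carry the same offset, so the result is a single time-zone-aware datetime dtype; the exact dtype text printed depends on the pandas version.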
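I downloaded the same data you are using, and I think this is a possible solution for your dataset: extend your original code to specify the date format. The trailing +0200 is a UTC offset, so it is matched with %z, and utc=True converts all rows to a single time zone:
df['Date'] = pd.to_datetime(df['Formatted Date'], format='%Y-%m-%d %H:%M:%S.%f %z', errors='coerce', utc=True)
As you can see, the 'Date' column now has a datetime data type:
Formatted Date object
Summary object
Precip Type object
Temperature (C) float64
Apparent Temperature (C) float64
Humidity float64
Wind Speed (km/h) float64
Wind Bearing (degrees) float64
Visibility (km) float64
Loud Cover float64
Pressure (millibars) float64
Daily Summary object
Date datetime64[ns, UTC]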
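To illustrate how an explicit format string interacts with errors='coerce', here is a minimal sketch on made-up values. It assumes the strings end in a UTC offset such as +0200, which the %z directive matches (%p, by contrast, matches AM/PM markers). Because errors='coerce' turns anything that fails to match into NaT without raising, it is worth counting the NaT values after conversion.

```python
import pandas as pd

# Hypothetical sample; the last entry is deliberately invalid.
sample = pd.Series([
    "2006-04-01 00:00:00.000 +0200",
    "2006-11-01 00:00:00.000 +0100",
    "not a date",
])

parsed = pd.to_datetime(
    sample,
    format="%Y-%m-%d %H:%M:%S.%f %z",
    errors="coerce",  # strings that do not match the format become NaT
    utc=True,         # convert mixed offsets to one UTC-based dtype
)

n_failed = int(parsed.isna().sum())

print(parsed.dtype)
print(n_failed)
```

A failure count equal to the number of rows would suggest the format string does not match the data at all, even though the resulting dtype still looks like a datetime type.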
Answer 2 (score: 1)
For pandas>=0.24, you need to add the parameter utc=True.
import pandas as pd
# load dataset
df = pd.read_csv('weatherHistory.csv')
df['Date'] = df['Formatted Date'].apply(pd.to_datetime, utc=True)
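As a rough sketch of what utc=True does, the example below converts two made-up strings with different offsets by calling pd.to_datetime on the whole Series rather than through apply. It then shifts the result back to local wall-clock time. The Europe/Budapest zone is an assumption based on Szeged being in Hungary; it is not stated in the thread.

```python
import pandas as pd

# Hypothetical sample with two different UTC offsets.
sample = pd.Series([
    "2006-04-01 00:00:00.000 +0200",
    "2006-11-01 00:00:00.000 +0100",
])

# utc=True converts every value to UTC, so the Series gets one
# time-zone-aware datetime dtype instead of staying object.
as_utc = pd.to_datetime(sample, utc=True)

# Assumption: the local zone is Europe/Budapest.
local = as_utc.dt.tz_convert("Europe/Budapest")

print(as_utc.dt.hour.tolist())
print(local.dt.hour.tolist())
```

Passing the Series directly lets pandas parse the column in one vectorized call, which is typically faster than applying the function element by element.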