Question

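I have a DataFrame whose first three columns are 'MONTH', 'DAY' and 'YEAR'.

Each column holds an integer. Is there a Pythonic way to convert all three columns into a single datetime column while they are in the DataFrame?

From: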

M    D    Y    Apples   Oranges
5    6  1990      12        3
5    7  1990      14        4
5    8  1990      15       34
5    9  1990      23       21

To:

Datetimes    Apples   Oranges
1990-6-5        12        3
1990-7-5        14        4
1990-8-5        15       34
1990-9-5        23       21

Answer 1

Since pandas 0.18.1 you can use to_datetime, but:

The columns must be named year, month, day, hour, minute and second.
The minimum required columns are year, month and day.

Sample:

import pandas as pd

df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5],
                   'hour': [2, 3],
                   'minute': [10, 30],
                   'second': [21, 25]})

print(df)
   year  month  day  hour  minute  second
0  2015      2    4     2      10      21
1  2016      3    5     3      30      25

print(pd.to_datetime(df[['year', 'month', 'day']]))
0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

print(pd.to_datetime(df[['year', 'month', 'day', 'hour']]))
0   2015-02-04 02:00:00
1   2016-03-05 03:00:00
dtype: datetime64[ns]

print(pd.to_datetime(df[['year', 'month', 'day', 'hour', 'minute']]))
0   2015-02-04 02:10:00
1   2016-03-05 03:30:00
dtype: datetime64[ns]

print(pd.to_datetime(df))
0   2015-02-04 02:10:21
1   2016-03-05 03:30:25
dtype: datetime64[ns]

Another solution is to pass a dictionary that maps the required names to your own columns:

print(df)
   M  D     Y  Apples  Oranges
0  5  6  1990      12        3
1  5  7  1990      14        4
2  5  8  1990      15       34
3  5  9  1990      23       21

print(pd.to_datetime(dict(year=df.Y, month=df.M, day=df.D)))
0   1990-05-06
1   1990-05-07
2   1990-05-08
3   1990-05-09
dtype: datetime64[ns]
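
To end up with the layout asked for in the question, the assembled Series can be inserted as a new column and the three source columns dropped. This is only a minimal sketch, assuming pandas 0.18.1 or later; the column name Datetimes is taken from the question, and the variable names dates and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'M': [5, 5, 5, 5],
                   'D': [6, 7, 8, 9],
                   'Y': [1990, 1990, 1990, 1990],
                   'Apples': [12, 14, 15, 23],
                   'Oranges': [3, 4, 34, 21]})

# Assemble one datetime per row from the three integer columns.
dates = pd.to_datetime(dict(year=df.Y, month=df.M, day=df.D))

# Drop the source columns, then put the new column in front.
result = df.drop(['M', 'D', 'Y'], axis=1)
result.insert(0, 'Datetimes', dates)
print(result)
```

Using insert(0, ...) rather than plain assignment keeps the date column first, as in the desired output.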

Answer 2

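In pandas 0.13 (about to be released at the time of writing) this is heavily optimized and very fast. It is still fairly fast in 0.12, and in both versions it is orders of magnitude faster than looping.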

In [3]: df
Out[3]: 
   M  D     Y  Apples  Oranges
0  5  6  1990      12        3
1  5  7  1990      14        4
2  5  8  1990      15       34
3  5  9  1990      23       21

In [4]: df.dtypes
Out[4]: 
M          int64
D          int64
Y          int64
Apples     int64
Oranges    int64
dtype: object

# in 0.12, use this
In [5]: pd.to_datetime((df.Y*10000+df.M*100+df.D).apply(str),format='%Y%m%d')

# in 0.13 the above or this will work
In [5]: pd.to_datetime(df.Y*10000+df.M*100+df.D,format='%Y%m%d')
Out[5]: 
0   1990-05-06 00:00:00
1   1990-05-07 00:00:00
2   1990-05-08 00:00:00
3   1990-05-09 00:00:00
dtype: datetime64[ns]
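
The arithmetic works because Y*10000 + M*100 + D places the year, month and day in fixed digit positions, so the integer reads as YYYYMMDD. A small sketch of that encoding, assuming a reasonably recent pandas; the two-row frame and the names encoded and dates are made up for illustration, and the string conversion mirrors the 0.12 variant above.

```python
import pandas as pd

df = pd.DataFrame({'M': [5, 5], 'D': [6, 7], 'Y': [1990, 1990]})

# 1990*10000 + 5*100 + 6 == 19900506: the digits spell YYYYMMDD.
encoded = df.Y * 10000 + df.M * 100 + df.D
print(encoded.tolist())  # [19900506, 19900507]

# Converting to strings first keeps the match against '%Y%m%d' explicit.
dates = pd.to_datetime(encoded.astype(str), format='%Y%m%d')
print(dates)
```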

Answer 3

I revisited the problem and I think I found a solution. I loaded the csv file like this:

from pandas import read_csv
pandas_object = read_csv('/Path/to/csv/file', parse_dates=True, index_col=[2, 0, 1])

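where:

index_col = [2,0,1]

gives the positions of the [year, month, day] columns.

The only remaining problem is that I now have three new index levels: one for the year, one for the month and one for the day.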
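
One possible way to collapse those three index levels into a single datetime index is to move them back into columns and assemble them with to_datetime. This is a sketch under assumptions: the CSV text below is a made-up stand-in for the real file, and the names flat, dates and Datetimes are illustrative.

```python
import io
import pandas as pd

# Hypothetical CSV with the same column order as in the question.
csv_text = "M,D,Y,Apples,Oranges\n5,6,1990,12,3\n5,7,1990,14,4\n"

df = pd.read_csv(io.StringIO(csv_text), index_col=[2, 0, 1])

# Move the (Y, M, D) index levels back into ordinary columns.
flat = df.reset_index()

# Assemble one datetime per row and use it as the new index.
dates = pd.to_datetime(dict(year=flat.Y, month=flat.M, day=flat.D))
flat.index = pd.DatetimeIndex(dates, name='Datetimes')
flat = flat.drop(['Y', 'M', 'D'], axis=1)
print(flat)
```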

Answer 5

Convert the DataFrame to strings so that the columns can be concatenated:

df=df.astype(str)

Then convert to datetime, specifying the format:

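# zero-pad month and day so that e.g. '5' becomes '05' and matches %m and %d
df.index=pd.to_datetime(df.Y+df.M.str.zfill(2)+df.D.str.zfill(2),format="%Y%m%d")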

This replaces the index instead of creating a new column.
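
One thing to watch with string concatenation is that it is only unambiguous when month and day are both two digits wide. The sketch below uses made-up values to show two different dates producing the same string, and zero-padding with str.zfill as one way around it; the names naive, padded and dates are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Y': [1990, 1990], 'M': [1, 11], 'D': [11, 1]}).astype(str)

# Without padding, 11 January and 1 November give the same string.
naive = df.Y + df.M + df.D
print(naive.tolist())  # ['1990111', '1990111']

# Zero-pad month and day to two digits so '%Y%m%d' is unambiguous.
padded = df.Y + df.M.str.zfill(2) + df.D.str.zfill(2)
dates = pd.to_datetime(padded, format='%Y%m%d')
print(dates.tolist())
```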

Answer 6

Suppose you have a dictionary foo that holds the date parts as parallel lists, one per column. If so, here is your one-liner:

>>> from datetime import datetime
>>> import pandas as pd
>>> foo = {"M": [1,2,3], "D":[30,28,21], "Y":[1980,1981,1982]}
>>>
>>> df = pd.DataFrame({"Datetime": [datetime(y,m,d) for y,m,d in zip(foo["Y"],foo["M"],foo["D"])]})

The real guts of it is this bit:

>>> [datetime(y,m,d) for y,m,d in zip(foo["Y"],foo["M"],foo["D"])]
[datetime.datetime(1980, 1, 30, 0, 0), datetime.datetime(1981, 2, 28, 0, 0), datetime.datetime(1982, 3, 21, 0, 0)]

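That is what zip does: it takes the parallel lists and turns them into tuples. The list comprehension then unpacks each tuple (the for y,m,d in bit) and passes the values to the datetime constructor.

pandas seems to be happy with datetime objects.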

Answer 7

A better way is as follows:

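import datetime
import pandas as pd

dataset = pd.read_csv('dataset.csv')

# Build one datetime.date per row from the integer columns.
date = dataset.apply(lambda x: datetime.date(int(x['Yr']), int(x['Mo']), int(x['Dy'])), axis=1)
date = pd.to_datetime(date)

# Replace the three source columns with a single 'Date' column in front.
dataset = dataset.drop(columns=['Yr', 'Mo', 'Dy'])
dataset.insert(0, 'Date', date)
dataset.head()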

How to convert columns into one datetime column in pandas?
