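Pandas and CSV import into a DataFrame: how best to combine the date and hour fields into one

Time: 2014-01-14 15:14:37

Tags: python pandas dataframe

I have a CSV file that I am trying to import into pandas.

There are two columns of interest, date and hour, which are the first two columns.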

E.g.

date,hour,...
10-1-2013,0,
10-1-2013,0,
10-1-2013,0,
10-1-2013,1,
10-1-2013,1,

How can I import this with pandas so that the hour and date are combined, or is that best done after the initial import?

df = DataFrame.from_csv('bingads.csv',sep =',')

If I do the initial import first, how do I combine the two columns into a single date and then drop the hour column?

Thanks

4 answers:

Answer 0: (score: 3)

Define your own date_parser:

In [291]: from dateutil.parser import parse
In [292]: import datetime as dt
In [293]: def date_parser(x):
   .....:     date, hour = x.split(' ')
   .....:     return parse(date) + dt.timedelta(0, 3600*int(hour))

In [298]: pd.read_csv('test.csv', parse_dates=[[0,1]], date_parser=date_parser)
Out[298]: 
            date_hour  a  b  c
0 2013-10-01 00:00:00  1  1  1
1 2013-10-01 00:00:00  2  2  2
2 2013-10-01 00:00:00  3  3  3
3 2013-10-01 01:00:00  4  4  4
4 2013-10-01 01:00:00  5  5  5
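
To see the parsing logic in isolation, here is a minimal standard-library sketch. It assumes the parser receives one string per row with the two column values joined by a space (for example '10-1-2013 0'), which is what the split(' ') above implies. The helper name combine_date_hour and the sample strings are only illustrative, and it uses datetime.strptime with a fixed month-day-year format rather than dateutil.

```python
from datetime import datetime, timedelta

def combine_date_hour(value):
    """Turn a 'M-D-YYYY H' string into a datetime (hypothetical helper)."""
    date_text, hour_text = value.split(' ')
    # Assumes month-day-year order, as in the question's sample data.
    day = datetime.strptime(date_text, '%m-%d-%Y')
    # Add the hour column as an offset from midnight.
    return day + timedelta(hours=int(hour_text))

samples = ['10-1-2013 0', '10-1-2013 1']
parsed = [combine_date_hour(s) for s in samples]
print(parsed)
```

If your dates are not strictly month-day-year, the dateutil parse call from the answer is the more forgiving choice.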

Answer 1: (score: 1)

Take a look at the parse_dates argument that pandas.read_csv accepts. You can do something like:

df = pandas.read_csv('some.csv', parse_dates=True)
# in which case pandas will try to parse the index as dates
df = pandas.read_csv('some.csv', parse_dates=[i, j, k])
# in which case pandas will parse the i-th, j-th and k-th columns as dates

Answer 2: (score: 1)

Use read_csv instead of read_clipboard for your actual data:

>>> df = pd.read_clipboard(sep=',')
>>> df['date'] = pd.to_datetime(df.date) + pd.to_timedelta(df.hour, unit='D')/24
>>> del df['hour']
>>> df
                 date  ...
0 2013-10-01 00:00:00  NaN
1 2013-10-01 00:00:00  NaN
2 2013-10-01 00:00:00  NaN
3 2013-10-01 01:00:00  NaN
4 2013-10-01 01:00:00  NaN

[5 rows x 2 columns]
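
For reference, here is roughly the same approach as a self-contained script that reads from a string instead of the clipboard. This is only a sketch: the csv_text sample, including its A and B columns, is made up to match the layout in the question. The date format is stated explicitly as month-day-year, and the hour is added with unit='h' rather than dividing days by 24.

```python
from io import StringIO

import pandas as pd

# Sample data in the same layout as the question (assumed columns A and B).
csv_text = """date,hour,A,B
10-1-2013,0,1,6
10-1-2013,0,2,7
10-1-2013,0,3,8
10-1-2013,1,4,9
10-1-2013,1,5,10
"""

df = pd.read_csv(StringIO(csv_text))

# Parse the date column, then add the hour column as a timedelta.
df['date'] = (pd.to_datetime(df['date'], format='%m-%d-%Y')
              + pd.to_timedelta(df['hour'], unit='h'))

# The hour is now folded into 'date', so the separate column can go.
df = df.drop(columns='hour')
print(df)
```

With a real file, pass its path to read_csv in place of StringIO(csv_text).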

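Answer 3: (score: 1)

Since you are only using two columns from the csv file and merging them into one, I would squeeze them into a Series of datetime objects, like so: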

import pandas as pd
from io import StringIO
import datetime as dt

txt='''\
date,hour,A,B
10-1-2013,0,1,6
10-1-2013,0,2,7
10-1-2013,0,3,8
10-1-2013,1,4,9
10-1-2013,1,5,10'''

def date_parser(date, hour):
    # date and hour arrive as parallel sequences of strings, one item per row
    dates=[]
    for ed, eh in zip(date, hour):
        month, day, year=list(map(int, ed.split('-')))
        dates.append(dt.datetime(year, month, day, int(eh)))

    return dates

# Note: squeeze and date_parser are read_csv arguments from older pandas releases
p=pd.read_csv(StringIO(txt), usecols=[0,1],
              parse_dates=[[0,1]], date_parser=date_parser, squeeze=True)

print(p)

Prints:

0   2013-10-01 00:00:00
1   2013-10-01 00:00:00
2   2013-10-01 00:00:00
3   2013-10-01 01:00:00
4   2013-10-01 01:00:00
Name: date_hour, dtype: datetime64[ns]
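
In recent pandas releases the squeeze and date_parser arguments of read_csv have been deprecated or removed, so the call above may not run as written. One possible way to get a similar Series, as a sketch only, is to read the two columns as they are, join them into one string per row, and parse that with a single format. The variable names cols and stamps are just for illustration.

```python
from io import StringIO

import pandas as pd

txt = '''\
date,hour,A,B
10-1-2013,0,1,6
10-1-2013,0,2,7
10-1-2013,0,3,8
10-1-2013,1,4,9
10-1-2013,1,5,10'''

# Read only the two columns of interest.
cols = pd.read_csv(StringIO(txt), usecols=['date', 'hour'])

# Build strings such as '10-1-2013 0' and parse them in one pass.
combined = cols['date'] + ' ' + cols['hour'].astype(str)
stamps = pd.to_datetime(combined, format='%m-%d-%Y %H').rename('date_hour')

print(stamps)
```

This keeps the parsing vectorised instead of looping over rows in Python, at the cost of assuming every row follows the same month-day-year layout.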