Python用datetime替换多个日期列

时间:2016-11-09 20:29:25

标签: python csv datetime pandas

我有多个标头的csv文件:

Year', 'Month', 'Day', 'Hour', 'Minute' --other headers--

我想用datetime缩小这些尺寸。现在我正在做:

date = datetime(year=int(d[0]), month=int(d[1]), day=int(d[2]]), hour=int(d[3]]), minute=int(d[4]]))

但是,如何实际删除这些列并将日期列放在所有其他列之前?我最终希望“日期”成为熊猫数据框中的索引。

更新: 我的一个CSV的前五行是:

"Year","Month","Day","Hour","Minute","Direct","Diffuse","D_Global","D_IR","U_Global","U_IR","Zenith"
2015,4,1,0,1,17.59,13.27,-999.00,158.94,10.75,178.97,88.700
2015,4,1,0,2,20.23,12.67,-999.00,159.45,10.68,178.99,88.726
2015,4,1,0,3,20.90,12.42,-999.00,159.77,10.55,179.03,88.752
2015,4,1,0,4,16.38,11.93,-999.00,160.68,10.47,179.45,88.777

我想

"Date","Direct","Diffuse","D_Global","D_IR","U_Global","U_IR","Zenith"
DATETIMEOBJECT,17.59,13.27,-999.00,158.94,10.75,178.97,88.700
DATETIMEOBJECT,20.23,12.67,-999.00,159.45,10.68,178.99,88.726
DATETIMEOBJECT,20.90,12.42,-999.00,159.77,10.55,179.03,88.752
DATETIMEOBJECT,16.38,11.93,-999.00,160.68,10.47,179.45,88.777

1 个答案:

答案 0 :(得分:3)

您可以将parse_dates参数指定为与日期时间相关的列的索引,并编写自定义的date_parser函数以将多列转换为日期时间列。如果您希望它成为索引,请指定index_col

import pandas as pd   
from io import StringIO   
pd.read_csv(StringIO("""
"Year","Month","Day","Hour","Minute","Direct","Diffuse","D_Global","D_IR","U_Global","U_IR","Zenith"
2015,4,1,0,1,17.59,13.27,-999.00,158.94,10.75,178.97,88.700
2015,4,1,0,2,20.23,12.67,-999.00,159.45,10.68,178.99,88.726
2015,4,1,0,3,20.90,12.42,-999.00,159.77,10.55,179.03,88.752
2015,4,1,0,4,16.38,11.93,-999.00,160.68,10.47,179.45,88.777"""), 
            sep = ",", 
            parse_dates={'Date': [0,1,2,3,4]}, 
            date_parser = lambda x: pd.to_datetime(x, format="%Y %m %d %H %M"), 
            index_col = ['Date'])

enter image description here

更新date_parser函数的工作方式符合pd.read_csv() docs:

  

1)传递一个或多个阵列       (由parse_dates定义)作为参数; 2)连接(行方式)       从parse_dates定义的列到单个数组的字符串值       通过那个; 3)使用一个或多个为每行调用date_parser一次       字符串(对应于parse_dates定义的列)作为参数。

根据自定义日期解析器函数的编写方式,有多种方法可以使用此参数,在上面给出的情况下,例如,使用第二个选项,对于每一行,列0-4连接为字符串由空格分隔并传递给pd.to_datetime()函数,因此格式为%Y %m %d %H %M