我有多个标头的csv文件:
Year', 'Month', 'Day', 'Hour', 'Minute' --other headers--
我想用datetime缩小这些尺寸。现在我正在做:
date = datetime(year=int(d[0]), month=int(d[1]), day=int(d[2]]), hour=int(d[3]]), minute=int(d[4]]))
但是,如何实际删除这些列并将日期列放在所有其他列之前?我最终希望“日期”成为熊猫数据框中的索引。
更新: 我的一个CSV的前五行是:
"Year","Month","Day","Hour","Minute","Direct","Diffuse","D_Global","D_IR","U_Global","U_IR","Zenith"
2015,4,1,0,1,17.59,13.27,-999.00,158.94,10.75,178.97,88.700
2015,4,1,0,2,20.23,12.67,-999.00,159.45,10.68,178.99,88.726
2015,4,1,0,3,20.90,12.42,-999.00,159.77,10.55,179.03,88.752
2015,4,1,0,4,16.38,11.93,-999.00,160.68,10.47,179.45,88.777
我想
"Date","Direct","Diffuse","D_Global","D_IR","U_Global","U_IR","Zenith"
DATETIMEOBJECT,17.59,13.27,-999.00,158.94,10.75,178.97,88.700
DATETIMEOBJECT,20.23,12.67,-999.00,159.45,10.68,178.99,88.726
DATETIMEOBJECT,20.90,12.42,-999.00,159.77,10.55,179.03,88.752
DATETIMEOBJECT,16.38,11.93,-999.00,160.68,10.47,179.45,88.777
答案 0 :(得分:3)
您可以将parse_dates
参数指定为与日期时间相关的列的索引,并编写自定义的date_parser
函数以将多列转换为日期时间列。如果您希望它成为索引,请指定index_col
:
import pandas as pd
from io import StringIO
pd.read_csv(StringIO("""
"Year","Month","Day","Hour","Minute","Direct","Diffuse","D_Global","D_IR","U_Global","U_IR","Zenith"
2015,4,1,0,1,17.59,13.27,-999.00,158.94,10.75,178.97,88.700
2015,4,1,0,2,20.23,12.67,-999.00,159.45,10.68,178.99,88.726
2015,4,1,0,3,20.90,12.42,-999.00,159.77,10.55,179.03,88.752
2015,4,1,0,4,16.38,11.93,-999.00,160.68,10.47,179.45,88.777"""),
sep = ",",
parse_dates={'Date': [0,1,2,3,4]},
date_parser = lambda x: pd.to_datetime(x, format="%Y %m %d %H %M"),
index_col = ['Date'])
更新:
date_parser
函数的工作方式符合pd.read_csv()
docs:
1)传递一个或多个阵列 (由parse_dates定义)作为参数; 2)连接(行方式) 从parse_dates定义的列到单个数组的字符串值 通过那个; 3)使用一个或多个为每行调用date_parser一次 字符串(对应于parse_dates定义的列)作为参数。
根据自定义日期解析器函数的编写方式,有多种方法可以使用此参数,在上面给出的情况下,例如,使用第二个选项,对于每一行,列0-4连接为字符串由空格分隔并传递给pd.to_datetime()
函数,因此格式为%Y %m %d %H %M
。