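I'm trying to import a CSV file into a pandas DataFrame, combine four of its columns into a single 'datetime' column, and set that column's dtype to datetime64.
Here is a small sample of my raw data: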
0,1,2,3,4,5,6,7,8,9,10
0, 3001, 1, 1, 0, 0, 5, 4, 1.00, 0, 0, 0
1, 3001, 1, 1, 100, 0, 7, 5, 1.00, 0, 0, 0
2, 3001, 1, 1, 200, 0, 9, 6, 1.00, 0, 0, 0
3, 3001, 1, 1, 300, 0, 9, 7, 1.00, 0, 0, 0
4, 3001, 1, 1, 400, 0, 11, 8, 1.00, 0, 0, 0
So far I have:
from datetime import datetime
from pandas import read_csv

# Intended to parse the combined "year month day hour" string for each row
dateparse = lambda x: datetime.strptime(x, '%Y %m %d %H%M')
Test = read_csv(
    'file.csv',
    names=["year","month","day","hour","a","b","c","d","e","f","g"],
    parse_dates={"datetime": ["year","month","day","hour"]},
    date_parser=dateparse,
    usecols=["year","month","day","hour","b"]
)
Test.head()
This seems to work for many other people, but it doesn't work here. This is what it produces:
index, datetime, temp_hmean
0, 3001-01-01 00:00:00, 4.7
1, 3001-01-01 01:00:00, 3.4
2, 3001-01-01 02:00:00, 5.4
3, 3001-01-01 03:00:00, 4.3
4, 3001-01-01 04:00:00, 5.5
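Following other posts, I also tried converting the 'datetime' column to datetime after the import, but that didn't work either.
Here, keeping the original columns: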
Test['datetime'] = Test.apply(
    lambda row: datetime.strptime(
        row['year']+ '-' + row['month']+ '-' + row['day']+ ' ' + row['hour'],
        '%Y-%m-%d %H%M'),
    axis=1
)
Or here, without keeping the originals:
Test['datetime'] = to_datetime(
    Test['datetime'], format="%Y-%m-%d %H:%M:%S"
)
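One thing that may be worth checking (an assumption, not something confirmed above): the sample years are 3001, and nanosecond-resolution datetime64 values in pandas only reach the year 2262 (`pd.Timestamp.max`). If that is the cause, the column can still hold plain Python `datetime` objects, but it stays at object dtype. The sketch below is a minimal illustration using the first three sample rows with the index column dropped; `raw`, `names`, and `stamps` are names chosen here for illustration, and it assumes the "hour" field is stored as HHMM.

```python
import io
from datetime import datetime

import pandas as pd

# First three sample rows from the post, without the header line or index column.
raw = """3001, 1, 1, 0, 0, 5, 4, 1.00, 0, 0, 0
3001, 1, 1, 100, 0, 7, 5, 1.00, 0, 0, 0
3001, 1, 1, 200, 0, 9, 6, 1.00, 0, 0, 0
"""

names = ["year", "month", "day", "hour", "a", "b", "c", "d", "e", "f", "g"]
df = pd.read_csv(io.StringIO(raw), names=names, header=None, skipinitialspace=True)

# Upper bound of nanosecond-resolution timestamps; year 3001 lies beyond it.
print(pd.Timestamp.max)

# Build standard-library datetimes, which support years up to 9999.
stamps = []
for y, m, d, h in zip(df["year"], df["month"], df["day"], df["hour"]):
    hh, mm = divmod(int(h), 100)  # assumption: "hour" is HHMM (0, 100, 200, ...)
    stamps.append(datetime(int(y), int(m), int(d), hh, mm))

# Keep them as Python objects explicitly instead of asking for datetime64.
df["datetime"] = pd.Series(stamps, dtype=object, index=df.index)
print(df[["datetime", "b"]])
```

With this approach the values are real datetimes, but vectorised `.dt` accessors and datetime64-specific features are not available on the column. Whether that trade-off is acceptable depends on what the column is needed for.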