I have a CSV file that I read with pandas. The columns I load are as follows:
import pandas as pd

# Column names must all be strings; DATASET was missing its quotes
cols = ['DATE(GMT)', 'TIME(GMT)', 'DATASET']
df = pd.read_csv('datasets.csv', usecols=cols)
Now I need to merge 'DATE(GMT)' and 'TIME(GMT)' into a single DateTime column, so that I end up with only two columns: DATETIME and DATASET.
Answer 0 (score: 0)
You can pass the parse_dates parameter to read_csv so that the date column is converted to datetime:
df = pd.read_csv('datasets.csv', usecols=cols, parse_dates=['DATE(GMT)'])
print (df.dtypes)
DATE(GMT) datetime64[ns]
TIME(GMT) int64
DATASET int64
dtype: object
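If the date format is known in advance, parsing it explicitly avoids any ambiguity between month-first and day-first strings. The following is a minimal sketch, assuming the dates are written month-day-year (as in 05-01-2018) and using a small in-memory sample in place of datasets.csv; the sample rows are illustrative only.

```python
import io
import pandas as pd

# Hypothetical sample standing in for datasets.csv
csv_text = """DATE(GMT),TIME(GMT),DATASET
05-01-2018,0,10
05-01-2018,1,15
05-02-2018,14,65
"""

df = pd.read_csv(io.StringIO(csv_text))

# Assumption: dates are month-day-year, so the format is stated explicitly
df['DATE(GMT)'] = pd.to_datetime(df['DATE(GMT)'], format='%m-%d-%Y')
print(df.dtypes)
```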
Then add the time column to it, converted with to_timedelta:
# TIME(GMT) holds integer hours, so pass it directly with unit='h'
df['DATE(GMT)'] += pd.to_timedelta(df.pop('TIME(GMT)'), unit='h')
print (df)
DATE(GMT) DATASET
0 2018-05-01 00:00:00 10
1 2018-05-01 01:00:00 15
2 2018-05-01 02:00:00 21
3 2018-05-01 03:00:00 9
4 2018-05-01 04:00:00 25
5 2018-05-01 05:00:00 7
6 2018-05-02 14:00:00 65
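The steps above leave the combined values in a column still named DATE(GMT), while the question asks for a column called DATETIME. A possible end-to-end sketch follows, assuming TIME(GMT) holds whole hours and using an in-memory sample instead of datasets.csv (the sample rows are made up for illustration):

```python
import io
import pandas as pd

# Hypothetical sample standing in for datasets.csv
csv_text = """DATE(GMT),TIME(GMT),DATASET
05-01-2018,0,10
05-01-2018,1,15
05-02-2018,14,65
"""

cols = ['DATE(GMT)', 'TIME(GMT)', 'DATASET']
df = pd.read_csv(io.StringIO(csv_text), usecols=cols, parse_dates=['DATE(GMT)'])

# Add the hour offset to the parsed date and drop the TIME(GMT) column
df['DATE(GMT)'] += pd.to_timedelta(df.pop('TIME(GMT)'), unit='h')

# Rename so that only DATETIME and DATASET remain
df = df.rename(columns={'DATE(GMT)': 'DATETIME'})
print(df)
```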
Edit:
The problem is that some values in the TIME(GMT) column are not numeric:
print (df)
DATE(GMT) TIME(GMT) DATASET
0 05-01-2018 0 10
1 05-01-2018 1 15
2 05-01-2018 2 21
3 05-01-2018 3 9
4 05-01-2018 4 25
5 05-01-2018 s 7
6 05-02-2018 a 65
You can find the offending rows like this:
print (df[pd.to_numeric(df['TIME(GMT)'], errors='coerce').isnull()])
DATE(GMT) TIME(GMT) DATASET
5 05-01-2018 s 7
6 05-02-2018 a 65
Then, if needed, convert the column to numeric and replace all resulting missing values with 0:
df['TIME(GMT)'] = pd.to_numeric(df['TIME(GMT)'], errors='coerce').fillna(0)
print (df)
DATE(GMT) TIME(GMT) DATASET
0 05-01-2018 0.0 10
1 05-01-2018 1.0 15
2 05-01-2018 2.0 21
3 05-01-2018 3.0 9
4 05-01-2018 4.0 25
5 05-01-2018 0.0 7
6 05-02-2018 0.0 65
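Once the time column is numeric, it can be combined with the date as before. Here is a rough sketch of the whole cleanup, assuming month-day-year dates and that treating unparseable hours as 0 is acceptable for your data; the in-memory sample mirrors the rows shown above but is only illustrative.

```python
import io
import pandas as pd

# Hypothetical sample containing two non-numeric hour values
csv_text = """DATE(GMT),TIME(GMT),DATASET
05-01-2018,0,10
05-01-2018,1,15
05-01-2018,s,7
05-02-2018,a,65
"""

df = pd.read_csv(io.StringIO(csv_text))

# Non-numeric hours become NaN, then 0 (assumption: 0 is a sensible default)
hours = pd.to_numeric(df.pop('TIME(GMT)'), errors='coerce').fillna(0)
dates = pd.to_datetime(df.pop('DATE(GMT)'), format='%m-%d-%Y')

# Put the combined column first so the frame reads DATETIME, DATASET
df.insert(0, 'DATETIME', dates + pd.to_timedelta(hours, unit='h'))
print(df)
```

If silently mapping bad values to midnight is not desirable, the rows could instead be dropped by skipping fillna(0) and calling dropna on the combined column.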