My data in the file a.dat looks like this:
01/Jul/2016 00:05:09 8438.2
01/Jul/2016 00:05:19 8422.4 g
I want to parse it into three columns: a timestamp, a float, and a string (either empty or g).
I tried:
df=pd.read_csv('a.dat',sep=' | ',engine='python')
which ends up with 4 columns: date, time, float and g.
df=pd.read_csv('a.dat',sep=' | (g)',engine='python')
gives 5 columns, with columns 1 and 4 being NaN.
Is there a better way to create the DataFrame without any post-processing?
Answer 0 (score: 2)
You can use read_csv:
import pandas as pd
import io

temp = u'''01/Jul/2016 00:05:09 8438.2
01/Jul/2016 00:05:19 8422.4 g'''

# after testing, replace io.StringIO(temp) with the filename, e.g. 'a.dat'
df = pd.read_csv(io.StringIO(temp),
                 sep=r'\s+',                                  # split on runs of whitespace
                 names=['date', 'time', 'float', 'string'],   # the file has no header row
                 parse_dates=[['date', 'time']])              # merge date and time into one datetime column
print(df)
            date_time   float string
0 2016-07-01 00:05:09  8438.2    NaN
1 2016-07-01 00:05:19  8422.4      g
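As a rough illustration of why the separator matters: with engine='python', a multi-character sep is interpreted as a regular expression, so the attempts in the question can be approximated with re.split from the standard library. This is only a sketch of the splitting step, not pandas' actual parser, and the variable names are made up for the example.

```python
import re

short_line = '01/Jul/2016 00:05:09 8438.2'
long_line = '01/Jul/2016 00:05:19 8422.4 g'

# r'\s+' splits on any run of whitespace, so every token becomes one field.
fields_short = re.split(r'\s+', short_line)
fields_long = re.split(r'\s+', long_line)
print(fields_short)  # ['01/Jul/2016', '00:05:09', '8438.2']
print(fields_long)   # ['01/Jul/2016', '00:05:19', '8422.4', 'g']

# ' | (g)' is an alternation: "a space" OR "a space followed by a captured g".
# The first alternative always matches first, and re.split also returns the
# capture group (None when it took no part in the match).
fields_group = re.split(' | (g)', long_line)
print(fields_group)
# ['01/Jul/2016', None, '00:05:19', None, '8422.4', None, 'g']
```

The None entries produced by the capture group are a plausible source of the extra all-NaN columns seen in the question. Splitting on whitespace and supplying names avoids them.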
Or:
import pandas as pd
import io

temp = u'''01/Jul/2016 00:05:09 8438.2
01/Jul/2016 00:05:19 8422.4 g'''

# after testing, replace io.StringIO(temp) with the filename, e.g. 'a.dat'
df = pd.read_csv(io.StringIO(temp),
                 delim_whitespace=True,                       # equivalent to sep=r'\s+'
                 names=['date', 'time', 'float', 'string'],
                 parse_dates=[['date', 'time']])
print(df)
            date_time   float string
0 2016-07-01 00:05:09  8438.2    NaN
1 2016-07-01 00:05:19  8422.4      g
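One caveat, to the best of my knowledge: recent pandas releases (2.2 and later) deprecate both delim_whitespace and the nested-list form of parse_dates. If those options warn or fail in your version, a minimal sketch of an alternative is to read the four columns as-is and combine date and time with pd.to_datetime afterwards. This does add one post-processing step, and the format string below is an assumption based on the two sample lines.

```python
import io
import pandas as pd

temp = u'''01/Jul/2016 00:05:09 8438.2
01/Jul/2016 00:05:19 8422.4 g'''

# Read four whitespace-separated columns; short rows get NaN in 'string'.
df = pd.read_csv(io.StringIO(temp),
                 sep=r'\s+',
                 names=['date', 'time', 'float', 'string'])

# Combine the two text columns and parse them with an explicit format
# (assumed from the sample data: day/abbreviated month/year, 24h time).
df['date_time'] = pd.to_datetime(df['date'] + ' ' + df['time'],
                                 format='%d/%b/%Y %H:%M:%S')

# Keep the three columns asked for in the question.
df = df[['date_time', 'float', 'string']]
print(df)
```

Giving the format explicitly also avoids any guessing about day-first versus month-first dates.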
A solution with read_fwf:
import pandas as pd
import io

temp = u'''01/Jul/2016 00:05:09 8438.2
01/Jul/2016 00:05:19 8422.4 g'''

# after testing, replace io.StringIO(temp) with the filename, e.g. 'a.dat'
# with no widths or colspecs given, read_fwf infers the column boundaries from the data
df = pd.read_fwf(io.StringIO(temp),
                 names=['date', 'time', 'float', 'string'],
                 parse_dates=[['date', 'time']])
print(df)
            date_time   float string
0 2016-07-01 00:05:09  8438.2    NaN
1 2016-07-01 00:05:19  8422.4      g
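You can also specify the column widths (the parameter is widths, and it needs one width per column name):
df = pd.read_fwf(io.StringIO(temp),
                 widths=[11, 9, 7, 2],   # date, time, float and string fields, each including its leading space
                 names=['date', 'time', 'float', 'string'],
                 parse_dates=[['date', 'time']])
print(df)
            date_time   float string
0 2016-07-01 00:05:09  8438.2    NaN
1 2016-07-01 00:05:19  8422.4      g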
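To check a set of widths before handing them to read_fwf, it can help to slice a sample line by hand. The sketch below uses plain string slicing, not pandas, and the widths [11, 9, 7, 2] are an assumption derived from the two sample lines (each field after the first includes its leading space).

```python
from itertools import accumulate

line = '01/Jul/2016 00:05:19 8422.4 g'
widths = [11, 9, 7, 2]  # assumed widths: date, time, float, string

# Turn the widths into (start, end) slice boundaries.
ends = list(accumulate(widths))   # [11, 20, 27, 29]
starts = [0] + ends[:-1]          # [0, 11, 20, 27]

# Slice each field and strip the padding, as a fixed-width reader would.
fields = [line[s:e].strip() for s, e in zip(starts, ends)]
print(fields)  # ['01/Jul/2016', '00:05:19', '8422.4', 'g']
```

On the shorter first line the last slice comes back empty, which is why that row ends up with NaN in the string column.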