Parsing data with a fixed string sequence using pandas

Time: 2016-07-25 06:46:22

Tags: python csv datetime pandas dataframe

My data in the file a.dat looks like this:

01/Jul/2016 00:05:09      8438.2
01/Jul/2016 00:05:19      8422.4 g

I would like to parse it into three columns: a timestamp, a float, and a string (either empty or g).

I tried:

df=pd.read_csv('a.dat',sep='      | ',engine='python')

This ends up with 4 columns: date, time, float and g.

df=pd.read_csv('a.dat',sep='      | (g)',engine='python')

This gives 5 columns, with columns 1 and 4 being NaN.

Is there a better way to create the DataFrame without any post-processing?
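For reference, the column counts reported above can be reproduced outside pandas. The sketch below assumes that the python engine splits each line with the separator as a regular expression, much like re.split does (this is my understanding of the behaviour, not something stated in the question). With the first pattern the single space between date and time is also a separator; with the second pattern the capturing group (g) is returned as a field of its own, and is None whenever the six-space branch matched instead.

```python
import re

# The two sample lines from a.dat.
lines = [
    "01/Jul/2016 00:05:09      8438.2",
    "01/Jul/2016 00:05:19      8422.4 g",
]

# First attempt: six spaces OR one space, so date and time are split apart.
first = [re.split(r"      | ", line) for line in lines]

# Second attempt: the capturing group is emitted as an extra field.
second = [re.split(r"      | (g)", line) for line in lines]

print(first[1])   # ['01/Jul/2016', '00:05:19', '8422.4', 'g']
print(second[1])  # ['01/Jul/2016 00:05:19', None, '8422.4', 'g', '']
```

The None and the trailing empty string in the second result correspond to the two NaN columns.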

1 answer:

Answer 0 (score: 2)

You can use read_csv with a whitespace separator and let parse_dates combine the date and time columns:

import pandas as pd
import io

temp=u'''01/Jul/2016 00:05:09      8438.2
01/Jul/2016 00:05:19      8422.4 g'''
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp),
                 sep=r'\s+',
                 names=['date','time','float','string'], 
                 parse_dates=[['date','time']])
print (df)
            date_time   float string
0 2016-07-01 00:05:09  8438.2    NaN
1 2016-07-01 00:05:19  8422.4      g

Or:

import pandas as pd
import io

temp=u'''01/Jul/2016 00:05:09      8438.2
01/Jul/2016 00:05:19      8422.4 g'''
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), 
                 delim_whitespace=True, 
                 names=['date','time','float','string'], 
                 parse_dates=[['date','time']])
print (df)
            date_time   float string
0 2016-07-01 00:05:09  8438.2    NaN
1 2016-07-01 00:05:19  8422.4      g
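As far as I know, newer pandas releases (2.2 and later) deprecate both delim_whitespace and the nested-list form of parse_dates, so the two snippets above may emit warnings or fail there. A minimal sketch that avoids both is to read the four columns as they are and build the timestamp with pd.to_datetime. The explicit format string and the final column selection are my additions, and this does involve one short post-processing step:

```python
import io

import pandas as pd

# Sample rows copied from the question; replace io.StringIO(sample)
# with the filename once it works.
sample = """01/Jul/2016 00:05:09      8438.2
01/Jul/2016 00:05:19      8422.4 g"""

df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s+",          # any run of whitespace is one separator
    header=None,
    names=["date", "time", "float", "string"],
)

# Join the two text columns and parse them with an explicit format.
df["date_time"] = pd.to_datetime(
    df["date"] + " " + df["time"], format="%d/%b/%Y %H:%M:%S"
)
df = df[["date_time", "float", "string"]]
print(df)
```

Note that %b matches the abbreviated month name of the current locale, so this assumes an English locale for "Jul".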

A solution with read_fwf:

import pandas as pd
import io

temp=u'''01/Jul/2016 00:05:09      8438.2  
01/Jul/2016 00:05:19      8422.4 g'''
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_fwf(io.StringIO(temp), 
                 names=['date','time','float','string'], 
                 parse_dates=[['date','time']])
print (df)
            date_time   float string
0 2016-07-01 00:05:09  8438.2    NaN
1 2016-07-01 00:05:19  8422.4      g

You can also specify the column widths:

df = pd.read_fwf(io.StringIO(temp), 
                 # widths in characters, one per name: date, time, float, string
                 widths=[11, 9, 12, 2],
                 names=['date','time','float','string'], 
                 parse_dates=[['date','time']])
print (df)
            date_time   float string
0 2016-07-01 00:05:09  8438.2    NaN
1 2016-07-01 00:05:19  8422.4      g
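Fixed widths are counted in characters and have to add up to the line length, so it can help to check them against a sample line before handing them to read_fwf. The sketch below assumes four fields with widths 11, 9, 12 and 2 (counted from the sample lines, not given in the question) and slices one line by hand, then reads the same data with the widths parameter of read_fwf:

```python
import io

import pandas as pd

sample = """01/Jul/2016 00:05:09      8438.2
01/Jul/2016 00:05:19      8422.4 g"""

# Assumed widths: date, time, float, string (separators included).
widths = [11, 9, 12, 2]

# Slice the longer line by hand to see what each width captures.
line = sample.splitlines()[1]
fields = []
start = 0
for w in widths:
    fields.append(line[start:start + w].strip())
    start += w
print(fields)  # ['01/Jul/2016', '00:05:19', '8422.4', 'g']

# The same widths passed to read_fwf.
df = pd.read_fwf(
    io.StringIO(sample),
    widths=widths,
    header=None,
    names=["date", "time", "float", "string"],
)
print(df)
```

If the real file is aligned differently from the two sample lines, the widths need to be recounted.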