I have a .DAT file containing the following lines:
2016 01 01 00 00 19 348 2.05 7 618.4
2016 01 01 00 01 19 351 2.05 7 618.4
2016 01 01 00 02 18 0 2.05 7 618.4
2016 01 01 00 03 17 353 2.05 7 618.4
2016 01 01 00 04 19 346 2.02 7 618.4
2016 01 01 00 05 20 345 2.00 7 618.4
2016 01 01 00 06 22 348 1.97 7 618.4
.......
The data format is:
year month day hour minute(HST) wind_speed(kts) wind_direction(dec) temperature(C) relative_humidity(%) pressure
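I want to import the .DAT file into a pandas DataFrame, with year-month-day-hour-minute combined into a single index column and the remaining values imported as separate columns.
Any suggestions?
Thanks!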
Answer 0 (score: 2)
You can use read_csv:
import pandas as pd
import numpy as np
from io import StringIO
import datetime as dt
temp=u"""2016 01 01 00 00 19 348 2.05 7 618.4
2016 01 01 00 01 19 351 2.05 7 618.4
2016 01 01 00 02 18 0 2.05 7 618.4
2016 01 01 00 03 17 353 2.05 7 618.4
2016 01 01 00 04 19 346 2.02 7 618.4
2016 01 01 00 05 20 345 2.00 7 618.4
2016 01 01 00 06 22 348 1.97 7 618.4"""
# after testing, replace StringIO(temp) with the filename
parser = lambda date: dt.datetime.strptime(date, '%Y %m %d %H %M')
df = pd.read_csv(StringIO(temp),
                 sep=r"\s+",                 # whitespace separator
                 index_col=0,                # use the combined date column as the DatetimeIndex
                 date_parser=parser,         # function for converting dates
                 parse_dates=[[0,1,2,3,4]],  # columns to combine into one datetime
                 header=None)                # the file has no header row
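The column names then have to be set afterwards, because passing the names parameter to read_csv here raises:
NotImplementedError: file structure not yet supported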
df.columns = ['wind_speed(kts)', 'wind_direction(dec)', 'temperature(C)', 'relative_humidity(%)', 'pressure']
#remove index name
df.index.name = None
print (df)
                     wind_speed(kts)  wind_direction(dec)  temperature(C)  \
2016-01-01 00:00:00               19                  348            2.05
2016-01-01 00:01:00               19                  351            2.05
2016-01-01 00:02:00               18                    0            2.05
2016-01-01 00:03:00               17                  353            2.05
2016-01-01 00:04:00               19                  346            2.02
2016-01-01 00:05:00               20                  345            2.00
2016-01-01 00:06:00               22                  348            1.97

                     relative_humidity(%)  pressure
2016-01-01 00:00:00                     7     618.4
2016-01-01 00:01:00                     7     618.4
2016-01-01 00:02:00                     7     618.4
2016-01-01 00:03:00                     7     618.4
2016-01-01 00:04:00                     7     618.4
2016-01-01 00:05:00                     7     618.4
2016-01-01 00:06:00                     7     618.4
print (df.dtypes)
wind_speed(kts)           int64
wind_direction(dec)       int64
temperature(C)          float64
relative_humidity(%)      int64
pressure                float64
dtype: object
print (df.index)
DatetimeIndex(['2016-01-01 00:00:00', '2016-01-01 00:01:00',
'2016-01-01 00:02:00', '2016-01-01 00:03:00',
'2016-01-01 00:04:00', '2016-01-01 00:05:00',
'2016-01-01 00:06:00'],
dtype='datetime64[ns]', freq=None)
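Note that in recent pandas releases the date_parser argument and the nested parse_dates list are, as far as I can tell, deprecated. So the following is a minimal sketch of an alternative rather than a definitive replacement. It reads all ten columns with explicit names and then builds the index with pd.to_datetime, which can assemble datetimes from columns named year, month, day, hour and minute. The function name load_dat and the three-line sample are assumptions made for illustration only.

```python
from io import StringIO

import pandas as pd

# Short sample taken from the question; pass a real filename to load_dat instead.
sample = """2016 01 01 00 00 19 348 2.05 7 618.4
2016 01 01 00 01 19 351 2.05 7 618.4
2016 01 01 00 02 18 0 2.05 7 618.4"""

# pd.to_datetime recognises these column names when given a DataFrame.
date_cols = ["year", "month", "day", "hour", "minute"]
value_cols = ["wind_speed(kts)", "wind_direction(dec)", "temperature(C)",
              "relative_humidity(%)", "pressure"]


def load_dat(source):
    """Hypothetical helper: read the whitespace-separated file and index it by timestamp."""
    raw = pd.read_csv(source, sep=r"\s+", header=None,
                      names=date_cols + value_cols)
    # Assemble one timestamp per row from the five date/time columns.
    timestamps = pd.to_datetime(raw[date_cols])
    df = raw.drop(columns=date_cols)
    df.index = pd.DatetimeIndex(timestamps)
    return df


df = load_dat(StringIO(sample))
print(df)
```

Because the column names are passed directly to read_csv in this variant, the separate df.columns assignment is not needed.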
Answer 1 (score: 1)
Here is a faster version:
request = RestClient::Request.new(
method: :get,
url: 'https://my-rest-service.com/resource.json')
response = request.execute {|response| response}
case response.code
when 200
puts "Good"
when 401
puts "Bad"
raise Exception
end