Resampling a DataFrame to hourly, 15-minute and 5-minute periods in Julia

Time: 2013-12-28 16:36:34

Tags: dataframe resampling julia

I'm quite new to Julia, but I'm giving it a try because the benchmarks claim it is much faster than Python.

I'm trying to work with some stock price data in the format ["unixtime", "price", "amount"].

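I managed to load the data and convert the unixtime to a date in Julia, but now I need to resample the data to a specific period (hourly, 15 minutes, 5 minutes, etc.), using OHLC (open, high, low, close) for the price and the sum for the amount: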

julia> head(btc_raw_data)
6x3 DataFrame:
                           date price  amount
[1,]    2011-09-13T13:53:36 UTC   5.8     1.0
[2,]    2011-09-13T13:53:44 UTC  5.83     3.0
[3,]    2011-09-13T13:53:49 UTC   5.9     1.0
[4,]    2011-09-13T13:53:54 UTC   6.0    20.0
[5,]    2011-09-13T14:32:53 UTC  5.95 12.4521
[6,]    2011-09-13T14:35:04 UTC  5.88   7.458

I saw that there is a package named Resampling, but it doesn't seem to accept a time period, only the number of rows I want the output data to have.

Are there any other alternatives?

1 answer:

Answer 0: (score: 1)

You can use https://github.com/femtotrader/TimeSeriesIO.jl

to convert a DataFrame (from DataFrames.jl) to a TimeArray (from TimeSeries.jl):
using TimeSeriesIO: TimeArray
ta = TimeArray(df, colnames=[:price], timestamp=:date)
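
The conversion above presumably expects the :date column to already hold DateTime values. If the raw data still carries Unix timestamps, a minimal sketch of that step, assuming Julia 1.x and only the standard-library Dates module, could look like this. The `unixtimes` vector is hypothetical sample data, chosen to correspond to rows 1, 2 and 5 of the table in the question.

```julia
using Dates

# Hypothetical raw timestamps: seconds since 1970-01-01T00:00:00 UTC
unixtimes = [1315922016, 1315922024, 1315924373]

# unix2datetime returns a plain DateTime (no time zone attached, values are UTC);
# the dot applies it to every element of the vector
dates = unix2datetime.(unixtimes)

println(dates)
```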

You can then resample the time series (a TimeArray from TimeSeries.jl) using TimeSeriesResampler https://github.com/femtotrader/TimeSeriesResampler.jl and TimeFrames https://github.com/femtotrader/TimeFrames.jl

using TimeSeriesResampler: resample, mean, ohlc, sum, TimeFrame

# Define a sample timeseries (prices for example)
idx = DateTime(2010,1,1):Dates.Minute(1):DateTime(2011,1,1)
idx = idx[1:end-1]
N = length(idx)
y = rand(-1.0:0.01:1.0, N)
y = 1000 .+ cumsum(y)  # broadcast the scalar; plain `+` between a number and a vector errors on Julia 1.x
#df = DataFrame(Date=idx, y=y)
ta = TimeArray(collect(idx), y, ["y"])
println("ta=")
println(ta)

# Define how datetimes should be grouped (the timeframe);
# use e.g. Dates.Minute(5) or Dates.Hour(1) for 5-minute or hourly periods
tf = TimeFrame(dt -> floor(dt, Dates.Minute(15)))

# resample using OHLC values
ta_ohlc = ohlc(resample(ta, tf))
println("ta_ohlc=")
println(ta_ohlc)

# resample using mean values
ta_mean = mean(resample(ta, tf))
println("ta_mean=")
println(ta_mean)

# Define another sample timeseries (volume for example)
vol = rand(0:0.01:1.0, N)
ta_vol = TimeArray(collect(idx), vol, ["vol"])
println("ta_vol=")
println(ta_vol)

# resample using sum values
ta_vol_sum = sum(resample(ta_vol, tf))
println("ta_vol_sum=")
println(ta_vol_sum)

You should get:

julia> ta
525600x1 TimeSeries.TimeArray{Float64,1,DateTime,Array{Float64,1}} 2010-01-01T00:00:00 to 2010-12-31T23:59:00

                      y
2010-01-01T00:00:00 | 1000.16
2010-01-01T00:01:00 | 1000.1
2010-01-01T00:02:00 | 1000.98
2010-01-01T00:03:00 | 1001.38
⋮
2010-12-31T23:56:00 | 972.3
2010-12-31T23:57:00 | 972.85
2010-12-31T23:58:00 | 973.74
2010-12-31T23:59:00 | 972.8


julia> ta_ohlc
35040x4 TimeSeries.TimeArray{Float64,2,DateTime,Array{Float64,2}} 2010-01-01T00:00:00 to 2010-12-31T23:45:00

                      Open       High       Low        Close
2010-01-01T00:00:00 | 1000.16    1002.5     1000.1     1001.54
2010-01-01T00:15:00 | 1001.57    1002.64    999.38     999.38
2010-01-01T00:30:00 | 999.13     1000.91    998.91     1000.91
2010-01-01T00:45:00 | 1001.0     1006.42    1001.0     1006.42
⋮
2010-12-31T23:00:00 | 980.84     981.56     976.53     976.53
2010-12-31T23:15:00 | 975.74     977.46     974.71     975.31
2010-12-31T23:30:00 | 974.72     974.9      971.73     972.07
2010-12-31T23:45:00 | 972.33     973.74     971.49     972.8


julia> ta_mean
35040x1 TimeSeries.TimeArray{Float64,1,DateTime,Array{Float64,1}} 2010-01-01T00:00:00 to 2010-12-31T23:45:00

                      y
2010-01-01T00:00:00 | 1001.1047
2010-01-01T00:15:00 | 1001.686
2010-01-01T00:30:00 | 999.628
2010-01-01T00:45:00 | 1003.5267
⋮
2010-12-31T23:00:00 | 979.1773
2010-12-31T23:15:00 | 975.746
2010-12-31T23:30:00 | 973.482
2010-12-31T23:45:00 | 972.3427

julia> ta_vol
525600x1 TimeSeries.TimeArray{Float64,1,DateTime,Array{Float64,1}} 2010-01-01T00:00:00 to 2010-12-31T23:59:00

                      vol
2010-01-01T00:00:00 | 0.37
2010-01-01T00:01:00 | 0.67
2010-01-01T00:02:00 | 0.29
2010-01-01T00:03:00 | 0.28
⋮
2010-12-31T23:56:00 | 0.74
2010-12-31T23:57:00 | 0.66
2010-12-31T23:58:00 | 0.22
2010-12-31T23:59:00 | 0.47


julia> ta_vol_sum
35040x1 TimeSeries.TimeArray{Float64,1,DateTime,Array{Float64,1}} 2010-01-01T00:00:00 to 2010-12-31T23:45:00

                      vol
2010-01-01T00:00:00 | 7.13
2010-01-01T00:15:00 | 6.99
2010-01-01T00:30:00 | 8.73
2010-01-01T00:45:00 | 8.27
⋮
2010-12-31T23:00:00 | 6.11
2010-12-31T23:15:00 | 7.49
2010-12-31T23:30:00 | 5.75
2010-12-31T23:45:00 | 8.36
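

For readers who want to see what the grouping amounts to, here is a rough, dependency-free sketch of the same idea applied to the six rows from the question. It assumes Julia 1.x and uses only the standard-library Dates module. `ohlc_resample` is a hypothetical helper written for this illustration, not a function of any of the packages above: it floors each timestamp to the start of its period, then takes the first, maximum, minimum and last price and the summed amount of each group.

```julia
using Dates

# First six rows from the question: timestamp, price, amount
dates = [DateTime(2011, 9, 13, 13, 53, 36), DateTime(2011, 9, 13, 13, 53, 44),
         DateTime(2011, 9, 13, 13, 53, 49), DateTime(2011, 9, 13, 13, 53, 54),
         DateTime(2011, 9, 13, 14, 32, 53), DateTime(2011, 9, 13, 14, 35, 4)]
prices  = [5.8, 5.83, 5.9, 6.0, 5.95, 5.88]
amounts = [1.0, 3.0, 1.0, 20.0, 12.4521, 7.458]

# Hypothetical helper (not part of any package).
# Assumes `dates` is sorted in ascending order.
function ohlc_resample(dates, prices, amounts, period::Period)
    bars = NamedTuple[]
    n = length(dates)
    i = 1
    while i <= n
        bucket = floor(dates[i], period)   # start of the period containing row i
        j = i
        # extend the group while the next row falls into the same period
        while j < n && floor(dates[j + 1], period) == bucket
            j += 1
        end
        p = view(prices, i:j)
        push!(bars, (date = bucket, open = first(p), high = maximum(p),
                     low = minimum(p), close = last(p),
                     amount = sum(view(amounts, i:j))))
        i = j + 1
    end
    return bars
end

# Try the periods mentioned in the question
for period in (Minute(5), Minute(15), Hour(1))
    println(period)
    for bar in ohlc_resample(dates, prices, amounts, period)
        println("  ", bar)
    end
end
```

Note that this sketch only emits a bar for periods that contain at least one trade; periods without trades are simply skipped.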