For example, I have the following head of a data.frame:
period bid_open bid_high bid_low bid_close ask_open ask_high
1 2015-01-02 00:00:00 1.20860 1.20880 1.20860 1.20870 1.20890 1.20890
2 2015-01-02 00:01:00 1.20870 1.20880 1.20865 1.20865 1.20880 1.20890
3 2015-01-02 00:02:00 1.20865 1.20880 1.20865 1.20875 1.20875 1.20885
4 2015-01-02 00:03:00 1.20875 1.20885 1.20875 1.20885 1.20885 1.20900
5 2015-01-02 00:04:00 1.20885 1.20885 1.20880 1.20880 1.20895 1.20895
6 2015-01-02 00:05:00 1.20880 1.20885 1.20880 1.20880 1.20890 1.20895
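The main focus is the first column, period. The time frequency of the data can be 1m (as shown here), 1s, 1h, or 1d. I want to write a function that takes a frequency parameter. For example, if frequency=2h, the function outputs a new data.frame containing the observations (stock prices) at 2h intervals: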
2015-01-02 00:00:00
2015-01-02 02:00:00
2015-01-02 04:00:00
....
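If the frequency is, for example, 15s, then R must output the initial data frame, because the initial data has a frequency of 1m.
But I have a few problems implementing this task. Could you help me?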
My logic is:
First, find the initial frequency:
time=data[,1]
freq=as.numeric(difftime(time[2], time[1]))
But the problem is that R only shows a number (in this case freq=1), and I don't know whether it means 1m, 1h, or 1d. How can I fix this?
Second, suppose freq=5m while my data has a frequency of 1m, so I need to adjust my table and keep only the 1st, 6th, 11th, ... rows. How can I do that?
Thanks!

Answer 0 (score: 0)
Here is one possible solution:
# 1. Load library
library(dplyr)
# 2. Data set sample
df <- data.frame(
period = c("2015-01-02 00:00:00", "2015-01-02 00:01:00", "2015-01-02 00:02:00", "2015-01-02 00:03:00", "2015-01-02 00:04:00", "2015-01-02 00:05:00"),
bid_open = c(1.20860, 1.20870, 1.20865, 1.20875, 1.20885, 1.20880))
# 3. Feature engineering
df <- df %>% mutate(
year = as.numeric(substr(period, 1, 4)),
month = as.numeric(substr(period, 6, 7)),
day = as.numeric(substr(period, 9, 10)),
hour = as.numeric(substr(period, 12, 13)),
min = as.numeric(substr(period, 15, 16)),
sec = as.numeric(substr(period, 18, 19)))
# 4. Select data function
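select_data <- function(df, str_frequency){
  # 1. Define frequency parameters: a two-digit value followed by a column name (e.g. "02min")
  frequency_value <- as.numeric(substr(str_frequency, 1, 2))
  frequency_type <- substr(str_frequency, 3, nchar(str_frequency))
  # 2. Keep rows where the chosen time component is a multiple of the value (modulus operator %%)
  # Note: only that one component is tested, so "02hour" keeps every row that falls in an even hour
  df_result <- df[df[[frequency_type]] %% frequency_value == 0, ]
  # 3. Return result
  return(df_result)
}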
# 5. Test (filter for "02min" as a basic test)
select_data(df, "01year")
select_data(df, "01month")
select_data(df, "01day")
select_data(df, "01hour")
select_data(df, "02min") # should filter here / change to "03min" also works
select_data(df, "01sec")
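
The question also asks how to tell whether the spacing is 1m, 1h, or 1d, and how to return the original table when the requested frequency is finer than the data. A minimal base-R sketch of one way to handle both is shown below. The names `freq_to_secs` and `resample_rows` are hypothetical helpers, not part of the code above. The sketch assumes that frequency strings look like "15s", "5m", "2h", or "1d", that the rows are regularly spaced, that the first two rows give the base frequency, and that the grid is anchored at the first observation. Passing `units = "secs"` to `difftime` fixes the unit, so the number it returns is no longer ambiguous.

```r
# Hypothetical helper: convert strings such as "15s", "5m", "2h", "1d" to seconds.
freq_to_secs <- function(freq) {
  m <- regmatches(freq, regexec("^([0-9]+)([smhd])$", freq))[[1]]
  if (length(m) != 3) stop("frequency must look like '15s', '5m', '2h' or '1d'")
  unit_secs <- c(s = 1, m = 60, h = 3600, d = 86400)
  as.numeric(m[2]) * unit_secs[[m[3]]]
}

# Hypothetical function: keep only the rows that fall on the requested time grid.
resample_rows <- function(data, frequency, time_col = "period") {
  time <- as.POSIXct(data[[time_col]], tz = "UTC")
  # units = "secs" fixes the unit; without it difftime chooses one automatically
  base_secs <- as.numeric(difftime(time[2], time[1], units = "secs"))
  target_secs <- freq_to_secs(frequency)
  # Requested frequency is not coarser than the data: return the input unchanged
  if (target_secs <= base_secs) return(data)
  # Seconds elapsed since the first observation; keep multiples of the target
  elapsed <- as.numeric(difftime(time, time[1], units = "secs"))
  data[elapsed %% target_secs == 0, , drop = FALSE]
}

# Illustrative data: 11 one-minute rows (values are made up for the example)
df <- data.frame(
  period = format(seq(as.POSIXct("2015-01-02 00:00:00", tz = "UTC"),
                      by = "1 min", length.out = 11), "%Y-%m-%d %H:%M:%S"),
  bid_open = seq(1.2086, by = 0.00005, length.out = 11)
)
resample_rows(df, "5m")   # keeps the 1st, 6th and 11th rows
resample_rows(df, "15s")  # returns df unchanged
```

Because the filter works on elapsed seconds rather than on a single component such as the minute, the same function covers "2h" or "1d" without keeping every row inside an even hour.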