Check whether an integer falls within a specific range in a data.table?

Posted: 2017-01-29 06:46:04

Tags: r data.table

Is there a simple way to evaluate a range and check whether an integer lies within it?

Apart from the post Check to see if a value is within a range in R?, I haven't found anything relevant.

Example

range <- cut(rep(1,5),4) # Create intervals
range.test <- range[2]
# Now I want to check whether integer 1L is within the range.test (Of course it is)
# ??? code goes here
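One way to fill in the placeholder, sketched under the assumption that the interval is only available as a `cut()` label: parse the label text back into numeric bounds and compare, respecting which ends are open. `interval_bounds()` and `in_interval()` are hypothetical helpers, not functions from base R or data.table. Because `cut()` rounds its labels (see its `dig.lab` argument), the parsed bounds can differ slightly from the real break points, so a value sitting right on an edge may be classified differently than `cut()` itself would classify it.

```r
# Hypothetical helper: turn a cut() label such as "(0.9995,1]" into c(lower, upper)
interval_bounds <- function(label) {
  label <- as.character(label)
  inner <- substr(label, 2, nchar(label) - 1)        # strip the brackets
  as.numeric(strsplit(inner, ",", fixed = TRUE)[[1]])
}

# Hypothetical helper: is x inside the interval described by a single label?
in_interval <- function(x, label) {
  label <- as.character(label)
  b <- interval_bounds(label)
  left_closed  <- substr(label, 1, 1) == "["
  right_closed <- substr(label, nchar(label), nchar(label)) == "]"
  above_lower <- if (left_closed) x >= b[1] else x > b[1]
  below_upper <- if (right_closed) x <= b[2] else x < b[2]
  above_lower & below_upper
}

range.test <- cut(rep(1, 5), 4)[2]   # the interval from the question
in_interval(1L, range.test)
```

The helper is vectorised over `x`, so a whole column can be tested against one interval label at once.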

I tried using findInterval after converting range.test to a vector, as well as seq, inrange and other functions, but without success.
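For the comparison itself, data.table already provides `between()` (also available as the `%between%` operator) and `inrange()`. With the default `incbounds = TRUE` both bounds are inclusive, whereas a `cut()` interval such as `(0.9995,1]` excludes its lower end, so the sketch below also shows a check with a strict lower bound. The bounds are typed in by hand here for illustration; in practice they would come from the interval being tested.

```r
library(data.table)

x     <- 1        # value to test (a double, to keep type coercion out of the picture)
lower <- 0.9995   # bounds typed in by hand for illustration
upper <- 1

between(x, lower, upper)       # closed interval [lower, upper]
x %between% c(lower, upper)    # same check, operator form
inrange(x, lower, upper)       # also closed by default

# A cut() interval "(lower, upper]" is open on the left, so test that end strictly
x > lower & x <= upper
```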

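Since the whole analysis is based on data.table, and this step is part of a larger workflow whose output should preferably be a data.table, I have tagged the question data.table for consistency.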

Edit

The overall picture in a data.table context:
dt <- data.table(structure(list(Time = c("2016-01-04 09:05:06", "2016-01-04 09:20:00","2016-01-04 09:30:00", "2016-01-04 09:30:01", "2016-01-04 09:30:02","2016-01-04 09:30:05", "2016-01-04 09:30:06", "2016-01-04 09:31:35","2016-01-04 09:31:38", "2016-01-04 09:32:33"), Price = c(105,104.1, 104.1, 103.9, 104.1, 104, 104.1, 104.1, 104.1, 104), Volume = c(9500L,23500L, 18500L, 12500L, 16118L, 13000L, 2500L, 300L, 500L, 500L), Flag = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L), Ticker = c("0001","0001", "0001", "0001", "0001", "0001", "0001", "0001", "0001","0001")), .Names = c("Time", "Price", "Volume", "Flag", "Ticker"), class = c("data.table", "data.frame"), row.names = c(NA, -10L)))
                   Time Price Volume Flag Ticker
 1: 2016-01-04 09:05:06 105.0   9500    1   0001
 2: 2016-01-04 09:20:00 104.1  23500    0   0001
 3: 2016-01-04 09:30:00 104.1  18500    1   0001
 4: 2016-01-04 09:30:01 103.9  12500    0   0001
 5: 2016-01-04 09:30:02 104.1  16118    1   0001
 6: 2016-01-04 09:30:05 104.0  13000    0   0001
 7: 2016-01-04 09:30:06 104.1   2500    1   0001
 8: 2016-01-04 09:30:07 104.1   1500    1   0001
 9: 2016-01-04 09:30:08 104.3    500    1   0001
10: 2016-01-04 09:30:10 104.0   1000    0   0001
11: 2016-01-04 09:30:11 103.9   1000    0   0001
12: 2016-01-04 09:30:15 104.0   3500    1   0001
13: 2016-01-04 09:30:17 104.3   2000    1   0001
14: 2016-01-04 09:30:19 104.3   1500    1   0001
15: 2016-01-04 09:30:20 104.4    500    1   0001
16: 2016-01-04 09:30:21 104.4   1500    1   0001
17: 2016-01-04 09:30:22 104.4   1000    1   0001
18: 2016-01-04 09:30:24 104.4   1500    1   0001
19: 2016-01-04 09:30:25 104.0   2000    0   0001
20: 2016-01-04 09:30:27 104.1   3500    1   0001
21: 2016-01-04 09:30:35 104.0    500    0   0001
22: 2016-01-04 09:31:14 104.1   5000    1   0001
23: 2016-01-04 09:31:15 104.1    500    1   0001
24: 2016-01-04 09:31:18 104.1   2500    1   0001
25: 2016-01-04 09:31:25 104.1   3000    1   0001
26: 2016-01-04 09:31:29 104.0   2000    0   0001
27: 2016-01-04 09:31:30 104.1    500    1   0001
28: 2016-01-04 09:31:35 104.1    300    1   0001
29: 2016-01-04 09:31:38 104.1    500    1   0001
30: 2016-01-04 09:32:33 104.0    500    0   0001

# First get the distribution of the Volume
distribution <- dt[Flag == 1, sum(Volume), by = cut(Price, 5)][, percentage := V1/sum(V1)]
# Get the max range bin
Max_range <- distribution[which.max(percentage), cut]
# Get the Closing price
Closing_price <- dt[.N, Price]
# Check whether the closing price is in the Max_range
# ??? code goes here
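A sketch of the missing step, using a swapped-in technique: instead of letting `cut(Price, 5)` choose its breaks internally (it does not return them, and its labels are rounded), compute the breaks once with `seq()`, bin both the flagged prices and the closing price with those same breaks, and compare the bin labels. `brks` and the `bin` column name are introduced here. These edges are not identical to the ones `cut(Price, 5)` would pick, because `cut()` widens its outer edges by 0.1% of the range, so `include.lowest = TRUE` is used to keep the minimum price in the first bin. Only the ten rows in the `dput()` output are reproduced.

```r
library(data.table)

# The ten rows from the question's dput() output
dt <- data.table(
  Time   = c("2016-01-04 09:05:06", "2016-01-04 09:20:00", "2016-01-04 09:30:00",
             "2016-01-04 09:30:01", "2016-01-04 09:30:02", "2016-01-04 09:30:05",
             "2016-01-04 09:30:06", "2016-01-04 09:31:35", "2016-01-04 09:31:38",
             "2016-01-04 09:32:33"),
  Price  = c(105, 104.1, 104.1, 103.9, 104.1, 104, 104.1, 104.1, 104.1, 104),
  Volume = c(9500L, 23500L, 18500L, 12500L, 16118L, 13000L, 2500L, 300L, 500L, 500L),
  Flag   = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L),
  Ticker = "0001"
)

# Explicit breaks over the range of the flagged prices (5 bins -> 6 edges)
brks <- dt[Flag == 1, seq(min(Price), max(Price), length.out = 6)]

distribution <- dt[Flag == 1, .(V1 = sum(Volume)),
                   by = .(bin = cut(Price, brks, include.lowest = TRUE))]
distribution[, percentage := V1 / sum(V1)]
Max_range <- distribution[which.max(percentage), bin]

Closing_price <- dt[.N, Price]
# cut() returns NA when the price lies outside every bin; %in% maps that to FALSE
Signal <- as.character(cut(Closing_price, brks, include.lowest = TRUE)) %in%
  as.character(Max_range)
Signal
```

With these rows the closing price (104.0) is below the lowest flagged price (104.1), so it lands outside every bin and `Signal` is `FALSE`, which agrees with the desired output.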

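So here is the question: for a given Ticker, how can I check whether the closing price lies within a specific range? I only need TRUE or FALSE: if Closing_price lies within Max_range, the corresponding Signal should be TRUE, otherwise FALSE.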

Edit 2

Added the desired output.

Desired output

   Ticker Signal
1:   0001  FALSE

So I would like to write a function that determines whether Signal is TRUE or FALSE and then stores the result in a data.table.

Many thanks!
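Wrapping the explicit-breaks idea in a function lets data.table compute one `Signal` per ticker with `by = Ticker`. `close_in_max_bin()` is a hypothetical name. The function assumes each group's rows are already in time order (so the last row holds the closing price) and that the flagged prices within a group are not all identical; otherwise `seq()` returns repeated edges and `cut()` stops with an error. Only the Price, Volume, Flag and Ticker columns from the question's `dput()` are reproduced.

```r
library(data.table)

# Hypothetical helper: TRUE if the group's closing price falls in the bin holding
# the largest traded volume among the Flag == 1 rows
close_in_max_bin <- function(Price, Volume, Flag, n_bins = 5) {
  p <- Price[Flag == 1]
  v <- Volume[Flag == 1]
  brks <- seq(min(p), max(p), length.out = n_bins + 1)
  bins <- cut(p, brks, include.lowest = TRUE)
  vol_by_bin <- tapply(v, bins, sum)          # NA for empty bins
  max_bin <- names(which.max(vol_by_bin))     # which.max() skips the NAs
  close_bin <- cut(Price[length(Price)], brks, include.lowest = TRUE)
  as.character(close_bin) %in% max_bin        # NA (outside all bins) -> FALSE
}

dt <- data.table(
  Price  = c(105, 104.1, 104.1, 103.9, 104.1, 104, 104.1, 104.1, 104.1, 104),
  Volume = c(9500L, 23500L, 18500L, 12500L, 16118L, 13000L, 2500L, 300L, 500L, 500L),
  Flag   = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L),
  Ticker = "0001"
)

# One row per ticker, in the shape of the desired output
dt[, .(Signal = close_in_max_bin(Price, Volume, Flag)), by = Ticker]

# Or add the flag to every row of dt by reference
dt[, Signal := close_in_max_bin(Price, Volume, Flag), by = Ticker]
```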

2 answers:

Answer 0 (score: 1)

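Do I understand correctly that, for each ticker (001, 002, etc.), you want to find out whether any value lies outside a given range?

If that is the problem, you can use the group_by function from dplyr together with a logical expression: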

library(dplyr)
group_by(dt, Ticker) %>%
   summarise(Signal = any(Price < min_price | Price > max_price))  # min_price, max_price assumed defined
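The same per-ticker test can be written in data.table syntax, which keeps the result a data.table as the question prefers. `min_price` and `max_price` are assumed bounds with made-up values, since neither the answer nor the question defines them; only the Ticker and Price columns from the question's `dput()` are reproduced.

```r
library(data.table)

dt <- data.table(Ticker = "0001",
                 Price  = c(105, 104.1, 104.1, 103.9, 104.1, 104, 104.1, 104.1, 104.1, 104))
min_price <- 104    # assumed bounds, for illustration only
max_price <- 105

# TRUE for a ticker if any of its prices lies outside [min_price, max_price]
dt[, .(Signal = any(Price < min_price | Price > max_price)), by = Ticker]
```

Here the 103.9 price is below `min_price`, so the ticker's `Signal` is `TRUE`.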

Answer 1 (score: 0)

The range.test object is a factor with these levels:

levels(range.test)
[1] "(0.999,0.9995]" "(0.9995,1]"     "(1,1.0005]"     "(1.0005,1.001]"

When you pass it to findInterval as the second argument, it is coerced to its underlying integer code, 2, so this is the result:

> findInterval(1,2)
[1] 0

That is the expected result, because 1 is less than 2. If what you really want is a numeric sequence of 5 values from 0.999 to 1.001, you can use seq:

> seq( 0.999,  1.001, length=5)
[1] 0.9990 0.9995 1.0000 1.0005 1.0010

You can then find which interval of that vector the number 1.000 falls into:

> findInterval( 1, seq( 0.999,  1.001, length=5) )
[1] 3
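One caveat when mixing the two functions: `findInterval()` treats intervals as closed on the left, `[a, b)`, while `cut()` (with its default `right = TRUE`) closes them on the right, `(a, b]`. Passing `left.open = TRUE` (available since R 3.3.0) makes `findInterval()` follow the `cut()` convention. Integer breaks are used below so that no floating-point rounding sits on the boundary.

```r
brks <- c(0, 1, 2, 3)

findInterval(1, brks)                    # 2: 1 falls in [1, 2)
findInterval(1, brks, left.open = TRUE)  # 1: 1 falls in (0, 1]
as.integer(cut(1, brks))                 # 1: cut() also puts 1 in (0, 1]

# So "is x in the k-th cut() interval?" becomes:
k <- 1
findInterval(1, brks, left.open = TRUE) == k
```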