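I have a dataset with the columns Time.Interval, Net.Chg and Tick.Count. Net.Chg contains positive, negative and zero values. Based on the sign of Net.Chg, I want to sum the Tick.Count values separately for the positive, negative and zero rows, grouped by date.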
Time.Interval Net.Chg Tick.Count
2-Jan-17 NA NA
19:15 - 19:16 -0.0047 7
19:16 - 19:17 0 8
19:17 - 19:18 0.0025 10
3-Jan-17 NA NA
03:45 - 03:46 0 1
03:54 - 03:55 -0.0002 2
19:43 - 19:44 -0.0008 4
20:01 - 20:02 0.0025 2
4-Jan-17 NA NA
00:54 - 00:55 -0.0007 2
01:10 - 01:11 0.0005 1
01:11 - 01:12 0 1
Time.Interval <- c('2-Jan-17 _00:00:00.000000', '19:15 - 19:16', '19:16 - 19:17', '19:17 - 19:18', '3-Jan-17 _00:00:00.000000', '03:45 - 03:46', '03:54 - 03:55', '19:43 - 19:44', '20:01 - 20:02', '4-Jan-17 _00:00:00.000000', '00:54 - 00:55', '01:10 - 01:11', '01:11 - 01:12')
Net.Chg <- c(NA, -0.0047, 0, 0.0025, NA, 0, -0.0002, -0.0008, 0.0025, NA, -0.0007, 0.0005, 0)
Tick.Count <- c(NA, 7, 8, 10, NA, 1, 2, 4, 2, NA, 2, 1, 1)
data <- data.frame(Time.Interval, Net.Chg, Tick.Count)
The desired output is:
pos = sum of "Tick.Count" if Net.Chg > 0
neg = sum of "Tick.Count" if Net.Chg < 0
UnChng = sum of "Tick.Count" if Net.Chg == 0
OF = pos - neg
I tried the following code:
DF <- data %>%
  group_by(grp = cumsum(str_detect(Time.Interval, "[A-Z]"))) %>%
  summarise(Time.Interval = anydate(first(Time.Interval)),
            pos = sum((Net.Chg > 0) * Tick.Count, na.rm = T),
            neg = sum((Net.Chg < 0) * Tick.Count, na.rm = T),
            unChg = sum(Net.Chg ==0 * Tick.Count, na.rm=T),
            OF = sum(sign(Net.Chg) * Tick.Count, na.rm = TRUE))
This code gives me the correct values for pos, neg and OF, but the value of UnChng is wrong.
The current output is:
Time.Interval pos Neg UnChng OF
02Jan2017 10 7 4 3
03Jan2017 2 6 5 -4
04Jan2017 1 2 4 -1
whereas the expected output is:
Time.Interval pos Neg UnChng OF
02Jan2017 10 7 8 3
03Jan2017 2 6 1 -4
04Jan2017 1 2 1 -1
I also tried sum(Net.Chg ==0 + Tick.Count, na.rm=T) and length(Net.Chg ==0 * Tick.Count), but neither worked.
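Answer 0 (score: 2)
When comparing floating-point numbers, never use == because of precision errors. R has functions such as all.equal (comparison within a tolerance) and identical (exact equality of whole objects), or you can simply check that the difference is below some small tolerance, e.g.: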
DF <- data %>%
  group_by(grp = cumsum(str_detect(Time.Interval, "[A-Z]"))) %>%
  summarise(Time.Interval = anydate(first(Time.Interval)),
            pos = sum((Net.Chg > 0) * Tick.Count, na.rm = TRUE),
            neg = sum((Net.Chg < 0) * Tick.Count, na.rm = TRUE),
            unChg = sum((abs(Net.Chg) - 0 < 1e-15) * Tick.Count, na.rm = TRUE),
            OF = sum(sign(Net.Chg) * Tick.Count, na.rm = TRUE))
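To make the floating-point point concrete, here is a minimal base R sketch. The vectors net_chg and tick_count are illustrative stand-ins for the first date group in the question, and the tolerance 1e-9 is an arbitrary choice for the example, not a recommended value.

```r
# Exact comparison can fail for values that look equal on paper.
x <- 0.1 + 0.2
y <- 0.3

exact_equal <- x == y                    # FALSE with IEEE 754 doubles
near_equal  <- isTRUE(all.equal(x, y))   # TRUE: all.equal compares within a tolerance
tol_equal   <- abs(x - y) < 1e-9         # TRUE: explicit tolerance, vectorised

# The tolerance check is vectorised, so it can act as a 0/1 mask inside sum().
net_chg    <- c(-0.0047, 0, 0.0025)      # illustrative values (first date group)
tick_count <- c(7, 8, 10)
unchanged  <- sum((abs(net_chg) < 1e-9) * tick_count)   # 8
```

all.equal returns a character description rather than FALSE when the values differ, which is why it is wrapped in isTRUE() here.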
Using T instead of TRUE is also considered bad practice, because T is an ordinary variable that can be reassigned to any value.
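A small sketch of why T is risky, using only base R. The variable names with_T, with_TRUE and assign_fails are made up for this illustration.

```r
# T is a regular variable that defaults to TRUE, so it can be overwritten.
T <- 0
with_T    <- sum(c(1, NA), na.rm = T)      # na.rm is now 0 (FALSE), so the NA is kept: result is NA
with_TRUE <- sum(c(1, NA), na.rm = TRUE)   # 1

# TRUE is a reserved word, so assigning to it raises an error.
assign_fails <- tryCatch({ TRUE <- 0; FALSE }, error = function(e) TRUE)

rm(T)  # drop the shadowing variable so T resolves to base::T again
```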
Answer 1 (score: 0)
You need to select the Tick.Count values where Net.Chg == 0 and then sum them.
library(anytime)
library(tidyverse)
data %>%
  group_by(grp = cumsum(str_detect(Time.Interval, "[A-Z]"))) %>%
  summarise(Time.Interval = anydate(first(Time.Interval)),
            pos = sum((Net.Chg > 0) * Tick.Count, na.rm = TRUE),
            neg = sum((Net.Chg < 0) * Tick.Count, na.rm = TRUE),
            unChg = sum(Tick.Count[Net.Chg == 0], na.rm = TRUE),
            OF = sum(sign(Net.Chg) * Tick.Count, na.rm = TRUE)) %>%
  ungroup() %>%
  select(-grp)
# Time.Interval pos neg unChg OF
# <date> <dbl> <dbl> <dbl> <dbl>
#1 02Jan2017 10 7 8 3
#2 03Jan2017 2 6 1 -4
#3 04Jan2017 1 2 1 -1
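The reason subsetting helps is operator precedence: in R, * and + bind more tightly than ==, so Net.Chg ==0 * Tick.Count is parsed as Net.Chg == (0 * Tick.Count), and Net.Chg ==0 + Tick.Count as Net.Chg == Tick.Count. A minimal sketch on illustrative vectors (net_chg and tick_count stand in for the first date group and are not part of the original data frame):

```r
net_chg    <- c(-0.0047, 0, 0.0025)   # illustrative values (first date group)
tick_count <- c(7, 8, 10)

# Parsed as net_chg == (0 * tick_count), i.e. net_chg == 0:
# summing it counts the zero rows instead of adding their ticks.
count_of_zeros <- sum(net_chg == 0 * tick_count)   # 1

# Parenthesising the comparison turns it into a 0/1 mask over tick_count.
masked_sum <- sum((net_chg == 0) * tick_count)     # 8

# Logical subsetting, as in the answer above, gives the same total.
subset_sum <- sum(tick_count[net_chg == 0])        # 8
```

With the real data the date rows carry NA in Net.Chg, which is why the pipeline above still passes na.rm = TRUE.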