How to sum the values in one column where another column is "0"

Time: 2019-06-06 01:01:10

Tags: r dplyr

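I have a dataset with the columns Time.Interval, Net.Chg, and Tick.Count. Net.Chg contains positive, negative, and zero values. Based on the sign of Net.Chg, I want to sum the Tick.Count values separately for the positive, negative, and zero rows, grouped by date.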

Time.Interval    Net.Chg    Tick.Count
2-Jan-17         NA         NA
19:15 - 19:16    -0.0047    7
19:16 - 19:17    0          8
19:17 - 19:18    0.0025     10
3-Jan-17         NA         NA
03:45 - 03:46    0          1
03:54 - 03:55    -0.0002    2
19:43 - 19:44    -0.0008    4
20:01 - 20:02    0.0025     2
4-Jan-17         NA         NA
00:54 - 00:55    -0.0007    2
01:10 - 01:11    0.0005     1
01:11 - 01:12    0          1
Time.Interval <- c('2-Jan-17 _00:00:00.000000', '19:15 - 19:16', '19:16 - 19:17', '19:17 - 19:18', '3-Jan-17 _00:00:00.000000', '03:45 - 03:46', '03:54 - 03:55', '19:43 - 19:44', '20:01 - 20:02', '4-Jan-17 _00:00:00.000000', '00:54 - 00:55', '01:10 - 01:11', '01:11 - 01:12')
Net.Chg <- c(NA, -0.0047, 0, 0.0025, NA, 0, -0.0002, -0.0008, 0.0025, NA, -0.0007, 0.0005, 0)
Tick.Count <-  c(NA, 7, 8, 10, NA, 1, 2, 4, 2, NA, 2, 1, 1)
data <- data.frame(Time.Interval, Net.Chg, Tick.Count)

The desired output is:

pos = sum of "Tick.Count" if Net.Chg > 0
neg = sum of "Tick.Count" if Net.Chg < 0
UnChng = sum of "Tick.Count" if Net.Chg == 0
OF = pos - neg

I tried the following code:

DF <- dd %>% group_by(grp = cumsum(str_detect(Time.Interval, "[A-Z]"))) %>% summarise(Time.Interval = anydate(first(Time.Interval)), pos = sum((Net.Chg > 0)* Tick.Count, na.rm = T),  neg = sum((Net.Chg < 0) * Tick.Count, na.rm = T), unChg = sum(Net.Chg ==0 * Tick.Count, na.rm=T), OF = sum(sign(Net.Chg) * Tick.Count, na.rm = TRUE))

This code gives me the correct values for pos, neg, and OF, but the value of UnChng is wrong.

The current output is:

Time.Interval      pos    Neg     UnChng     OF
02Jan2017          10     7       4           3      
03Jan2017          2      6       5          -4
04Jan2017          1      2       4          -1

whereas the expected output is:

Time.Interval      pos    Neg     UnChng     OF
02Jan2017          10     7       8           3      
03Jan2017          2      6       1          -4
04Jan2017          1      2       1          -1

I also tried sum(Net.Chg ==0 + Tick.Count, na.rm=T) and length(Net.Chg ==0 * Tick.Count), but neither worked.

2 Answers:

Answer 0 (score: 2)

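Avoid == when comparing floating-point numbers, because rounding error can make values that should be equal differ slightly. R has functions such as all.equal and identical, or you can simply test against a small tolerance, for example: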

DF <- dd %>% 
    group_by(grp = cumsum(str_detect(Time.Interval, "[A-Z]"))) %>% 
    summarise(Time.Interval = anydate(first(Time.Interval)), 
        pos = sum((Net.Chg > 0)* Tick.Count, na.rm = TRUE),  
        neg = sum((Net.Chg < 0) * Tick.Count, na.rm = TRUE), 
        unChg = sum((abs(Net.Chg)-0 < 1e-15) * Tick.Count, na.rm=TRUE), 
        OF = sum(sign(Net.Chg) * Tick.Count, na.rm = TRUE))
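
To make the tolerance idea concrete, here is a minimal base-R sketch. The names `tol`, `net_chg`, `tick_count`, `is_zero`, and `un_chg` are illustrative and not part of the original answer, and the sample vectors are simply the first day's rows from the question. Note that all.equal returns a single result for the whole comparison, so inside a summary an element-wise check such as abs(x) < tol (or dplyr::near, if dplyr is loaded) is likely the more convenient form.

```r
# Floating-point arithmetic can leave tiny residues, so exact equality may fail.
x <- 0.1 + 0.2
exact  <- x == 0.3                    # FALSE with IEEE-754 doubles
approx <- isTRUE(all.equal(x, 0.3))   # TRUE: all.equal compares within a tolerance

# Element-wise tolerance check that can be used as a 0/1 mask inside sum().
tol        <- 1e-15                   # assumed tolerance, same value as in the answer
net_chg    <- c(NA, -0.0047, 0, 0.0025)
tick_count <- c(NA, 7, 8, 10)
is_zero    <- abs(net_chg) < tol      # NA, FALSE, TRUE, FALSE
un_chg     <- sum(is_zero * tick_count, na.rm = TRUE)
print(un_chg)                         # 8
```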

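Using T instead of TRUE is also considered bad practice, because T is an ordinary variable that can be reassigned to any value, whereas TRUE is a reserved word.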
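
A small sketch of why this matters, using only base R. The rebinding is done inside local() so that it does not leak into the session; `with_T`, `with_TRUE`, and `result` are names made up for this illustration.

```r
# T is a plain variable that happens to hold TRUE by default, so it can be rebound.
result <- local({
  T <- FALSE                                    # legal; TRUE <- FALSE would be an error
  with_T    <- sum(c(1, NA, 2), na.rm = T)      # na.rm is now FALSE, so the NA propagates
  with_TRUE <- sum(c(1, NA, 2), na.rm = TRUE)   # NA is always removed
  c(with_T = with_T, with_TRUE = with_TRUE)
})
print(result)
```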

Answer 1 (score: 0)

You need to subset the Tick.Count values where Net.Chg == 0 and then sum them:

library(anytime)
library(tidyverse)

data %>% 
  group_by(grp = cumsum(str_detect(Time.Interval, "[A-Z]"))) %>% 
  summarise(Time.Interval = anydate(first(Time.Interval)), 
            pos = sum((Net.Chg > 0)* Tick.Count, na.rm = TRUE),  
            neg = sum((Net.Chg < 0) * Tick.Count, na.rm = TRUE), 
            unChg = sum(Tick.Count[Net.Chg ==0], na.rm = TRUE), 
            OF = sum(sign(Net.Chg) * Tick.Count, na.rm = TRUE)) %>%
  ungroup() %>%
  select(-grp)

#  Time.Interval   pos   neg unChg    OF
#  <date>        <dbl> <dbl> <dbl> <dbl>
#1 02Jan2017       10     7     8     3
#2 03Jan2017        2     6     1    -4
#3 04Jan2017        1     2     1    -1
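
As a supplementary sketch (not part of the original answer), the base-R code below may help show why the expression in the question misbehaves. In R, `*` binds more tightly than `==`, so `Net.Chg == 0 * Tick.Count` is parsed as `Net.Chg == (0 * Tick.Count)`, which tests each row for zero instead of weighting it by Tick.Count. The variable names `grp`, `buggy`, and `fixed` are illustrative, and tapply is used here only to avoid package dependencies.

```r
# Sample data copied from the question.
Time.Interval <- c('2-Jan-17 _00:00:00.000000', '19:15 - 19:16', '19:16 - 19:17',
                   '19:17 - 19:18', '3-Jan-17 _00:00:00.000000', '03:45 - 03:46',
                   '03:54 - 03:55', '19:43 - 19:44', '20:01 - 20:02',
                   '4-Jan-17 _00:00:00.000000', '00:54 - 00:55', '01:10 - 01:11',
                   '01:11 - 01:12')
Net.Chg    <- c(NA, -0.0047, 0, 0.0025, NA, 0, -0.0002, -0.0008, 0.0025,
                NA, -0.0007, 0.0005, 0)
Tick.Count <- c(NA, 7, 8, 10, NA, 1, 2, 4, 2, NA, 2, 1, 1)
data <- data.frame(Time.Interval, Net.Chg, Tick.Count)

# A row containing an uppercase letter (the month name) starts a new day.
grp <- cumsum(grepl("[A-Z]", data$Time.Interval))

# Unparenthesised: compares Net.Chg with (0 * Tick.Count), so it counts zero rows.
buggy <- tapply(seq_len(nrow(data)), grp, function(i)
  sum(data$Net.Chg[i] == 0 * data$Tick.Count[i], na.rm = TRUE))

# Parenthesised: a 0/1 mask multiplied by Tick.Count, so it sums the ticks.
fixed <- tapply(seq_len(nrow(data)), grp, function(i)
  sum((data$Net.Chg[i] == 0) * data$Tick.Count[i], na.rm = TRUE))

print(fixed)   # 8, 1, 1 for the three days, matching the expected UnChng column
```

Subsetting with `Tick.Count[Net.Chg == 0]`, as in the answer above, sidesteps the precedence question entirely because the comparison sits inside the brackets.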