r - Assigning a conditional, persistent On/Off variable

Time: 2018-07-06 18:20:29

Tags: r

I have a dataset similar to this:

Name, Day, Score, Diff
Jain, 1, 8, 0
Jain, 2, 6, -2
Jain, 3, 8, 2
Jain, 4, 12, 4
Jain, 5, 13, 1
Jain, 6, 6, -7
Matt, 1,4, 0
Matt, 2, 10, 6
Matt, 3, 11, 1
Matt, 4, 12, 1
Matt, 5, 5, -7
Matt, 6, 6, 1

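I want to add a new column that records "Off" once the score difference drops by 3 points or more, and keeps recording "Off" until there is a gain of +3 or more. From then on it should record "On" until the next such drop.

Example: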

Name, Day, Score, Diff, OnOff
Jain, 1, 8, 0, "Off"
Jain, 2, 6, -2, "Off"
Jain, 3, 8, 2, "Off"
Jain, 4, 12, 4, "On"
Jain, 5, 13, 1, "On"
Jain, 6, 6, -7, "Off"
Matt, 1,4, 0, "Off"
Matt, 2, 10, 6, "On"
Matt, 3, 11, 1, "On"
Matt, 4, 12, 1, "On"
Matt, 5, 5, -7, "Off"
Matt, 6, 6, 1, "Off"

I can't seem to figure out how to code this. I tried the following:

df$OnOff <- ifelse(df$Diff >= 3, "On", ifelse(df$Diff <= -3, "Off", ""))
df$OnOff <- ifelse(df$OnOff == "", lag(df$OnOff), df$OnOff)

3 answers:

Answer 0 (score: 2)

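Here is another solution using fill from the tidyverse:

library(tidyverse)
df %>%
  mutate(
    OnOff = case_when(
      1:n() == 1 ~ 'Off',   # the first row starts as "Off"
      Diff < -2 ~ "Off",    # a drop of 3 or more switches to "Off"
      Diff > 2 ~ "On",      # a gain of 3 or more switches to "On"
      TRUE ~ NA_character_) # everything else is filled in below
  ) %>%
  fill(OnOff)

To do it by Name:

df %>%
  group_by(Name) %>%
  mutate(
    OnOff = case_when(
      1:n() == 1 ~ 'Off',
      Diff < -2 ~ "Off",
      Diff > 2 ~ "On",
      TRUE ~ NA_character_)
  ) %>%
  fill(OnOff)

Answer 1 (score: 2)

Enter the changes, then use zoo::na.locf (or similar) to fill in the blanks. Calling the data dd:

dd$OnOff = NA
dd$OnOff[1] = "off"
dd$OnOff[dd$Diff >= 3] = "on"
dd$OnOff[dd$Diff <= -3] = "off"
dd$OnOff = zoo::na.locf(dd$OnOff)
dd
#     Name Day Score Diff OnOff
#  1: Jain   1     8    0   off
#  2: Jain   2     6   -2   off
#  3: Jain   3     8    2   off
#  4: Jain   4    12    4    on
#  5: Jain   5    13    1    on
#  6: Jain   6     6   -7   off
#  7: Matt   1     4    0   off
#  8: Matt   2    10    6    on
#  9: Matt   3    11    1    on
# 10: Matt   4    12    1    on
# 11: Matt   5     5   -7   off
# 12: Matt   6     6    1   off

You did not mention grouping in your question, but if you need it, you can apply the same locf idea per group with dplyr or data.table.

To do the operation by Name, you need to set the first row of each name to the default 'off'. See Melissa's solution for a dplyr approach. With data.table, it looks like this:

library(data.table)
setDT(dd)
# First row of each Name starts as 'off'; the rest start as NA
dd[, OnOff := c('off', rep(NA, .N - 1)), by = Name]
dd[Diff >= 3, OnOff := "on"]
dd[Diff <= -3, OnOff := "off"]
# Carry the last known state forward within each Name
dd[, OnOff := zoo::na.locf(OnOff), by = Name]

This uses the sample data from the question, stored as dd.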
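
The answer mentions zoo::na.locf "or similar". As a minimal sketch of what "similar" could look like without extra packages, the following uses only base R. The helper name locf_base is hypothetical (it is not part of zoo or base R), and the data frame is a cut-down copy of the question's sample with only the Name and Diff columns.

```r
# Hypothetical helper: carry the last non-NA value forward (base R only).
locf_base <- function(x) {
  seen <- !is.na(x)
  # Prepend NA so that leading NAs (position 0 + 1) stay NA.
  c(NA, x[seen])[cumsum(seen) + 1]
}

# Cut-down copy of the question's sample data (assumption: only these columns are needed).
dd <- data.frame(
  Name = rep(c("Jain", "Matt"), each = 6),
  Diff = c(0, -2, 2, 4, 1, -7, 0, 6, 1, 1, -7, 1)
)

# Mark only the rows where the state is known; everything else stays NA.
dd$OnOff <- NA_character_
dd$OnOff[!duplicated(dd$Name)] <- "off"  # first row of each Name
dd$OnOff[dd$Diff >= 3] <- "on"
dd$OnOff[dd$Diff <= -3] <- "off"

# Fill forward within each Name using ave().
dd$OnOff <- ave(dd$OnOff, dd$Name, FUN = locf_base)
dd
```

Because ave() applies the helper separately to each Name, a state from the end of one person's rows cannot leak into the next person's rows.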

Answer 2 (score: 1)

One can write a simple function that loops over Diff and compares each value to decide when to switch between On and Off, as follows:

#Function to decide On/Off logic
getOnOff <- function(x){
  lstVal <- "Off"
  value <- rep(NA,length(x))
  for(i in seq_along(x)){
    if(x[i] >= 3){
      lstVal = "On"
    }else if(x[i] <= -3){
      lstVal = "Off"
    }
    value[i] <- lstVal
  }
  value
}

#Now use the function with `dplyr` after grouping on Name

library(dplyr)

df %>% group_by(Name) %>%
  mutate(OnOff = getOnOff(Diff))


# # A tibble: 12 x 5
# # Groups: Name [2]
# Name    Day Score  Diff OnOff
# <chr> <int> <int> <int> <chr>
# 1 Jain      1     8     0 Off  
# 2 Jain      2     6    -2 Off  
# 3 Jain      3     8     2 Off  
# 4 Jain      4    12     4 On   
# 5 Jain      5    13     1 On   
# 6 Jain      6     6    -7 Off  
# 7 Matt      1     4     0 Off  
# 8 Matt      2    10     6 On   
# 9 Matt      3    11     1 On   
# 10 Matt      4    12     1 On   
# 11 Matt      5     5    -7 Off  
# 12 Matt      6     6     1 Off  

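Option 2: The OP probably does not intend to keep an absolute count of the on/off conditions, but if that is what is needed, one can try cumsum together with dplyr. Each occurrence of Diff >= 3 counts as one step up, and each occurrence of Diff <= -3 counts as one step down. The cumulative sum of these gives a relative count from which On/Off can be determined.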

library(dplyr)

df %>% mutate(OnOff = ifelse(cumsum(Diff >= 3) - (cumsum(Diff<= -3))>0, "On","Off"))

#    Name Day Score Diff OnOff
# 1  Jain   1     8    0   Off
# 2  Jain   2     6   -2   Off
# 3  Jain   3     8    2   Off
# 4  Jain   4    12    4    On
# 5  Jain   5    13    1    On
# 6  Jain   6     6   -7   Off
# 7  Matt   1     4    0   Off
# 8  Matt   2    10    6    On
# 9  Matt   3    11    1    On
# 10 Matt   4    12    1    On
# 11 Matt   5     5   -7   Off
# 12 Matt   6     6    1   Off
# 
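
The two options give the same result on the sample data, but they are not equivalent in general. The following base R sketch illustrates the difference on a made-up vector (the values in d are an assumption for illustration, not from the question); toggle_state is a hypothetical name for the same loop logic as getOnOff.

```r
# Made-up Diff values: two rises in a row, then a single drop.
d <- c(0, 4, 5, -3, 1)

# Toggle logic: the most recent +3 / -3 event decides the state.
toggle_state <- function(x) {
  state <- "Off"
  out <- character(length(x))
  for (i in seq_along(x)) {
    if (x[i] >= 3) {
      state <- "On"
    } else if (x[i] <= -3) {
      state <- "Off"
    }
    out[i] <- state
  }
  out
}

# Counting logic: the state is "On" while rises outnumber drops so far.
count_state <- ifelse(cumsum(d >= 3) - cumsum(d <= -3) > 0, "On", "Off")

data.frame(d, toggle = toggle_state(d), count = count_state)
```

After the single drop, the toggle version switches to "Off", while the counting version stays "On" because two rises still outnumber one drop. The counting version is also not grouped as written; adding group_by(Name) before mutate would restart the count for each name.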

Data:

df <- read.table(text="
Name, Day, Score, Diff
Jain, 1, 8, 0
Jain, 2, 6, -2
Jain, 3, 8, 2
Jain, 4, 12, 4
Jain, 5, 13, 1
Jain, 6, 6, -7
Matt, 1,4, 0
Matt, 2, 10, 6
Matt, 3, 11, 1
Matt, 4, 12, 1
Matt, 5, 5, -7
Matt, 6, 6, 1",
header = TRUE, stringsAsFactors = FALSE, sep = ",")