I have a dataset that looks like this:
Name, Day, Score, Diff
Jain, 1, 8, 0
Jain, 2, 6, -2
Jain, 3, 8, 2
Jain, 4, 12, 4
Jain, 5, 13, 1
Jain, 6, 6, -7
Matt, 1,4, 0
Matt, 2, 10, 6
Matt, 3, 11, 1
Matt, 4, 12, 1
Matt, 5, 5, -7
Matt, 6, 6, 1
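I want to add a new column that records "Off" once the score difference shows a drop of 3 points, and keeps recording "Off" until there is a gain of +3 points; it then records "On" until the next such drop.
Example: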
Name, Day, Score, Diff, OnOff
Jain, 1, 8, 0, "Off"
Jain, 2, 6, -2, "Off"
Jain, 3, 8, 2, "Off"
Jain, 4, 12, 4, "On"
Jain, 5, 13, 1, "On"
Jain, 6, 6, -7, "Off"
Matt, 1,4, 0, "Off"
Matt, 2, 10, 6, "On"
Matt, 3, 11, 1, "On"
Matt, 4, 12, 1, "On"
Matt, 5, 5, -7, "Off"
Matt, 6, 6, 1, "Off"
I can't seem to figure out how to code this. I tried the following:
df$OnOff <- ifelse(df$Diff >= 3, "On", ifelse(df$Diff <= -3, "Off", ""))
df$OnOff <- ifelse(df$OnOff == "", lag(df$OnOff), df$OnOff)
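Answer 0 (score: 2)
Here is another solution, using `fill` from the tidyverse:
library(tidyverse)
df %>%
  mutate(
    OnOff = case_when(
      row_number() == 1 ~ "Off",    # start in the "Off" state
      Diff < -2 ~ "Off",            # a drop of 3 or more switches off
      Diff > 2 ~ "On",              # a gain of 3 or more switches on
      TRUE ~ NA_character_)         # otherwise leave a gap to be filled
  ) %>%
  fill(OnOff)                       # carry the last state forward
To do this by name:
df %>%
  group_by(Name) %>%
  mutate(
    OnOff = case_when(
      row_number() == 1 ~ "Off",    # first row of each name starts "Off"
      Diff < -2 ~ "Off",
      Diff > 2 ~ "On",
      TRUE ~ NA_character_)
  ) %>%
  fill(OnOff)
Answer 1 (score: 2)
Enter the changes, then use `zoo::na.locf` (or similar) to fill in the blanks. Calling the data `dd`:
dd$OnOff = NA
dd$OnOff[1] = "off"                 # default state for the first row
dd$OnOff[dd$Diff >= 3] = "on"
dd$OnOff[dd$Diff <= -3] = "off"
dd$OnOff = zoo::na.locf(dd$OnOff)   # last observation carried forward
dd
# Name Day Score Diff OnOff
# 1: Jain 1 8 0 off
# 2: Jain 2 6 -2 off
# 3: Jain 3 8 2 off
# 4: Jain 4 12 4 on
# 5: Jain 5 13 1 on
# 6: Jain 6 6 -7 off
# 7: Matt 1 4 0 off
# 8: Matt 2 10 6 on
# 9: Matt 3 11 1 on
# 10: Matt 4 12 1 on
# 11: Matt 5 5 -7 off
# 12: Matt 6 6 1 off
You didn't mention grouping in your question, but if needed you can use `dplyr` or `data.table` to do the `locf` by `Name`.
To do it by name, you need to set the first row of each name to the default `'off'`. See Melissa's solution for the `dplyr` approach. With `data.table`, it looks like this:
library(data.table)
setDT(dd)
dd[, OnOff := c('off', rep(NA, .N - 1)), by = Name]   # seed the first row of each name
dd[Diff >= 3, OnOff := "on"]
dd[Diff <= -3, OnOff := "off"]
dd[, OnOff := zoo::na.locf(OnOff), by = Name]
This uses the data from the question, stored in `dd`.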
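If pulling in zoo is not wanted, the grouped fill can also be sketched in base R. The helper `locf_base` below is a hypothetical name, not part of any package; it carries the last non-missing value forward, and `ave` applies it within each `Name`. The data frame is rebuilt inline from the question's values so the block runs on its own.

```r
# Hypothetical helper: last observation carried forward, base R only.
# Leading NAs (before any observed value) stay NA.
locf_base <- function(x) {
  idx <- cumsum(!is.na(x))          # how many non-NA values seen so far
  c(NA, x[!is.na(x)])[idx + 1]      # position 1 is the "nothing seen yet" slot
}

dd <- data.frame(
  Name = rep(c("Jain", "Matt"), each = 6),
  Diff = c(0, -2, 2, 4, 1, -7, 0, 6, 1, 1, -7, 1)
)

state <- rep(NA_character_, nrow(dd))
state[!duplicated(dd$Name)] <- "off"   # seed the first row of each name
state[dd$Diff >= 3] <- "on"
state[dd$Diff <= -3] <- "off"

# Fill the gaps separately within each name
dd$OnOff <- ave(state, dd$Name, FUN = locf_base)
dd
```

The seeding step runs before the threshold assignments, mirroring the order in the `data.table` version, so a first row with a large `Diff` would still be overwritten by the thresholds.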
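Answer 2 (score: 1)
One can write a simple function that iterates over `Diff` and compares each value to the thresholds in order to toggle between `On` and `Off`, as follows: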
#Function to decide On/Off logic
getOnOff <- function(x){
lstVal <- "Off"
value <- rep(NA,length(x))
for(i in seq_along(x)){
if(x[i] >= 3){
lstVal = "On"
}else if(x[i] <= -3){
lstVal = "Off"
}
value[i] <- lstVal
}
value
}
#Now use the function with `dplyr` after grouping on Name
library(dplyr)
df %>% group_by(Name) %>%
mutate(OnOff = getOnOff(Diff))
# # A tibble: 12 x 5
# # Groups: Name [2]
# Name Day Score Diff OnOff
# <chr> <int> <int> <int> <chr>
# 1 Jain 1 8 0 Off
# 2 Jain 2 6 -2 Off
# 3 Jain 3 8 2 Off
# 4 Jain 4 12 4 On
# 5 Jain 5 13 1 On
# 6 Jain 6 6 -7 Off
# 7 Matt 1 4 0 Off
# 8 Matt 2 10 6 On
# 9 Matt 3 11 1 On
# 10 Matt 4 12 1 On
# 11 Matt 5 5 -7 Off
# 12 Matt 6 6 1 Off
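The same state machine can be written without an explicit loop using base R's `Reduce` with `accumulate = TRUE`. This is only a sketch; `get_on_off_reduce` and its `start` argument are illustrative names, not taken from the answer above.

```r
# Sketch: fold over Diff, keeping the previous state unless a threshold is crossed.
get_on_off_reduce <- function(x, start = "Off") {
  step <- function(state, d) {
    if (d >= 3) "On" else if (d <= -3) "Off" else state
  }
  # accumulate = TRUE returns every intermediate state; drop the initial one
  Reduce(step, x, accumulate = TRUE, init = start)[-1]
}

get_on_off_reduce(c(0, -2, 2, 4, 1, -7))
# "Off" "Off" "Off" "On" "On" "Off"
```

It can be used in place of `getOnOff` inside the grouped `mutate()` call. Like the loop version, it assumes `Diff` contains no missing values.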
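Option #2: The OP probably does not want `On/Off` to depend on an absolute count of how often each condition has occurred, but if that is what is needed, one can try `cumsum` with `dplyr`. Each occurrence of `Diff >= 3` counts as one `up`, and each occurrence of `Diff <= -3` counts as one `down`. The difference between the two cumulative sums gives a relative count from which `On/Off` can be determined.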
library(dplyr)
df %>% mutate(OnOff = ifelse(cumsum(Diff >= 3) - (cumsum(Diff<= -3))>0, "On","Off"))
# Name Day Score Diff OnOff
# 1 Jain 1 8 0 Off
# 2 Jain 2 6 -2 Off
# 3 Jain 3 8 2 Off
# 4 Jain 4 12 4 On
# 5 Jain 5 13 1 On
# 6 Jain 6 6 -7 Off
# 7 Matt 1 4 0 Off
# 8 Matt 2 10 6 On
# 9 Matt 3 11 1 On
# 10 Matt 4 12 1 On
# 11 Matt 5 5 -7 Off
# 12 Matt 6 6 1 Off
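The counting approach and the latch approach agree on the sample data but are not equivalent in general. The sketch below uses made-up `Diff` values (two rises followed by one drop) to show where they part ways; `latch` is a condensed copy of the loop logic and `counter` wraps the `cumsum` expression, both names chosen here for illustration.

```r
# Latch: the state changes only when a +3 / -3 event occurs.
latch <- function(x) {
  state <- "Off"
  out <- character(length(x))
  for (i in seq_along(x)) {
    if (x[i] >= 3) state <- "On" else if (x[i] <= -3) state <- "Off"
    out[i] <- state
  }
  out
}

# Counter: "On" while rises so far outnumber drops so far.
counter <- function(x) {
  ifelse(cumsum(x >= 3) - cumsum(x <= -3) > 0, "On", "Off")
}

d <- c(4, 5, -4)   # hypothetical values, not from the question
latch(d)     # "On" "On" "Off"
counter(d)   # "On" "On" "On"
```

With the counter, one drop does not cancel two earlier rises, so the last row stays "On". Note also that the `cumsum` call above is not grouped, so counts carry over from one name to the next; on the sample data this is harmless because Jain's rises and drops balance out, and adding `group_by(Name)` before `mutate()` would reset the count per name.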
Data:
df <- read.table(text="
Name, Day, Score, Diff
Jain, 1, 8, 0
Jain, 2, 6, -2
Jain, 3, 8, 2
Jain, 4, 12, 4
Jain, 5, 13, 1
Jain, 6, 6, -7
Matt, 1,4, 0
Matt, 2, 10, 6
Matt, 3, 11, 1
Matt, 4, 12, 1
Matt, 5, 5, -7
Matt, 6, 6, 1",
header = TRUE, stringsAsFactors = FALSE, sep = ",")
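The `Diff` column in this data appears to be the change in `Score` from the previous day within each `Name`, with 0 on the first day. That is an inference from the numbers, not something the question states. A quick base R check under that assumption, with the data frame rebuilt inline so the block runs on its own:

```r
df <- data.frame(
  Name  = rep(c("Jain", "Matt"), each = 6),
  Day   = rep(1:6, times = 2),
  Score = c(8, 6, 8, 12, 13, 6, 4, 10, 11, 12, 5, 6),
  Diff  = c(0, -2, 2, 4, 1, -7, 0, 6, 1, 1, -7, 1)
)

# Assumed definition: day-over-day change in Score per name, 0 for the first day
recomputed <- ave(df$Score, df$Name, FUN = function(s) c(0, diff(s)))

all(recomputed == df$Diff)
```

If `Diff` is not already present in the real data, the same `ave()` call is one way to derive it before applying any of the answers.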