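I want to create a new column based on the values in consecutive rows. In the data frame below, the subject column holds two subjects. Within each subject, I want to compare each row of the trial column with the row before it.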
df <- data.frame(subject = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                 trial = c("switch", "noswitch", "switch", "switch", "noswitch", "switch", "switch", "noswitch", "noswitch", "noswitch"))
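For reference, the value each row is compared against is the previous trial within the same subject. The following is a minimal base-R sketch that computes it with `ave()`, which applies a function group by group. The `prev_trial` column name is hypothetical. `stringsAsFactors = FALSE` is added so the sketch behaves the same on R versions before 4.0. Each subject's first row gets `NA` because it has no predecessor.

```r
df <- data.frame(subject = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                 trial = c("switch", "noswitch", "switch", "switch", "noswitch",
                           "switch", "switch", "noswitch", "noswitch", "noswitch"),
                 stringsAsFactors = FALSE)

# Shift each subject's trials down by one row; the first row of each subject becomes NA
df$prev_trial <- ave(df$trial, df$subject, FUN = function(x) c(NA, head(x, -1)))
df
```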
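I'm trying to use an if statement to look at each row and then fill a new column according to the 4 possible configurations of two consecutive rows: switch-switch, noswitch-noswitch, noswitch-switch, and switch-noswitch. The new column should hold the names of these four configurations: comp, incomp, noswitch_incomp, and switch_comp. The comparison should restart for each new subject, so the first row of each subject will be NA because there is no previous value. So far I have the following: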
for(i in seq.int(unique(df$subject))){
  df$results <- if(df$switch == "switch" & lag(df$switch, 1) == "switch"){
    "comp"
  } else if (df$switch == "noswitch" & lag(df$switch, 1) == "noswitch"){
    "incomp"
  } else if (df$switch == "noswitch" & lag(df$switch, 1) == "switch"){
    "noswitch_incomp"
  } else {
    "switch_comp"
  }
}
I get the following error. I think it comes from the if statement not evaluating the conditions inside it:
Error in if (df$switch == "switch" & lag(df$switch, 1) == "switch") { :
argument is of length zero
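I also tried matching the conditions with mutate() from dplyr, but I get a similar error. Is there another function I could use to evaluate these conditions?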
Answer
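The loop fails for two reasons.

First, the data frame has no column named `switch` (the column is `trial`). That makes `df$switch` `NULL`, so every comparison produces a zero-length logical vector. `if` rejects that with "argument is of length zero".

Second, even with the correct column, `if` is not vectorized. It needs exactly one `TRUE`/`FALSE`, but `df$trial == "switch"` has one value per row. R 4.2 and later treat this as an error. Earlier versions issued a warning and used only the first element.

The following is a quick check in base R. The `msg_zero` and `msg_long` names are only illustrative.

```r
df <- data.frame(subject = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                 trial = c("switch", "noswitch", "switch", "switch", "noswitch",
                           "switch", "switch", "noswitch", "noswitch", "noswitch"),
                 stringsAsFactors = FALSE)

# No column is named "switch", so `$` returns NULL
is.null(df$switch)              # TRUE
# Comparing NULL with a string gives logical(0), a zero-length condition
length(df$switch == "switch")   # 0
# `if` rejects a zero-length condition
msg_zero <- tryCatch(if (df$switch == "switch") "comp", error = conditionMessage)
msg_zero
# With the right column the condition has 10 values, but `if` needs exactly one
# (an error in R >= 4.2, a warning in older versions)
msg_long <- tryCatch(if (df$trial == "switch") "comp",
                     error = conditionMessage, warning = conditionMessage)
msg_long
```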
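You can use dplyr::case_when() inside mutate(). Group by subject first so that lag() restarts for each subject. Rows that match none of the conditions, such as each subject's first row where lag() returns NA, get NA: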
library(dplyr)

df %>% group_by(subject) %>%
  mutate(results = case_when(
    trial == 'switch' & lag(trial) == 'switch' ~ 'comp',
    trial == 'noswitch' & lag(trial) == 'noswitch' ~ 'incomp',
    trial == 'noswitch' & lag(trial) == 'switch' ~ 'noswitch_incomp',
    trial == 'switch' & lag(trial) == 'noswitch' ~ 'switch_comp'
  ))
# # A tibble: 10 x 3
# # Groups: subject [2]
# subject trial results
# <dbl> <chr> <chr>
# 1 1. switch NA
# 2 1. noswitch noswitch_incomp
# 3 1. switch switch_comp
# 4 1. switch comp
# 5 1. noswitch noswitch_incomp
# 6 2. switch NA
# 7 2. switch comp
# 8 2. noswitch noswitch_incomp
# 9 2. noswitch incomp
# 10 2. noswitch incomp
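If you would rather not depend on dplyr, the same mapping can be sketched in base R. This is an alternative approach, not part of the original answer. It uses `ave()` for the per-subject "previous trial" and a named lookup vector keyed by `"current-previous"`. The `prev`, `lookup`, and `key` names are hypothetical. Rows whose previous trial is `NA` produce keys such as `"switch-NA"`, which are absent from the lookup and therefore yield `NA`, matching the `case_when()` output.

```r
df <- data.frame(subject = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                 trial = c("switch", "noswitch", "switch", "switch", "noswitch",
                           "switch", "switch", "noswitch", "noswitch", "noswitch"),
                 stringsAsFactors = FALSE)

# Previous trial within each subject (NA on each subject's first row)
prev <- ave(df$trial, df$subject, FUN = function(x) c(NA, head(x, -1)))

# Lookup table keyed by "current-previous"; same mapping as the case_when() answer
lookup <- c("switch-switch"     = "comp",
            "noswitch-noswitch" = "incomp",
            "noswitch-switch"   = "noswitch_incomp",
            "switch-noswitch"   = "switch_comp")

# Keys not found in the lookup (e.g. "switch-NA") return NA
key <- paste(df$trial, prev, sep = "-")
df$results <- unname(lookup[key])
df
```

A lookup vector keeps the four configurations in one place and is fully vectorized. `case_when()` is more flexible if the conditions later become more complex than simple pairs.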