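Consider a data frame df with two columns, word and stem. I want to create a new column that checks whether the value in stem occurs in word, and whether it is preceded or followed by other characters. The final result should look like this: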
WORD STEM NEW
rerun run prefixed
runner run suffixed
run run none
... ... ...
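Below is the code I have so far. It does not work, because grepl takes a single pattern and applies it to every row of df instead of matching each stem against its own word. Still, I think it makes my intention clearer.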
df$new <- ifelse(grepl(paste0('.+', df$stem, '.+'), df$word), 'both',
ifelse(grepl(paste0(df$stem, '.+'), df$word), 'suffixed',
ifelse(grepl(paste0('.+', df$stem), df$word), 'prefixed','none')))
Answer 0 (score: 2)
You can use mapply to call grepl once per row, for example:
ifelse(mapply(grepl, paste0(".+", x$STEM, ".+"), x$WORD), "both",
ifelse(mapply(grepl, paste0(x$STEM, ".+"), x$WORD), "suffixed",
ifelse(mapply(grepl, paste0(".+", x$STEM), x$WORD), "prefixed", "none")))
#"prefixed" "suffixed" "none"
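To make the difference concrete, here is a minimal sketch. It assumes a small sample data frame x (the data is hypothetical, with stems that differ by row on purpose) and compares a single-pattern grepl call with the row-wise mapply call.

```r
# Hypothetical sample data; the stems differ by row on purpose
x <- data.frame(WORD = c("rerun", "walker", "run"),
                STEM = c("run", "walk", "run"),
                stringsAsFactors = FALSE)

# grepl() works with one pattern, so this checks every word against the
# first stem only (this mirrors what happens when a vector of patterns is passed)
first_only <- grepl(paste0(x$STEM[1], ".+"), x$WORD)

# mapply() pairs the i-th pattern with the i-th word
row_wise <- unname(mapply(grepl, paste0(x$STEM, ".+"), x$WORD))

print(first_only)
print(row_wise)
```

Note that the stems are interpreted as regular expressions here, so a stem containing characters such as . or ( would need escaping first.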
Alternatively, use startsWith and endsWith and index into a vector of labels:
# A word that starts with the stem is "suffixed"; one that ends with it is "prefixed"
c("none", "both", "suffixed", "prefixed")[1 + (1 + startsWith(x$WORD, x$STEM) +
2*endsWith(x$WORD, x$STEM)) * (nchar(x$WORD) > nchar(x$STEM) &
mapply(grepl, x$STEM, x$WORD))]
#[1] "prefixed" "suffixed" "none"
Answer 1 (score: 1)
You can create the new column like this:
df$new <- ifelse(startsWith(df$word, df$stem) & endsWith(df$word, df$stem), 'none',
ifelse(startsWith(df$word, df$stem), 'suffixed',
ifelse(endsWith(df$word, df$stem), 'prefixed',
'both')))
Or, if you are working in a dplyr pipeline and want to avoid all the annoying df$ prefixes:
library(dplyr)
df %>%
mutate(new = ifelse(startsWith(word, stem) & endsWith(word, stem), 'none',
ifelse(startsWith(word, stem), 'suffixed',
ifelse(endsWith(word, stem), 'prefixed',
'both'))))
Output
# word stem new
# 1 rerun run prefixed
# 2 runner run suffixed
# 3 run run none
# 4 aruna run both
Answer 2 (score: 1)
Here is an approach using str_locate from stringr together with dplyr:
library(dplyr)
library(stringr)
data %>%
mutate_at(vars(WORD,STEM), as.character) %>%
mutate(NEW =
case_when(str_locate(WORD,STEM)[,"start"] > 1 &
str_locate(WORD,STEM)[,"end"] < nchar(WORD) ~ "both",
str_locate(WORD,STEM)[,"start"] > 1 ~ "prefixed",
str_locate(WORD,STEM)[,"end"] < nchar(WORD) ~ "suffixed",
TRUE ~ "none"))
WORD STEM NEW
1 rerun run prefixed
2 runner run suffixed
3 run run none
I added a line that converts WORD and STEM to character, just in case (for example, if they are stored as factors).
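The same position-based idea can be sketched in base R. The helper name classify_affix is hypothetical, and the sketch assumes non-empty stems that should be matched literally (fixed = TRUE) rather than as regular expressions. Only the first occurrence of the stem in each word is considered.

```r
# Hypothetical helper: classify each word by where its stem first occurs
classify_affix <- function(word, stem) {
  word <- as.character(word)
  stem <- as.character(stem)
  # Start of the first literal occurrence of stem in word, -1 if absent
  start <- mapply(function(w, s) as.integer(regexpr(s, w, fixed = TRUE)),
                  word, stem, USE.NAMES = FALSE)
  end <- start + nchar(stem) - 1
  before <- start > 1                      # characters precede the stem
  after <- start > 0 & end < nchar(word)   # characters follow the stem
  ifelse(before & after, "both",
         ifelse(before, "prefixed",
                ifelse(after, "suffixed", "none")))
}

words <- c("rerun", "runner", "run", "aruna", "walk")
stems <- rep("run", length(words))
result <- classify_affix(words, stems)
print(result)
```

Unlike the nested startsWith and endsWith version, a word that does not contain the stem at all is labelled "none" here rather than "both".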