Question

My data frame consists of a single variable whose values contain multiple words, for example:

variable

"hello my name is this"

"greetings friend"

I also have another data frame with two columns, one holding words and the other holding the replacements for those words, for example:

word         replacement
"hello"      "hi"
"greetings"  "hi"

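I am trying to find an easy way to replace the words in "variable" with their replacements, looping over all observations and over all the words in each observation. The desired result is: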

variable

"hi my name is this"

"hi friend"

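I have looked into some approaches that use cSplit, but they are not feasible for my application (any given observation of "variable" contains too many words, so this would create too many columns). I am not sure how I would use strsplit, but I am guessing that is the right option?

Edit: As far as I understand it, my question is a duplicate of an earlier, unanswered question: Replace strings in text based on dictionary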

Answer 1

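In this case, str_replace_all from stringr would come in handy:

df <- data.frame(variable = c('hello my name is this', 'greetings friend'), stringsAsFactors = FALSE)

replacement <- data.frame(word = c('hello', 'greetings'), replacement = c('hi', 'hi'), stringsAsFactors = FALSE)

# The pattern and replacement vectors are matched element-wise against the strings
stringr::str_replace_all(df$variable, replacement$word, replacement$replacement)

Output:

# [1] "hi my name is this" "hi friend"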

Answer 2

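This is similar to @amrrs's solution, but I use a named vector instead of supplying two separate vectors. It also addresses the issue the OP mentioned in the comments: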

library(dplyr)
library(stringr)

df2$word %>%
  paste0("\\b", ., "\\b") %>%
  setNames(df2$replacement, .) %>%
  str_replace_all(df1$variable, .)

# [1] "hi my name is this"        "hi friend"                 "hi, hellomy is not a word"
# [4] "hi! my friend"

This is the named vector, with the regular expressions to match as the names and the replacement strings as the elements:

df2$word %>%
  paste0("\\b", ., "\\b") %>%
  setNames(df2$replacement, .) 
# \\bhello\\b \\bgreetings\\b 
#        "hi"            "hi"
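Since the question asked about strsplit, here is a minimal base R sketch of the same named-vector idea used as a lookup table for whole tokens. The names `lookup` and `replace_tokens` are hypothetical, and the sketch assumes words are separated by single spaces with no attached punctuation (so "hello," would not match):

```r
# Hypothetical lookup table: names are the words to find, elements are the replacements
lookup <- c(hello = "hi", greetings = "hi")

variable <- c("hello my name is this", "greetings friend")

# Hypothetical helper: split each string on spaces, swap any token found in
# the lookup table, and paste the tokens back together
replace_tokens <- function(x, lookup) {
  vapply(strsplit(x, " ", fixed = TRUE), function(tokens) {
    hit <- tokens %in% names(lookup)
    tokens[hit] <- lookup[tokens[hit]]
    paste(tokens, collapse = " ")
  }, character(1))
}

result <- replace_tokens(variable, lookup)
print(result)
```

Because only whole tokens are compared, this avoids partial-word matches without any regular expressions, at the cost of not handling punctuation.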

Data:

df1 = data.frame(variable = c('hello my name is this', 'greetings friend', 'hello, hellomy is not a word', 'greetings! my friend'))
df2 = data.frame(word = c('hello','greetings'), replacement = c('hi','hi'), stringsAsFactors = FALSE)

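Note:

To avoid also converting root words that appear inside longer words, I wrap each pattern in regex word boundaries (\\b). This ensures that a word living inside another word, such as "helloguys", is not converted.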
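The effect of the word boundaries can also be seen without stringr. The following is a rough base R sketch, not the answer's own method: it loops over the word/replacement pairs with gsub. The helper name `replace_words` is hypothetical, and the data simply mirrors df1 and df2:

```r
# Example data mirroring df1 and df2
df1 <- data.frame(
  variable = c("hello my name is this", "greetings friend",
               "hello, hellomy is not a word", "greetings! my friend"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  word = c("hello", "greetings"),
  replacement = c("hi", "hi"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply every word/replacement pair, in order, to all strings
replace_words <- function(text, words, replacements) {
  for (i in seq_along(words)) {
    # \\b restricts the match to whole words only
    pattern <- paste0("\\b", words[i], "\\b")
    text <- gsub(pattern, replacements[i], text, perl = TRUE)
  }
  text
}

result <- replace_words(df1$variable, df2$word, df2$replacement)
print(result)
```

Here "hellomy" is left untouched because there is no word boundary between "hello" and "my", while "hello," and "greetings!" are still replaced because punctuation counts as a boundary.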

Loop over and replace text in a data frame
