Question

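I have a table, call it df, with 3 columns: the first is a product title, the second is a product description, and the third is a one-word string. I need to run an operation over the whole table that creates 2 new columns (called 'exists_in_title' and 'exists_in_description') holding 1 or 0 to indicate whether the third column's value occurs in the first or second column. It has to be a strictly 1:1 operation. For example, calling row 1 'A', I need to check whether cell A3 occurs in A1 and use that to fill exists_in_title, then check whether A3 occurs in A2 and use that to fill exists_in_description, then move on to row B and do the same. I have thousands of rows, so handling them one at a time or writing a separate function per row isn't realistic; there must be a function or method that goes through every row of the table in one go.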

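I've played around with grepl, pmatch, and str_count, but none of them seems to do quite what I need. I think grepl is probably the closest. Here are the 2 lines of code I wrote that logically do what I want but don't seem to work: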

df$exists_in_title <- grepl(df$A3, df$A1)

df$exists_in_description <- grepl(df$A3, df$A2)

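However, when I run these I get the following message, which leads me to believe it isn't working correctly: "argument 'pattern' has length > 1 and only the first element will be used"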

Any help on how to do this would be greatly appreciated. Thanks!

Answer 1

grepl works together with mapply, which calls it once per row with the matching pattern and string:

Example data frame:

title <- c('eggs and bacon','sausage biscuit','pancakes')
description <- c('scrambled eggs and thickcut bacon','homemade biscuit with breakfast pattie', 'stack of sourdough pancakes')
keyword <- c('bacon','sausage','sourdough')
df <- data.frame(title, description, keyword, stringsAsFactors=FALSE)

Use grepl to search for matches:

df$exists_in_title <- mapply(grepl, pattern=df$keyword, x=df$title)
df$exists_in_description <- mapply(grepl, pattern=df$keyword, x=df$description)

Result:

            title                            description   keyword exists_in_title exists_in_description
1  eggs and bacon      scrambled eggs and thickcut bacon     bacon            TRUE                  TRUE
2 sausage biscuit homemade biscuit with breakfast pattie   sausage            TRUE                 FALSE
3        pancakes            stack of sourdough pancakes sourdough           FALSE                  TRUE
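
The question asked for 1/0 rather than TRUE/FALSE, and for a plain "does this word occur" check. A minimal sketch of one way to get that, assuming the keywords should be matched literally rather than as regular expressions (hence fixed = TRUE); keyword_in is a hypothetical helper name, not part of any package:

```r
# Rebuild the example data frame from above
df <- data.frame(
  title = c("eggs and bacon", "sausage biscuit", "pancakes"),
  description = c("scrambled eggs and thickcut bacon",
                  "homemade biscuit with breakfast pattie",
                  "stack of sourdough pancakes"),
  keyword = c("bacon", "sausage", "sourdough"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pair keyword[i] with text[i], match literally,
# and return 0/1 integers instead of logicals
keyword_in <- function(keyword, text) {
  hits <- mapply(grepl, pattern = keyword, x = text,
                 MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
  as.integer(hits)
}

df$exists_in_title <- keyword_in(df$keyword, df$title)
df$exists_in_description <- keyword_in(df$keyword, df$description)
```

USE.NAMES = FALSE just keeps mapply from labelling the result with the keywords; drop fixed = TRUE if the third column is meant to hold regular expressions.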

Update I

You can also do this with dplyr, using either grepl or stringr's str_detect:

library(dplyr)
df %>% 
  rowwise() %>% 
  mutate(exists_in_title = grepl(keyword, title),
         exists_in_description = grepl(keyword, description))

library(stringr)
df %>% 
  rowwise() %>% 
  mutate(exists_in_title = str_detect(title, keyword),
         exists_in_description = str_detect(description, keyword))
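
Unlike grepl, str_detect is vectorised over both the string and the pattern, so the stringr version should also work without rowwise(). A sketch under that assumption, reusing the example data (res is just an illustrative name for the result):

```r
library(dplyr)
library(stringr)

df <- data.frame(
  title = c("eggs and bacon", "sausage biscuit", "pancakes"),
  description = c("scrambled eggs and thickcut bacon",
                  "homemade biscuit with breakfast pattie",
                  "stack of sourdough pancakes"),
  keyword = c("bacon", "sausage", "sourdough"),
  stringsAsFactors = FALSE
)

# No rowwise(): str_detect() pairs title[i] with keyword[i] by itself
res <- df %>%
  mutate(exists_in_title = str_detect(title, keyword),
         exists_in_description = str_detect(description, keyword))
```

On thousands of rows this is likely to be faster than the rowwise() form, and the result is an ordinary ungrouped data frame.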

Update II

Map is also an option, or, to stay within the tidyverse, another option is purrr combined with stringr:

library(tidyverse)
df %>%
  mutate(exists_in_title = unlist(Map(function(x, y) grepl(x, y), keyword, title))) %>% 
  mutate(exists_in_description = map2_lgl(description, keyword,  str_detect))
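
The unlist() around Map is there because Map always returns a list, whereas map2_lgl returns a logical vector directly. A small base R sketch of that difference on the same example values (hits_list and hits are illustrative names):

```r
title <- c("eggs and bacon", "sausage biscuit", "pancakes")
keyword <- c("bacon", "sausage", "sourdough")

# Map() calls grepl(keyword[i], title[i]) and returns a list,
# named after the keywords, with one logical per row
hits_list <- Map(grepl, keyword, title)

# Flatten to a plain logical vector before storing it in a column
hits <- unlist(hits_list, use.names = FALSE)
```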
