I have a vector of strings like this:
strings <- tibble(string = c("apple, orange, plum, tomato",
"plum, beat, pear, cactus",
"centipede, toothpick, pear, fruit"))
I have a vector of fruits:
fruits <- tibble(fruit =c("apple", "orange", "plum", "pear"))
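What I want is a data.frame / tibble that keeps the original strings data.frame and adds a second list or character column containing all the fruits found in the original column. Something like this: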
strings <- tibble(string = c("apple, orange, plum, tomato",
"plum, beat, pear, cactus",
"centipede, toothpick, pear, fruit"),
match = c("apple, orange, plum",
"plum, pear",
"pear")
)
I tried str_extract(strings, fruits) and got a list in which everything was blank, along with this warning:
Warning message:
In stri_detect_regex(string, pattern, opts_regex = opts(pattern)):
longer object length is not a multiple of shorter object length
I also tried str_extract_all(strings, paste0(fruits, collapse = "|")) and got the same message.
I have looked at "Find matches of a vector of strings in another vector of strings", but that does not seem to help.
Any help would be greatly appreciated.
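Answer 0: (score: 2)
Here is one option. First, we split each row of the string column into separate strings (at the moment "apple, orange, plum, tomato" is a single string). Then we compare that list of strings with the contents of the fruits$fruit column and store the list of matching values in a new fruits column.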
library("tidyverse")
strings <- tibble(
string = c(
"apple, orange, plum, tomato",
"plum, beat, pear, cactus",
"centipede, toothpick, pear, fruit"
)
)
fruits <- tibble(fruit =c("apple", "orange", "plum", "pear"))
strings %>%
  mutate(str2 = str_split(string, ", ")) %>%
  rowwise() %>%
  mutate(fruits = list(intersect(str2, fruits$fruit)))
#> Source: local data frame [3 x 3]
#> Groups: <by row>
#>
#> # A tibble: 3 x 3
#> string str2 fruits
#> <chr> <list> <list>
#> 1 apple, orange, plum, tomato <chr [4]> <chr [3]>
#> 2 plum, beat, pear, cactus <chr [4]> <chr [2]>
#> 3 centipede, toothpick, pear, fruit <chr [4]> <chr [1]>
Created on 2018-08-07 by the reprex package (v0.2.0).
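The question also asks for a plain character column rather than a list column. A minimal sketch of that variant follows, assuming the same split-and-intersect idea; the names `parts` and `result` are illustrative and not part of the original answer.

```r
library(dplyr)
library(stringr)
library(purrr)

strings <- tibble(
  string = c(
    "apple, orange, plum, tomato",
    "plum, beat, pear, cactus",
    "centipede, toothpick, pear, fruit"
  )
)
fruits <- tibble(fruit = c("apple", "orange", "plum", "pear"))

result <- strings %>%
  mutate(
    # split each row on ", " into a character vector (list column)
    parts = str_split(string, ", "),
    # keep only the pieces that are known fruits and glue them back together
    match = map_chr(parts, ~ paste(intersect(.x, fruits$fruit), collapse = ", "))
  ) %>%
  select(string, match)

print(result)
```

Using paste() here means a row with no fruit at all yields an empty string instead of an error.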
Answer 1: (score: 2)
Here is an example using purrr:
library("tidyverse")
strings <- tibble(string = c("apple, orange, plum, tomato",
                             "plum, beat, pear, cactus",
                             "centipede, toothpick, pear, fruit"))
fruits <- tibble(fruit = c("apple", "orange", "plum", "pear"))
# Try every pattern against one string and keep only the patterns that matched
extract_if_exists <- function(string_to_parse, pattern) {
  extraction <- stringi::stri_extract_all_regex(string_to_parse, pattern)
  extraction <- unlist(extraction[!(is.na(extraction))])
  return(extraction)
}
strings %>%
  mutate(matches = map(string, extract_if_exists, fruits$fruit)) %>%
  # collapse the matches (not the original string) into one value per row
  mutate(matches = map_chr(matches, str_c, collapse = ", "))
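The regex route can also be written with a single alternation pattern, which is closer to the str_extract_all() attempt in the question. The sketch below is an assumption-laden variant, not the answerer's code: it passes plain character vectors rather than whole tibbles (a likely contributor to the problem in the question), and it adds `\\b` word boundaries, which the original attempt did not have, so that a fruit name is not matched inside a longer word.

```r
library(stringr)

string <- c(
  "apple, orange, plum, tomato",
  "plum, beat, pear, cactus",
  "centipede, toothpick, pear, fruit"
)
fruit <- c("apple", "orange", "plum", "pear")

# One pattern of the form \b(?:apple|orange|plum|pear)\b
pattern <- str_c("\\b(?:", str_c(fruit, collapse = "|"), ")\\b")

# One character vector of matches per input string
found <- str_extract_all(string, pattern)

# Collapse each vector of matches into a single comma-separated string
match <- vapply(found, paste, character(1), collapse = ", ")

print(data.frame(string = string, match = match))
```

Matches come back in the order they appear in each string, not in the order of the fruit vector.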
Answer 2: (score: 1)
Here is a base-R solution:
strings[["match"]] <-
  sapply(
    strsplit(strings[["string"]], ", "),
    function(x) {
      paste(x[x %in% fruits[["fruit"]]], collapse = ", ")
    }
  )
Result:
string match
<chr> <chr>
1 apple, orange, plum, tomato apple, orange, plum
2 plum, beat, pear, cactus plum, pear
3 centipede, toothpick, pear, fruit pear
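One difference between the `%in%` filter used here and the intersect() call in the first answer may matter on other data: `%in%` keeps repeated values, while intersect() returns each value once. A small illustration on made-up input (the vector `x` is hypothetical, not from the question):

```r
# Hypothetical row in which "plum" appears twice
x <- c("plum", "pear", "plum", "cactus")
fruit <- c("apple", "orange", "plum", "pear")

# %in% keeps every occurrence, in the original order
kept_in <- x[x %in% fruit]

# intersect() drops repeated values
kept_intersect <- intersect(x, fruit)

print(kept_in)
print(kept_intersect)
```

On the example data in the question no fruit is repeated within a row, so both approaches give the same matches.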