I have 3 lists of character vectors, for example:
list1 = list(c("bla bla bla bla", "sample text", "dumdidum bla bla", "a very long text is written in here"))
list2 = list(c("bla ", "blubb"))
list3 = list(c("bla bla bla bla", "sample text", "another very long text", "cat dog bird"))
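I would like to create a new list in the same format that contains only the entries from the lists above that have more than 3 words. Entries that go into the new list should be removed from the original lists. My desired output has this form: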
list1 = list(c("sample text", "dumdidum bla bla"))
list2 = list(c("bla ", "blubb"))
list3 = list(c("sample text","cat dog bird"))
newlist = list(c("bla bla bla bla", "a very long text is written in here", "bla bla bla bla", "another very long text"))
Is it possible to do this?
Answer 0: (score: 2)
Another option, using the stringi library:
library(stringi)
v1 <- unlist(c(list1, list2, list3))
v2 <- v1[stri_count_words(v1) > 3]
v2
#[1] "bla bla bla bla" "a very long text is written in here" "bla bla bla bla" "another very long text"
To remove these entries from the original lists, use
lapply(c(list1, list2, list3), function(i) setdiff(i, v2))
which gives:
[[1]]
[1] "sample text"      "dumdidum bla bla"

[[2]]
[1] "bla "  "blubb"

[[3]]
[1] "sample text"  "cat dog bird"
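One thing to keep in mind with the approach above is that setdiff() also collapses duplicate entries inside each vector. The sample data has no such duplicates, so the result is the same here, but a variant based on %in% leaves repeats untouched. The following is only a sketch in base R: count_words is a hypothetical stand-in for stri_count_words that assumes words are separated by whitespace, and remaining is an illustrative name.

```r
# Sample data from the question
list1 <- list(c("bla bla bla bla", "sample text", "dumdidum bla bla", "a very long text is written in here"))
list2 <- list(c("bla ", "blubb"))
list3 <- list(c("bla bla bla bla", "sample text", "another very long text", "cat dog bird"))

# Hypothetical stand-in for stri_count_words (assumes whitespace-separated words)
count_words <- function(x) lengths(strsplit(trimws(x), "\\s+"))

v1 <- unlist(c(list1, list2, list3))
v2 <- v1[count_words(v1) > 3]

# Keep every entry that is not in v2; unlike setdiff(), this does not deduplicate
remaining <- lapply(c(list1, list2, list3), function(i) i[!i %in% v2])
remaining
```

If duplicates within a vector should be dropped, the setdiff() version is the simpler choice.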
Answer 1: (score: 1)
I put your data into a list and then used lapply:
data_list <- list(
list1 = list(c("bla bla bla bla", "sample text", "dumdidum bla bla", "a very long text is written in here")),
list2 = list(c("bla ", "blubb")),
list3 = list(c("bla bla bla bla", "sample text", "another very long text", "cat dog bird")))
data_vec <- unname(unlist(data_list))
data_list <- lapply(data_list, function(x) {
  # keep entries with at most 3 space-separated words
  keep_ind <- lengths(strsplit(x[[1]], " ")) <= 3
  x[[1]][keep_ind]
})
newlist <- data_vec[!data_vec %in% unlist(data_list)]
data_list
#$list1
#[1] "sample text" "dumdidum bla bla"
#
#$list2
#[1] "bla " "blubb"
#
#$list3
#[1] "sample text" "cat dog bird"
newlist
#[1] "bla bla bla bla" "a very long text is written in here"
#[3] "bla bla bla bla" "another very long text"
Answer 2: (score: 0)
We can try str_count from stringr:
library(stringr)
list(unlist(lapply(c(list1, list2, list3), function(x) x[str_count(x, "\\w+")>3])))
#[[1]]
#[1] "bla bla bla bla" "a very long text is written in here" "bla bla bla bla" "another very long text"
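The call above builds only the new list; the original lists are left unchanged. A minimal sketch of doing both steps while keeping the list(c(...)) shape from the question might look like the following. It uses base R only: count_words is a hypothetical helper that counts runs of word characters (the same "\\w+" pattern as above), and originals is an illustrative name, not something from the question.

```r
# Sample data from the question
list1 <- list(c("bla bla bla bla", "sample text", "dumdidum bla bla", "a very long text is written in here"))
list2 <- list(c("bla ", "blubb"))
list3 <- list(c("bla bla bla bla", "sample text", "another very long text", "cat dog bird"))

# Hypothetical helper: number of "\\w+" matches per string (0 when there are none)
count_words <- function(x) lengths(regmatches(x, gregexpr("\\w+", x)))

originals <- list(list1 = list1, list2 = list2, list3 = list3)

# Collect entries with more than 3 words into a single list holding one vector
newlist <- list(unlist(
  lapply(originals, function(l) l[[1]][count_words(l[[1]]) > 3]),
  use.names = FALSE
))

# Rebuild each original as list(c(...)) without the entries that were moved
originals <- lapply(originals, function(l) list(l[[1]][count_words(l[[1]]) <= 3]))

newlist
originals$list1
```

Holding the three lists in one named list avoids reassigning list1, list2 and list3 separately; the individual lists can be read back with originals$list1 and so on.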