Split lists based on word count

Time: 2017-10-27 07:37:39

Tags: r list split count

I have 3 lists of character vectors, for example:

list1 = list(c("bla bla bla bla", "sample text", "dumdidum bla bla", "a very long text is written in here"))
list2 = list(c("bla ", "blubb"))
list3 = list(c("bla bla bla bla", "sample text", "another very long text", "cat dog bird"))

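I would like to create a new list, in the same format, that contains only the entries from the lists above that have more than 3 words. Entries that go into the new list should be removed from the original lists. My desired output looks like this: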

list1 = list(c("sample text", "dumdidum bla bla"))
list2 = list(c("bla ", "blubb"))
list3 = list(c("sample text","cat dog bird"))

newlist = list(c("bla bla bla bla", "a very long text is written in here", "bla bla bla bla", "another very long text"))

Is it possible to do this?

3 answers:

Answer 0: (score: 2)

Another option, using the stringi library:

library(stringi)

v1 <- unlist(c(list1, list2, list3))
v2 <- v1[stri_count_words(v1) > 3]
v2

#[1] "bla bla bla bla"                     "a very long text is written in here"
#[3] "bla bla bla bla"                     "another very long text"

To remove these entries from the original lists, use

lapply(c(list1, list2, list3), function(i) setdiff(i, v2))

which gives:

[[1]]
[1] "sample text"      "dumdidum bla bla"

[[2]]
[1] "bla "  "blubb"

[[3]]
[1] "sample text"  "cat dog bird"
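
One caveat with setdiff(): it returns unique values, so an entry that occurs twice in the same vector would be kept only once. The sample data has no such duplicates, so the output above is unaffected. A minimal sketch of an alternative that filters with %in% and also restores the list(c(...)) shape from the question; the vector x, the shortened list1, and the name remaining are made up for illustration:

```r
# Hypothetical vector with a duplicated short entry
x  <- c("sample text", "sample text", "bla bla bla bla")
v2 <- "bla bla bla bla"        # entries that were moved to the new list

by_setdiff <- setdiff(x, v2)   # duplicate "sample text" is collapsed
by_in      <- x[!x %in% v2]    # duplicate "sample text" is kept

# The same filter applied to lists shaped like those in the question
list1 <- list(c("bla bla bla bla", "sample text", "dumdidum bla bla"))
list2 <- list(c("bla ", "blubb"))
remaining <- lapply(list(list1, list2),
                    function(l) list(l[[1]][!l[[1]] %in% v2]))
```

Each element of remaining is again a list holding one character vector, so it can be assigned back to list1, list2, and so on.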

Answer 1: (score: 1)

I put your data into a list and then used lapply:

data_list <- list(
    list1 = list(c("bla bla bla bla", "sample text", "dumdidum bla bla", "a very long text is written in here")),
    list2 = list(c("bla ", "blubb")),
    list3 = list(c("bla bla bla bla", "sample text", "another very long text", "cat dog bird")))

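# Flatten all entries into one character vector before data_list is overwritten
data_vec <- unname(unlist(data_list))

data_list <- lapply(data_list, function(x) {
    # TRUE for entries with at most 3 space-separated words
    keep_ind <- lengths(strsplit(x[[1]], " ")) <= 3
    x[[1]][keep_ind]
})

# Entries that were not kept go into the new list
newlist <- data_vec[!data_vec %in% unlist(data_list)]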

data_list
#$list1
#[1] "sample text"      "dumdidum bla bla"
#
#$list2
#[1] "bla "  "blubb"
#
#$list3
#[1] "sample text"  "cat dog bird"

newlist
#[1] "bla bla bla bla"                     "a very long text is written in here"
#[3] "bla bla bla bla"                     "another very long text"  
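
The same idea can be wrapped in a small helper that partitions each vector in a single pass, so the moved entries do not have to be recovered afterwards with %in%. This is only a sketch: split_by_words, max_words, parts, and remaining are hypothetical names, and a "word" is assumed to be a run of non-whitespace characters.

```r
# Hypothetical helper: split a character vector by word count
split_by_words <- function(x, max_words = 3) {
  # Trim first so leading or trailing blanks do not create empty "words"
  n_words <- lengths(strsplit(trimws(x), "\\s+"))
  list(keep = x[n_words <= max_words], move = x[n_words > max_words])
}

data_list <- list(
  list1 = list(c("bla bla bla bla", "sample text", "dumdidum bla bla",
                 "a very long text is written in here")),
  list2 = list(c("bla ", "blubb")),
  list3 = list(c("bla bla bla bla", "sample text",
                 "another very long text", "cat dog bird")))

parts <- lapply(data_list, function(l) split_by_words(l[[1]]))

# Rebuild the list(c(...)) shape used in the question
remaining <- lapply(parts, function(p) list(p$keep))
newlist   <- list(unlist(lapply(parts, `[[`, "move"), use.names = FALSE))
```

Here remaining$list1, remaining$list2, and remaining$list3 hold the shortened originals, and newlist holds the four long entries in their original order.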

Answer 2: (score: 0)

We can try str_count:

library(stringr)
list(unlist(lapply(c(list1, list2, list3), function(x) x[str_count(x, "\\w+")>3])))
#[[1]]
#[1] "bla bla bla bla"                     "a very long text is written in here"
#[3] "bla bla bla bla"                     "another very long text"
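
This returns only the new list; the shortened originals follow from negating the condition. The sketch below does both in base R, with lengths(regmatches(x, gregexpr("\\w+", x))) standing in for str_count(x, "\\w+") so that it runs without stringr. That substitution is an assumption that should hold for plain ASCII text like the sample data, and count_w, all_lists, and remaining are hypothetical names.

```r
# Base R stand-in for str_count(x, "\\w+"); count_w is a hypothetical name
count_w <- function(x) lengths(regmatches(x, gregexpr("\\w+", x)))

list1 <- list(c("bla bla bla bla", "sample text", "dumdidum bla bla",
                "a very long text is written in here"))
list2 <- list(c("bla ", "blubb"))
list3 <- list(c("bla bla bla bla", "sample text",
                "another very long text", "cat dog bird"))

all_lists <- c(list1, list2, list3)

# Entries with more than 3 words, collected into one list
newlist   <- list(unlist(lapply(all_lists, function(x) x[count_w(x) > 3])))
# Entries with at most 3 words, one list per original
remaining <- lapply(all_lists, function(x) list(x[count_w(x) <= 3]))

# "\\w+" matches runs of letters, digits and underscores,
# so the apostrophe splits "don't" into two matches
n_apostrophe <- count_w("don't stop")   # 3
```

If contractions or hyphenated words should count as one word, splitting on whitespace is likely the safer definition.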