Question

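I have a list of sentences such as

mydf <- data.frame(a=c("hihih ojkm hi how","I am fine yuuu dude hwz yo"))

and I want to find out whether every word in each sentence is an English word. To check that, I am using this code:

strs <- strsplit(c("hihih ojkm hi how")," ")
df <- lapply(strs, is.word)

which gives the result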

[[1]]
[1] FALSE FALSE  TRUE  TRUE

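Now I want to filter the sentence so that only the English words remain, giving a result like "hi how". I also want to loop over every sentence and every word, check whether each word is English, and show the resulting list of cleaned sentences. I am a beginner, so any guidance would help.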

Answer 1

Is this what you want?

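wordsonly <- function(chars){
    # Split the sentence into words. Big assumption: all words are separated by exactly one space.
    wordchunks <- strsplit(chars, " ")
    # is.word is the predicate from the question; it gives one TRUE/FALSE per word
    wordtest <- lapply(wordchunks, is.word)
    # chars is a single string, so only the first list element is needed
    return(wordchunks[[1]][wordtest[[1]]])
}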

mydf <- data.frame(a=c("hihih ojkm hi how","I am fine yuuu dude hwz yo"))
mydf$a <- as.character(mydf$a) # make sure these are strings, not factors
mydf$wordsonly <- lapply(mydf$a, wordsonly) # note: $wordsonly is a list column; each entry is a character vector of the words that were kept
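
If you would rather get each filtered sentence back as a single string (like "hi how") and also flag whether a sentence consists entirely of English words, something along these lines might work. This is only a sketch: the question never shows how is.word is defined, so the version below is a hypothetical stand-in that checks against a tiny hard-coded word list, and keep_english, english and all_english are names made up for illustration.

```r
# Hypothetical stand-in for the asker's is.word(); the real one is not shown
# in the question. This toy version checks against a tiny hard-coded word list.
english_words <- c("hi", "how", "i", "am", "fine", "dude", "yo")
is.word <- function(words) tolower(words) %in% english_words

# Keep only the words that pass is.word() and glue them back into one string.
keep_english <- function(sentence) {
    words <- strsplit(sentence, " +")[[1]]  # split on one or more spaces
    paste(words[is.word(words)], collapse = " ")
}

mydf <- data.frame(a = c("hihih ojkm hi how", "I am fine yuuu dude hwz yo"),
                   stringsAsFactors = FALSE)

# vapply returns a plain character vector with one entry per sentence
mydf$english <- vapply(mydf$a, keep_english, character(1), USE.NAMES = FALSE)

# TRUE only where every word in the sentence passed the check
mydf$all_english <- vapply(strsplit(mydf$a, " +"),
                           function(w) all(is.word(w)), logical(1))
```

Using vapply instead of lapply keeps both new columns as ordinary vectors rather than list columns, which tends to be easier to print and filter. Swap in your own is.word before relying on the results.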
