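Converting a vector of sentences to a vector of words with an apply function

Time: 2015-03-23 23:00:44

Tags: r apply word sentence

In R, I have this vector of sentences and I would like to convert it to a vector of words. How can I do this with an apply function?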

test.sentences <- c("boy who boys see lives .",
                    "cats who Mary feeds hear .",
                    "girls who see see John .",
                    "John hears dogs .",
                    "John lives .",
                    "Mary hears cat .",
                    "boys who Mary chases see girl .",
                    "dog who John sees feeds Mary .",
                    "girls feed cats who see .",
                    "Mary chases girls who Mary chases .",
                    "Mary hears .",
                    "boy who hears cats walks .",
                    "girl who dog sees feeds boy .",
                    "Mary lives .",
                    "Mary sees boy .",
                    "cat who walks lives .",
                    "Mary sees girl who chases John .",
                    "John chases boys who boy hears .",
                    "cats hear boy who feeds boys .",
                    "girls who hear see cats who hear .",
                    "girls who cats feed chase John .",
                    "cat lives .",
                    "cats live ." )

4 answers:

Answer 0: (score: 2)

You don't need any *apply() function to do this. Here is a very simple and efficient way using the stringi package.

stringi::stri_extract_all_words(test.sentences)

This returns a list with one element per element of test.sentences, with the periods (.) removed. For an atomic vector, just wrap the call in unlist(). For a matrix, use simplify = TRUE.
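As a minimal sketch of those three return shapes (assuming the stringi package is installed; the two-sentence vector s is a made-up subset of test.sentences, used only for brevity):

```r
library(stringi)

# Hypothetical two-sentence subset of test.sentences, for brevity
s <- c("John lives .", "Mary hears cat .")

# Default: a list with one character vector of words per sentence
words_list <- stri_extract_all_words(s)

# Flatten the list into a single atomic character vector
words_vec <- unlist(words_list)

# One row per sentence; shorter rows are padded with empty strings
words_mat <- stri_extract_all_words(s, simplify = TRUE)
```

The list form keeps track of which sentence each word came from, so it is probably the one to keep if that grouping matters later.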

Answer 1: (score: 2)

In base R:

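# Split every sentence on spaces, then drop the "." tokens
res <- unlist(strsplit(test.sentences," "))
res[res != "."]

# Alternatively, strip the periods before splitting
unlist(strsplit(gsub("\\.","",test.sentences)," "))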
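Because the question asks specifically about the apply family, here is a minimal base-R sketch that keeps the words grouped by sentence using lapply(). The two-sentence vector s and the names words_by_sentence and all_words are illustrative and not part of the original answer.

```r
# Hypothetical two-sentence subset of test.sentences, for brevity
s <- c("John lives .", "Mary hears cat .")

# Split each sentence on single spaces, then drop the "." token per sentence
words_by_sentence <- lapply(strsplit(s, " ", fixed = TRUE),
                            function(w) w[w != "."])

# Flatten when a single word vector is wanted
all_words <- unlist(words_by_sentence)
print(all_words)
```

strsplit() is already vectorised over its input, so lapply() is only needed here for the per-sentence filtering step.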

Answer 2: (score: 2)

Here is the qdap approach (I maintain the package):

library(qdap)
lapply(test.sentences, bag_o_words)

Or as a single vector:

bag_o_words(test.sentences)

Answer 3: (score: 0)

Have you tried something like do.call? You could try the following, though I'm not sure it will work in your case:

test.sentences <- c("boy who boys see lives .",
                    "cats who Mary feeds hear .",
                    "girls who see see John .",
                    "John hears dogs .",
                    "John lives .",
                    "Mary hears cat .",
                    "boys who Mary chases see girl .",
                    "dog who John sees feeds Mary .",
                    "girls feed cats who see .",
                    "Mary chases girls who Mary chases .",
                    "Mary hears .",
                    "boy who hears cats walks .",
                    "girl who dog sees feeds boy .",
                    "Mary lives .",
                    "Mary sees boy .",
                    "cat who walks lives .",
                    "Mary sees girl who chases John .",
                    "John chases boys who boy hears .",
                    "cats hear boy who feeds boys .",
                    "girls who hear see cats who hear .",
                    "girls who cats feed chase John .",
                    "cat lives .",
                    "cats live ." )
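# Note: the sentences differ in length, so rbind() recycles the shorter
# word vectors (with a warning) to fill out each row
vector_of_words <- do.call(rbind, strsplit(as.character(test.sentences), " "))
test <- cbind(test.sentences, vector_of_words)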
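If recycled words in the shorter rows are not wanted, one possible workaround is to bring every word vector to the same length before binding. This is a sketch under the assumption that padding with NA is acceptable; s, pieces, n_max, padded, word_matrix and result are illustrative names, not part of the original answer.

```r
# Hypothetical two-sentence subset of test.sentences, for brevity
s <- c("John lives .", "Mary hears cat .")

# One character vector of tokens per sentence (the "." token is kept)
pieces <- strsplit(s, " ", fixed = TRUE)

# Length of the longest sentence, in tokens
n_max <- max(lengths(pieces))

# Extending a vector's length pads it with NA
padded <- lapply(pieces, function(w) { length(w) <- n_max; w })

# All rows now have equal length, so rbind() no longer recycles
word_matrix <- do.call(rbind, padded)
result <- cbind(s, word_matrix)
```

Each row of result holds the original sentence followed by its tokens, with NA in the unused trailing cells.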