Classify text based on a list of words in R

Time: 2019-02-06 22:54:04

Tags: r text-classification stringr

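I have a data set of article titles and abstracts that I want to classify based on matching words.

"This is an example of text that I want to classify based on the words that are matched from a list. This would be about 2 - 3 sentences long. word4, word5, text, text, text"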

Topic 1     Topic 2     Topic (X)
word1       word4       word(a)
word2       word5       word(b)
word3       word6       word(c)

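Since the text above matches words in Topic 2, I would like to add a new column that holds this label. A solution using the tidyverse packages is preferred.

1 Answer:

Answer 0: (score: 0)

Given the sentence as a string and the topics in a data frame, you can do the following: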

input <- c("This is an example of text that I want to classify based on the words that are matched from a list. This would be about 2 - 3 sentences long. word4, word5, text, text, text")
df <- data.frame(Topic1 = c("word1", "word2", "word3"), Topic2 = c("word4", "word5", "word6"))

## This splits on spaces and punctuation (only , and .)
input <- unlist(strsplit(input, " |,|\\."))

## Keep the names of the topics (columns of df) that contain at least one word from the input
newcol <- paste(names(df)[apply(df, 2, function(x) sum(input %in% x) > 0)], collapse = ", ")

Since I am not sure which data frame you want to add this to, I stored the result in the vector newcol.
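The split above only handles spaces, commas and periods, so a token such as "word4;" or "Word4" would not match. A minimal sketch of one alternative, assuming the stringr package is available and that the topic words contain no regex metacharacters: build one word-boundary pattern per topic and test each text against it. The helper name match_topics and the sample texts are hypothetical, chosen only for illustration.

```r
library(stringr)

# Topic word lists, same words as in df above
topics <- list(
  Topic1 = c("word1", "word2", "word3"),
  Topic2 = c("word4", "word5", "word6")
)

# Hypothetical helper: returns the matching topic names for each text
match_topics <- function(texts, topics) {
  # One pattern per topic, e.g. "\\b(word4|word5|word6)\\b"
  # (assumes the words contain no regex metacharacters)
  patterns <- vapply(
    topics,
    function(w) paste0("\\b(", paste(w, collapse = "|"), ")\\b"),
    character(1)
  )
  vapply(texts, function(txt) {
    # TRUE for every topic with at least one whole-word, case-insensitive match
    hit <- vapply(
      patterns,
      function(p) str_detect(txt, regex(p, ignore_case = TRUE)),
      logical(1)
    )
    paste(names(patterns)[hit], collapse = ", ")
  }, character(1), USE.NAMES = FALSE)
}

out <- match_topics(c("Word4; word5!", "word2", "nothing here"), topics)
```

Texts that match no topic get an empty string, which mirrors what paste() returns in the original approach.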

If you have several texts in a data frame, you can use a similar approach.

inputdf <- data.frame(title = c("This is an example of text that I want to classify based on the words that are matched from a list. This would be about 2 - 3 sentences long. word4, word5, text, text, text", "word2", "word3, word4"))

## Split every title into words; the result is a list with one character vector per row
input <- strsplit(as.character(inputdf$title), " |,|\\.")

## For each title, collect the names of the topics that share at least one word with it
inputdf$newcolmn <- unlist(lapply(input, function(x) paste(names(df)[apply(df, 2, function(y) sum(x %in% y) > 0)], collapse = ", ")))
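Since the question prefers the tidyverse, the same idea can be sketched with dplyr, tidyr and stringr. This is only one possible arrangement, not the method from the answer above: it assumes a tidyr version that provides pivot_longer(), and the names topics, topic_words, texts, labels, result and the id column are introduced here purely for illustration. The topic table is reshaped to one row per word, each title is split into words, and the two are joined.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Topic word lists, same words as in df above, kept as character columns
topics <- data.frame(
  Topic1 = c("word1", "word2", "word3"),
  Topic2 = c("word4", "word5", "word6"),
  stringsAsFactors = FALSE
)

# Long format: one row per (topic, word) pair
topic_words <- pivot_longer(topics, everything(),
                            names_to = "topic", values_to = "word")

# Illustrative input; the id column is an assumption used to regroup the words
texts <- tibble(
  id = 1:3,
  title = c("This is an example of text that I want to classify based on the words that are matched from a list. This would be about 2 - 3 sentences long. word4, word5, text, text, text",
            "word2",
            "word3, word4")
)

labels <- texts %>%
  mutate(word = str_split(title, "[ ,.]+")) %>%  # split on spaces, commas, periods
  unnest(word) %>%                               # one row per word
  inner_join(topic_words, by = "word") %>%       # keep only words found in a topic
  distinct(id, topic) %>%
  arrange(id, topic) %>%
  group_by(id) %>%
  summarise(newcolmn = paste(topic, collapse = ", "))

# Titles without any matching word end up with NA in newcolmn
result <- left_join(texts, labels, by = "id")
```

Unlike the base R version, titles that match no topic receive NA rather than an empty string, because they have no row in labels before the final join.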