I am trying to classify a data frame of customer reviews into their respective categories. For example,
x <- data.frame(Reviews = c("The phone performance and display is good","Worth the money","Camera is good"))
The desired output is shown in the image below.
I tried creating a dictionary with R's quanteda package as follows:
dic <- dictionary(list(camera = c("camera","lens","pixel", "pictures",
"pixels", "snap"), display = c("resolution", "display", "depth", "mode",
"color", "colour", "discolour"), performance = c("performance", "speed",
"usage", "fast", "run", "running", "lag", "processor", "shut", "shut down",
"restart", "hanging","hang"), Value = c("money", "worth", "budget", "value",
"price", "specs", "specifications", "invest",
"under","expectations","expected","expecting","expect")))
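I would like to classify the text based on the keywords above. Please help.
P.S.: A dfm is one option, but in particular I would like to know how to classify the data frame of texts so that it matches the desired output.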
Answer 0 (score: 0)
Most of your code was already usable.
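I converted upper case to lower case, because otherwise the fixed comparison does not work. I would also recommend removing stopwords and applying some stemming.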
# Creating a DFM and saving the reviews in a vector
library(quanteda)
Reviews <- c(
  "The phone performance and display is good",
  "Worth the money",
  "Camera is good")
# Tokenize first: recent quanteda versions expect dfm() to receive a tokens object
x <- dfm(tokens(Reviews), tolower = TRUE)
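The stopword removal and stemming suggested above could be sketched roughly as follows. This is only an illustration under assumptions: the object names `toks` and `x_clean` are made up here, and the English Snowball stopword list returned by `stopwords("en")` is just one possible choice.

```r
library(quanteda)

reviews <- c("The phone performance and display is good",
             "Worth the money",
             "Camera is good")

# Tokenize, lowercase, and drop punctuation
toks <- tokens(reviews, remove_punct = TRUE)
toks <- tokens_tolower(toks)

# Remove English stopwords such as "the", "and", "is"
toks <- tokens_remove(toks, pattern = stopwords("en"))

# Reduce the remaining tokens to their stems
toks <- tokens_wordstem(toks)

# Build the document-feature matrix from the cleaned tokens
x_clean <- dfm(toks)
```

Note that once the tokens are stemmed, the dictionary keywords would need to be stemmed too, or written as glob patterns such as "perform*", for the lookup to keep matching.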
# Creating the dictionary
dic <- dictionary(list(
  camera = c("camera", "lens", "pixel", "pictures", "pixels", "snap"),
  display = c("resolution", "display", "depth", "mode", "color", "colour", "discolour"),
  performance = c("performance", "speed", "usage", "fast", "run", "running", "lag",
                  "processor", "shut", "shut down", "restart", "hanging", "hang"),
  Value = c("money", "worth", "budget", "value", "price", "specs", "specifications",
            "invest", "under", "expectations", "expected", "expecting", "expect")))
Then apply the dfm_lookup function to the DFM with this dictionary.
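A minimal sketch of how the lookup result might be turned back into a data frame with one category label per review. It assumes the desired output is a Reviews column next to a Category column, which is a guess since the image is not available. The shortened dictionary and the names `counts`, `m`, and `result` are illustrative only.

```r
library(quanteda)

reviews <- c("The phone performance and display is good",
             "Worth the money",
             "Camera is good")

# Shortened version of the dictionary, for illustration only
dic <- dictionary(list(
  camera = c("camera", "lens"),
  display = c("display", "resolution"),
  performance = c("performance", "speed"),
  Value = c("money", "worth")
))

# Count dictionary matches per review
x <- dfm(tokens(reviews), tolower = TRUE)
counts <- dfm_lookup(x, dic)

# For each review, collect the names of all categories with at least one match
m <- as.matrix(counts)
categories <- apply(m, 1, function(row) {
  paste(colnames(m)[row > 0], collapse = ", ")
})

result <- data.frame(Reviews = reviews,
                     Category = unname(categories),
                     stringsAsFactors = FALSE)
print(result)
```

A review that matches several categories gets all of them, separated by commas, and a review with no match gets an empty string. Multi-word entries such as "shut down" will not match single-word features in a dfm; applying tokens_lookup to the tokens before building the dfm is the usual way to handle those.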
Hope this is what you were looking for :)