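How can I classify a data frame of text into categories using keywords in R?

Asked: 2018-08-11 03:08:06

Tags: r nlp classification keyword

I am trying to classify a data frame of customer reviews into their respective categories. For example: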

x <- data.frame(Reviews = c("The phone performance and display is good","Worth the money","Camera is good"))

The desired output is shown in the image below.

Please click for image

I tried creating a dictionary with R's quanteda package as follows:

dic <- dictionary(list(camera = c("camera","lens","pixel", "pictures", 
"pixels", "snap"), display = c("resolution", "display", "depth", "mode", 
"color", "colour", "discolour"), performance = c("performance", "speed", 
"usage", "fast", "run", "running", "lag", "processor", "shut", "shut down", 
"restart", "hanging","hang"), Value = c("money", "worth", "budget", "value", 
"price", "specs", "specifications", "invest", 
"under","expectations","expected","expecting","expect")))

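I want to classify the text based on the keywords above. Please help.

P.S.: A dfm is an option, but specifically I would like to know how to classify a data frame of text into the desired output.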

1 Answer:

Answer 0 (score: 0)

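I reused most of your code:

# Creating the dictionary
dic <- dictionary(list(camera = c("camera","lens","pixel", "pictures", 
"pixels", "snap"), display = c("resolution", "display", "depth", "mode", 
"color", "colour", "discolour"), performance = c("performance", "speed", 
"usage", "fast", "run", "running", "lag", "processor", "shut", "shut down", 
"restart", "hanging","hang"), Value = c("money", "worth", "budget", "value", 
"price", "specs", "specifications", "invest", 
"under","expectations","expected","expecting","expect")))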

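I convert uppercase to lowercase; otherwise the fixed comparison does not work. I would also recommend removing stop words and doing some stemming.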

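# Creating a DFM and saving the Reviews in a vector
library("quanteda")
Reviews <- c(
        "The phone performance and display is good",
        "Worth the money",
        "Camera is good")
# Recent quanteda versions need tokens() before dfm(); tolower = TRUE lowercases the features
x <- dfm(tokens(Reviews), tolower = TRUE)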
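The answer recommends stop-word removal and stemming but does not show them. Here is a minimal sketch of how that could look in quanteda. The names `reviews`, `toks`, `stem_dic` and `stem_counts` are illustrative, and the shortened dictionary with glob patterns is an assumption, not part of the original answer. Stemming changes word forms, so the dictionary entries have to match the stems.

```r
library(quanteda)

reviews <- c("The phone performance and display is good",
             "Worth the money",
             "Camera is good")

# Tokenise and lowercase, then drop English stop words and stem what is left
toks <- tokens(reviews, remove_punct = TRUE)
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, stopwords("en"))
toks <- tokens_wordstem(toks)

# Hypothetical stem-friendly dictionary: glob patterns also match stemmed forms
stem_dic <- dictionary(list(
  camera      = c("camera*", "pixel*"),
  display     = c("display*", "resolut*"),
  performance = c("perform*", "speed*"),
  Value       = c("worth*", "mone*")
))

# Count dictionary hits per review on the cleaned tokens
stem_counts <- dfm_lookup(dfm(toks), dictionary = stem_dic, valuetype = "glob")
print(stem_counts)
```

Glob patterns are used here because a fixed comparison against unstemmed keywords such as "performance" would stop matching once the text has been stemmed.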

Then apply the dfm_lookup function to the DFM (x) and the dictionary (dic).
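To get from the lookup result to a data frame with one category column, as the question asks, something like the following could work. This is a sketch under assumptions: the dictionary is trimmed for brevity, the `Category` column name and the comma-separated format are guesses (the desired-output image is not available), and `counts` and `m` are illustrative names.

```r
library(quanteda)

# Reviews from the question, kept in a data frame
x <- data.frame(
  Reviews = c("The phone performance and display is good",
              "Worth the money",
              "Camera is good"),
  stringsAsFactors = FALSE
)

# Shortened version of the question's dictionary (assumption: trimmed for brevity)
dic <- dictionary(list(
  camera      = c("camera", "lens", "pixel", "pictures"),
  display     = c("resolution", "display", "colour"),
  performance = c("performance", "speed", "lag"),
  Value       = c("money", "worth", "price")
))

# Tokenise, build a lowercased document-feature matrix, then count dictionary hits
toks <- tokens(x$Reviews)
counts <- dfm_lookup(dfm(toks, tolower = TRUE), dictionary = dic, valuetype = "fixed")

# Ordinary matrix: one row per review, one column per dictionary key
m <- as.matrix(counts)

# For each review, join the names of all keys with at least one hit
x$Category <- apply(m, 1, function(row) paste(colnames(m)[row > 0], collapse = ", "))
print(x)
```

A review that matches several keys gets all of them, and a review with no hits gets an empty string. Multi-word entries such as "shut down" cannot match the single-word features of a dfm; applying tokens_lookup() to the tokens object before building the dfm is one way to handle those.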

Hope this is what you wanted :)