Can association rule algorithms be applied to text?
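Suppose I have a database containing comments from many users. I want to find out which words are frequently typed together. (For example: when the word "pizza" appears in a comment, the word "Domino's" usually appears too.)
Are association rules suitable for this case? Is there a faster alternative to association rules? The implementation language doesn't matter much (preferably Python or R, though RapidMiner would also work).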
Example:
comments
1 I ate Domino's pizza and it's the best
2 Yesterday I ate Domino's pizza
3 I like pizza, but not Domino's
The result should be:
pizza
Domino's (1.0) => strongest association because the word "Domino's" appeared every time the word "pizza" appeared
ate (0.67) => the word "ate" appeared in 2 of the 3 comments that contain the word "pizza"
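
To make the numbers above concrete, here is a minimal Python sketch of the score I have in mind. It is only an illustration, not a full association rule miner: it treats each comment as a set of lowercased words and computes, for a target word, the fraction of comments containing the target that also contain each other word (the "confidence" of the rule target => word). The function names `tokenize` and `co_occurrence_confidence` and the naive whitespace and punctuation handling are my own assumptions.

```python
from collections import Counter

comments = [
    "I ate Domino's pizza and it's the best",
    "Yesterday I ate Domino's pizza",
    "I like pizza, but not Domino's",
]

def tokenize(text):
    # Naive tokenizer (assumption): split on whitespace, strip surrounding
    # punctuation, lowercase. Inner apostrophes ("Domino's") are kept.
    words = {w.strip(".,!?;:\"()").lower() for w in text.split()}
    words.discard("")
    return words

def co_occurrence_confidence(docs, target):
    # For every word, the share of target-containing comments it appears in.
    target = target.lower()
    token_sets = [tokenize(d) for d in docs]
    with_target = [s for s in token_sets if target in s]
    if not with_target:
        return {}
    counts = Counter(w for s in with_target for w in s if w != target)
    return {w: c / len(with_target) for w, c in counts.items()}

scores = co_occurrence_confidence(comments, "pizza")
# Print words by descending score, ties broken alphabetically.
for word, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{word} ({score:.2f})")
```

On the three comments above, "domino's" gets a score of 1.0 and "ate" gets 2/3. Note that "i" also gets 1.0, so in practice a stop-word filter would probably be needed before the scores are useful.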