Question

I saved a Google query with 100 results (title and description). It has this format:

Title                Description
Spain - Wikipedia    Spain is a democracy organised in the form of a parliamentary government under a constitutional monarchy. It is a developed country with the world's fourteenth

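You get the idea. I successfully loaded this CSV file into Weka. First I apply the NominalToString filter (because the data is loaded as Nominal). Then I apply StringToWordVector with the following options: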

IDFTransform - True
TFTransform - True
normalize - True
outputWordCounts - True
tokenizer - AlphabeticTokenizer
wordsToKeep - 100

More or less. Then I get a list of words; sometimes I use NGramTokenizer with at least 3 words.
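
To make the weighting concrete, here is a minimal sketch in plain Java of what the TF and IDF options do to raw word counts. It is not Weka's code. It assumes the formulas given in the StringToWordVector option descriptions, log(1 + f_ij) for the TF transform and f_ij * log(N / n_i) for the IDF transform, applied in that order. The class name TfIdfSketch, its methods, and the lowercasing step are my own illustrative choices; document length normalization and the word limit are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TfIdfSketch {

    // Splits on anything that is not a letter, roughly like an alphabetic tokenizer.
    // Lowercasing is an assumption made here to keep the sketch simple.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns one word -> weight map per document.
    public static List<Map<String, Double>> weights(List<String> docs) {
        List<Map<String, Integer>> counts = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            Map<String, Integer> c = new HashMap<>();
            for (String t : tokenize(doc)) {
                c.merge(t, 1, Integer::sum);       // raw word count in this document
            }
            counts.add(c);
            for (String word : c.keySet()) {
                docFreq.merge(word, 1, Integer::sum); // number of documents containing the word
            }
        }
        List<Map<String, Double>> out = new ArrayList<>();
        for (Map<String, Integer> c : counts) {
            Map<String, Double> w = new TreeMap<>();
            for (Map.Entry<String, Integer> e : c.entrySet()) {
                double tf = Math.log(1.0 + e.getValue());
                double idf = Math.log((double) docs.size() / docFreq.get(e.getKey()));
                w.put(e.getKey(), tf * idf);
            }
            out.add(w);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "Spain - Wikipedia Spain is a democracy",
            "Spain travel guide",
            "Football results and sports news");
        for (Map<String, Double> w : weights(docs)) {
            System.out.println(w);
        }
    }
}
```

Under these formulas a word that occurs in every document gets weight 0, and with short snippets most of the remaining weights are small, which may be part of why the documents end up looking alike to the clusterer.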

After that I go to Cluster and choose K-means. This doesn't work well, because it puts 90% of the instances in one cluster. Or maybe that's right...

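What happens when I select "Use training set", given that I don't have one yet? Which options should I use? I want to form clusters that look like categories (tourism, sports, economy, ...). Can Weka do this the way Carrot2 does? Or at least form clusters.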

Thanks.

Clustering with Weka

0 answers: