在Weka中运行哪个版本的SVM?

时间:2012-08-28 13:54:53

标签: weka svm libsvm

我需要在RSS feed中使用Weka的LibSVM实现SVM分类器的频率来将feed分类为目标类别。但我不确定根据我的数据运行哪个版本。

我的.arff文件通常包含以下数据:

@attribute Keyword_1_nasa_Frequency numeric
@attribute Keyword_2_fish_Frequency numeric
@attribute Keyword_3_kill_Frequency numeric
@attribute Keyword_4_show_Frequency numeric
…
@attribute RSSFeedCategoryDescription {BFE,FCL,F,M, NCA, SNT,S}

@data
0,0,0,34,0,0,0,0,0,40,0,0,0,0,0,0,0,0,0,0,24,0,0,0,0,13,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,BFE
0,0,0,12,0,0,0,0,0,20,0,0,0,0,0,0,0,0,0,0,25,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,BFE
0,0,0,10,0,0,0,0,0,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,BFE
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,BFE
…
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,FCL
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,F
…
20,0,64,19,0,162,0,0,36,72,179,24,24,47,24,40,0,48,0,0,0,97,24,0,48,205,143,62,7
8,0,0,216,0,36,24,24,0,0,24,0,0,0,0,140,24,0,0,0,0,72,176,0,0,144,48,0,38,0,284,
221,72,0,72,0,SNT
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,SNT
0,0,0,0,0,0,11,0,0,0,0,0,0,0,19,0,0,0,0,0,0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,17,0,0,0,0,0,0,0,0,0,0,0,0,0,20,0,S

依此类推:总共有570行,其中每行包含 一天中Feed中关键字的频率。在这种情况下,有57个饲料 10天给出总共570条记录。每个关键字都带有前缀 带有代理号,后缀为“频率”。

但在其他情况下,我使用了布尔值作为频率,因此上面的第一行是:

false,false,false,true,false ......,BFE

依此类推,其中34为真,因为满足阈值,而其他则为假,因为未达到阈值。

根据我可以确定的,Weka中有三种类型的SVM,但有人可以告诉我上面的数据应该使用哪种类型?

1 个答案:

答案 0 :(得分:0)

我建议尝试使用所有三种内核类型并确定哪种内核类型最适合您的训练和验证数据(做一个绘图),然后继续使用该训练模型来预测新输入。

在weka中,您可以保存模型以备将来使用。