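I'm new to data mining and just learning RapidMiner. I need to implement an SVM for a project I'm working on. However, I'm stuck: whatever I try, the SVM just runs for hours and I have no idea whether it's anywhere near finishing.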
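I've already removed as many features as I can with the ReliefF filter and a forward-selection wrapper. I'm using a linear kernel, which should be the fastest, and the SVM's C is 0. The dataset itself has 3950 examples with 14 dimensions, which I don't think is a lot.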
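The only reason I can think of for it taking so long is that I'm using 10-fold cross-validation, but even so it shouldn't take days. So my questions are:
1 - Given how I've set up the SVM in the process below, is there anything I can change to reduce the running time?
2 - Is there any way in RapidMiner to see what is happening inside the SVM and why it takes so long? Or at least to check which cross-validation iteration it is on?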
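For question 2, one possible approach (a sketch, not verified on RapidMiner 5.3.008) is to add a Log operator to the testing side of "Validation (11)" so that one row is recorded each time a fold finishes. The value paths `operator.<name>.value.iteration` and `operator.<name>.value.time`, the `through 1` port names and the file path are all assumptions; check them against the Log operator's parameter dialog, which lists the values each operator actually exposes.

```xml
<!-- Sketch with assumed keys and ports: goes inside the second (testing) <process> of "Validation (11)" -->
<operator activated="true" class="log" compatibility="5.3.008" expanded="true" name="Log (per fold)">
  <!-- hypothetical output path; whether rows are written before the process ends is unverified -->
  <parameter key="filename" value="D:\svm_fold_log.log"/>
  <list key="log">
    <!-- assumed value name: number of the current cross-validation fold -->
    <parameter key="fold" value="operator.Validation (11).value.iteration"/>
    <!-- assumed value name: time elapsed since the validation operator started -->
    <parameter key="elapsed" value="operator.Validation (11).value.time"/>
  </list>
</operator>
<!-- these two connections replace the direct Performance (11) -> averagable 1 connection -->
<connect from_op="Performance (11)" from_port="performance" to_op="Log (per fold)" to_port="through 1"/>
<connect from_op="Log (per fold)" from_port="through 1" to_port="averagable 1"/>
```

Because the Log sits on the testing side, it only fires after the SVM for that fold has finished training. If no row ever appears, that by itself suggests the very first SVM training run is where the time goes.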
The process itself, which runs on the already preprocessed file (I can't share the dataset), looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
<parameter key="logverbosity" value="all"/>
<parameter key="logfile" value="D:\testexrff.xrff"/>
<process expanded="true">
<operator activated="true" class="read_xrff" compatibility="5.3.008" expanded="true" height="60" name="Read XRFF (4)" width="90" x="45" y="165">
<parameter key="data_file" value="C:\Users\glintthssig\Desktop\wrapper"/>
</operator>
<operator activated="true" class="x_validation" compatibility="5.3.008" expanded="true" height="112" name="Validation (11)" width="90" x="246" y="120">
<parameter key="use_local_random_seed" value="true"/>
<process expanded="true">
<operator activated="true" class="remap_binominals" compatibility="5.3.008" expanded="true" height="76" name="Remap Binominals (5)" width="90" x="45" y="30">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="REINTERNAMENTO"/>
<parameter key="negative_value" value="N"/>
<parameter key="positive_value" value="S"/>
</operator>
<operator activated="true" class="nominal_to_numerical" compatibility="5.3.008" expanded="true" height="94" name="Nominal to Numerical" width="90" x="45" y="165">
<list key="comparison_groups"/>
</operator>
<operator activated="true" class="support_vector_machine_libsvm" compatibility="5.3.008" expanded="true" height="76" name="SVM (2)" width="90" x="179" y="165">
<parameter key="kernel_type" value="linear"/>
<list key="class_weights"/>
</operator>
<connect from_port="training" to_op="Remap Binominals (5)" to_port="example set input"/>
<connect from_op="Remap Binominals (5)" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
<connect from_op="Nominal to Numerical" from_port="example set output" to_op="SVM (2)" to_port="training set"/>
<connect from_op="SVM (2)" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="remap_binominals" compatibility="5.3.008" expanded="true" height="76" name="Remap Binominals (8)" width="90" x="45" y="165">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="REINTERNAMENTO"/>
<parameter key="negative_value" value="N"/>
<parameter key="positive_value" value="S"/>
</operator>
<operator activated="true" class="nominal_to_numerical" compatibility="5.3.008" expanded="true" height="94" name="Nominal to Numerical (4)" width="90" x="179" y="165">
<list key="comparison_groups"/>
</operator>
<operator activated="true" class="apply_model" compatibility="5.3.008" expanded="true" height="76" name="Apply Model (11)" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="5.3.008" expanded="true" height="76" name="Performance (11)" width="90" x="212" y="30"/>
<connect from_port="model" to_op="Apply Model (11)" to_port="model"/>
<connect from_port="test set" to_op="Remap Binominals (8)" to_port="example set input"/>
<connect from_op="Remap Binominals (8)" from_port="example set output" to_op="Nominal to Numerical (4)" to_port="example set input"/>
<connect from_op="Nominal to Numerical (4)" from_port="example set output" to_op="Apply Model (11)" to_port="unlabelled data"/>
<connect from_op="Apply Model (11)" from_port="labelled data" to_op="Performance (11)" to_port="labelled data"/>
<connect from_op="Performance (11)" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read XRFF (4)" from_port="output" to_op="Validation (11)" to_port="training"/>
<connect from_op="Validation (11)" from_port="averagable 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
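For question 1, one thing worth trying is to set the parameters of "SVM (2)" explicitly instead of leaving them at their defaults (the process above sets only `kernel_type`). This fragment is a sketch: the keys `C`, `cache_size` and `epsilon` are assumed to be those of RapidMiner 5's LibSVM operator, and the values are illustrative, not tuned for this data. For libsvm's solver, a smaller C and a looser stopping tolerance generally mean fewer iterations at some cost in fit, and a larger kernel cache trades memory for fewer kernel recomputations.

```xml
<operator activated="true" class="support_vector_machine_libsvm" compatibility="5.3.008" expanded="true" height="76" name="SVM (2)" width="90" x="179" y="165">
  <parameter key="kernel_type" value="linear"/>
  <!-- assumed key: explicit cost parameter; 1.0 is an illustrative starting value -->
  <parameter key="C" value="1.0"/>
  <!-- assumed key: kernel cache size in MB; raise it if memory allows -->
  <parameter key="cache_size" value="200"/>
  <!-- assumed key: solver stopping tolerance; a looser value stops optimization sooner -->
  <parameter key="epsilon" value="0.01"/>
  <list key="class_weights"/>
</operator>
```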
Answer
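The process looks fine.
There is currently a strange issue that can sometimes be fixed with the Materialize Data operator. Put it inside the cross-validation, preferably right before the SVM operator.
Either that magically fixes it, or, if it doesn't, we'll have to resort to something else.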
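A sketch of the placement described above, assuming RapidMiner 5's operator key `materialize_data` and its `example set input`/`example set output` ports. The operator goes into the training subprocess of "Validation (11)" between "Nominal to Numerical" and "SVM (2)", and its two connections replace the direct "Nominal to Numerical" to "SVM (2)" connection.

```xml
<!-- assumed operator key and ports; x/y are only layout coordinates -->
<operator activated="true" class="materialize_data" compatibility="5.3.008" expanded="true" height="76" name="Materialize Data" width="90" x="179" y="30"/>
<!-- Nominal to Numerical -> Materialize Data -> SVM (2) -->
<connect from_op="Nominal to Numerical" from_port="example set output" to_op="Materialize Data" to_port="example set input"/>
<connect from_op="Materialize Data" from_port="example set output" to_op="SVM (2)" to_port="training set"/>
```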