Reducing SVM computation time in RapidMiner

Date: 2014-08-19 14:48:30

Tags: machine-learning classification svm rapidminer

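I'm new to data mining and to RapidMiner. I need to implement an SVM for a project I'm working on. However, I'm stuck: no matter what I do, the SVM just runs for hours and I have no idea whether it is anywhere near finishing.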

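I have already removed as many features as possible using the ReliefF filter and a forward-selection wrapper. I'm using a linear kernel, which should be the fastest, and the SVM's C is 0. The dataset itself has 3950 examples with 14 dimensions, which I don't think is a lot.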

The only reason I can think of for it taking so long is that I'm using 10-fold cross-validation, but even so, it shouldn't take days. So my questions are:

1 - Looking at how I set up my SVM in the process below, is there anything I can change to reduce the running time?

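2 - Is there any way in RapidMiner to see what is happening inside the SVM, to find out why it takes so long? Or at least to check which cross-validation iteration it is on?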
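For scale, here is the arithmetic behind the cross-validation concern, assuming the ten folds are of roughly equal size (RapidMiner's stratified sampling may shift individual folds by an example or so). With N = 3950 examples, each of the 10 iterations trains the SVM on nine folds and tests on the remaining one:

```latex
\begin{align*}
n_{\text{test}}  &= N / 10 = 3950 / 10 = 395 \\
n_{\text{train}} &= N - n_{\text{test}} = 3950 - 395 = 3555 \\
\text{values per training run} &= 3555 \times 14 = 49770
\end{align*}
```

So the cross-validation multiplies the cost of one training run by ten, but each run is still on a few thousand 14-dimensional examples, which supports the point that the cross-validation by itself should not explain a runtime measured in days.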

Here is the process itself, which reads the already-preprocessed file (I can't share the dataset):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
    <parameter key="logverbosity" value="all"/>
    <parameter key="logfile" value="D:\testexrff.xrff"/>
    <process expanded="true">
      <operator activated="true" class="read_xrff" compatibility="5.3.008" expanded="true" height="60" name="Read XRFF (4)" width="90" x="45" y="165">
        <parameter key="data_file" value="C:\Users\glintthssig\Desktop\wrapper"/>
      </operator>
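      <!-- Cross-validation (number of folds not set, so the default of 10 applies): the first inner <process> is the training subprocess, the second is the testing subprocess -->
      <operator activated="true" class="x_validation" compatibility="5.3.008" expanded="true" height="112" name="Validation (11)" width="90" x="246" y="120">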
        <parameter key="use_local_random_seed" value="true"/>
        <process expanded="true">
          <operator activated="true" class="remap_binominals" compatibility="5.3.008" expanded="true" height="76" name="Remap Binominals (5)" width="90" x="45" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="REINTERNAMENTO"/>
            <parameter key="negative_value" value="N"/>
            <parameter key="positive_value" value="S"/>
          </operator>
          <operator activated="true" class="nominal_to_numerical" compatibility="5.3.008" expanded="true" height="94" name="Nominal to Numerical" width="90" x="45" y="165">
            <list key="comparison_groups"/>
          </operator>
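          <!-- LibSVM learner with a linear kernel; C is not set here, so the operator's default value is used -->
          <operator activated="true" class="support_vector_machine_libsvm" compatibility="5.3.008" expanded="true" height="76" name="SVM (2)" width="90" x="179" y="165">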
            <parameter key="kernel_type" value="linear"/>
            <list key="class_weights"/>
          </operator>
          <connect from_port="training" to_op="Remap Binominals (5)" to_port="example set input"/>
          <connect from_op="Remap Binominals (5)" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_op="SVM (2)" to_port="training set"/>
          <connect from_op="SVM (2)" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
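          <!-- Testing subprocess: the same preprocessing is repeated on the held-out fold before the model is applied and evaluated -->
          <process expanded="true">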
          <operator activated="true" class="remap_binominals" compatibility="5.3.008" expanded="true" height="76" name="Remap Binominals (8)" width="90" x="45" y="165">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="REINTERNAMENTO"/>
            <parameter key="negative_value" value="N"/>
            <parameter key="positive_value" value="S"/>
          </operator>
          <operator activated="true" class="nominal_to_numerical" compatibility="5.3.008" expanded="true" height="94" name="Nominal to Numerical (4)" width="90" x="179" y="165">
            <list key="comparison_groups"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.3.008" expanded="true" height="76" name="Apply Model (11)" width="90" x="45" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="5.3.008" expanded="true" height="76" name="Performance (11)" width="90" x="212" y="30"/>
          <connect from_port="model" to_op="Apply Model (11)" to_port="model"/>
          <connect from_port="test set" to_op="Remap Binominals (8)" to_port="example set input"/>
          <connect from_op="Remap Binominals (8)" from_port="example set output" to_op="Nominal to Numerical (4)" to_port="example set input"/>
          <connect from_op="Nominal to Numerical (4)" from_port="example set output" to_op="Apply Model (11)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (11)" from_port="labelled data" to_op="Performance (11)" to_port="labelled data"/>
          <connect from_op="Performance (11)" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read XRFF (4)" from_port="output" to_op="Validation (11)" to_port="training"/>
      <connect from_op="Validation (11)" from_port="averagable 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

1 answer:

Answer 0 (score: 0)

The process looks fine.

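There is currently a strange issue that can sometimes be fixed with the Materialize Data operator. Put it inside the cross-validation; I suggest placing it just before the SVM operator.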

Either it magically works, or, if it doesn't, we will have to resort to something else.
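Applied to the process in the question, the suggested change could look roughly like the fragment below, which goes inside the training subprocess of "Validation (11)". This is a sketch, not exported RapidMiner XML: the operator class key `materialize_data`, the port names on the new operator, and its layout coordinates are assumptions for RapidMiner 5.3. Dragging Materialize Data into the subprocess in the GUI and wiring it there produces the exact XML for your installation.

```xml
<!-- ASSUMPTION: class key "materialize_data" and its port names; verify against your RapidMiner 5.3 install -->
<operator activated="true" class="materialize_data" compatibility="5.3.008" expanded="true" height="76" name="Materialize Data" width="90" x="112" y="255"/>
<!-- These two connections replace the original
     "Nominal to Numerical" -> "SVM (2)" connection in the training subprocess -->
<connect from_op="Nominal to Numerical" from_port="example set output" to_op="Materialize Data" to_port="example set input"/>
<connect from_op="Materialize Data" from_port="example set output" to_op="SVM (2)" to_port="training set"/>
```

The testing subprocess stays unchanged; only the data feeding the SVM learner is materialized, which matches the answer's suggestion to place the operator just before the SVM.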