How do I select records based on the PCs to reduce dimensionality in RapidMiner?

Time: 2019-02-16 12:17:23

Tags: pca rapidminer dimensionality-reduction

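I'm new to RapidMiner. I have a huge data set and I'm using principal component analysis to reduce its dimensionality. The problem is that once I have the PCs, I don't know how to select records. How do I produce a new, reduced data set?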

This is what I tried to use:

This is what I got:

1 answer:

Answer 0: (score: 0)

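You can use the "Weight by PCA" operator to compute attribute-importance weights, and then use the "Select by Weights" operator to reduce the number of attributes in the original data set.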
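For intuition only, here is a minimal pure-Python sketch of the same idea outside RapidMiner: take the absolute loadings of the first principal component as attribute weights, rescale them to [0, 1], and keep the attributes at or above a threshold. The function names (`first_component`, `pca_weights`, `select_by_weights`), the toy data, and the min-max rescaling are assumptions made for illustration; this is not RapidMiner's implementation, and its exact normalization may differ.

```python
import math

def first_component(rows, iters=500):
    """Approximate the first principal component by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # starting vector (assumed not orthogonal to the answer)
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def pca_weights(rows):
    """Absolute loadings of the first component, rescaled to [0, 1]."""
    loadings = [abs(x) for x in first_component(rows)]
    lo, hi = min(loadings), max(loadings)
    return [(x - lo) / (hi - lo) if hi > lo else 1.0 for x in loadings]

def select_by_weights(rows, weights, threshold=0.5):
    """Keep only the columns whose weight is >= threshold."""
    keep = [j for j, w in enumerate(weights) if w >= threshold]
    return keep, [[r[j] for j in keep] for r in rows]

# Toy data: columns 0 and 2 vary strongly together, column 1 barely varies.
data = [
    [2.0, 0.10, 1.9],
    [4.0, 0.12, 4.1],
    [6.0, 0.09, 6.2],
    [8.0, 0.11, 7.8],
]
weights = pca_weights(data)
kept, reduced = select_by_weights(data, weights, threshold=0.5)
print(kept)     # indices of the attributes that survive
print(reduced)  # the original rows restricted to those attributes
```

Note that this approach keeps a subset of the original attributes rather than replacing them with the principal components, so every record (row) stays in the data set and only the number of columns shrinks.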

See the attached example process below (just paste the XML into the RapidMiner process window). Feel free to browse or ask questions in the RapidMiner community as well.


<?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Root" origin="GENERATED_TUTORIAL">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
  <operator activated="true" class="retrieve" compatibility="9.2.000" expanded="true" height="68" name="Sonar" origin="GENERATED_TUTORIAL" width="90" x="112" y="34">
    <parameter key="repository_entry" value="//Samples/data/Sonar"/>
  </operator>
  <operator activated="true" class="weight_by_pca" compatibility="9.2.000" expanded="true" height="82" name="Weight by PCA" width="90" x="313" y="34">
    <parameter key="normalize_weights" value="true"/>
    <parameter key="sort_weights" value="true"/>
    <parameter key="sort_direction" value="ascending"/>
    <parameter key="component_number" value="1"/>
  </operator>
  <operator activated="true" class="select_by_weights" compatibility="9.2.000" expanded="true" height="103" name="Select by Weights" width="90" x="581" y="34">
    <parameter key="weight_relation" value="greater equals"/>
    <parameter key="weight" value="0.5"/>
    <parameter key="k" value="10"/>
    <parameter key="p" value="0.5"/>
    <parameter key="deselect_unknown" value="true"/>
    <parameter key="use_absolute_weights" value="true"/>
  </operator>
  <connect from_op="Sonar" from_port="output" to_op="Weight by PCA" to_port="example set"/>
  <connect from_op="Weight by PCA" from_port="weights" to_op="Select by Weights" to_port="weights"/>
  <connect from_op="Weight by PCA" from_port="example set" to_op="Select by Weights" to_port="example set input"/>
  <connect from_op="Select by Weights" from_port="example set output" to_port="result 1"/>
  <portSpacing port="source_input 1" spacing="0"/>
  <portSpacing port="sink_result 1" spacing="0"/>
  <portSpacing port="sink_result 2" spacing="162"/>
</process>
</operator>
</process>