How to import multiple Excel files into RapidMiner

Time: 2013-04-15 08:25:47

Tags: rapidminer

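I am trying to load a folder containing three Excel files into RapidMiner in one go. Which operator should I use, instead of selecting each file separately and reading it with its own Read Excel operator?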

1 answer:

Answer 0 (score: 2)

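Use the Loop Files operator to iterate over the files in a directory, and place a Read Excel operator inside its subprocess. On each iteration, Loop Files stores the path of the current file in the %{file_path} macro, which Read Excel then uses as its input file. The result is a collection of ExampleSets, one per file. There are several ways to work with such a collection; to concatenate them into a single ExampleSet, use the Append operator, which expects all ExampleSets to have the same attributes.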

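Here is an example process XML. It reads every .xls or .xlsx file in D:\xls and appends them into one ExampleSet: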

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.007">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="loop_files" compatibility="5.3.007" expanded="true" height="76" name="Loop Files" width="90" x="782" y="30">
        <parameter key="directory" value="D:\xls"/>
        <parameter key="filter" value="^.*\.xlsx?$"/>
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="5.3.007" expanded="true" height="60" name="Read Excel" width="90" x="782" y="30">
            <parameter key="excel_file" value="%{file_path}"/>
            <list key="annotations"/>
            <list key="data_set_meta_data_information"/>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_port="out 1"/>
          <portSpacing port="source_file object" spacing="0"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="append" compatibility="5.3.007" expanded="true" height="76" name="Append" width="90" x="916" y="30"/>
      <connect from_op="Loop Files" from_port="out 1" to_op="Append" to_port="example set 1"/>
      <connect from_op="Append" from_port="merged set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
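The filter parameter of Loop Files is a regular expression, here ^.*\.xlsx?$, that decides which files in the directory get read. As a quick way to see what it selects, the Python sketch below applies the same pattern to a few hypothetical file names. It assumes the filter is matched against the whole file name and that Java's regex engine (which RapidMiner uses) behaves like Python's re module for this simple pattern; matches_filter is a hypothetical helper, not part of RapidMiner.

```python
import re

# Same pattern as the "filter" parameter of Loop Files in the process above.
FILTER = r"^.*\.xlsx?$"


def matches_filter(file_name: str, pattern: str = FILTER) -> bool:
    """Return True if the entire file name matches the pattern.

    Assumption: Loop Files tests the pattern against the whole file name.
    Matching is case-sensitive, as with Python's and Java's default regex flags.
    """
    return re.fullmatch(pattern, file_name) is not None


# Hypothetical file names, for illustration only.
names = ["sales.xls", "sales.xlsx", "sales.xlsm", "notes.txt", "SALES.XLSX", "~$sales.xlsx"]
for name in names:
    print(f"{name}: {matches_filter(name)}")
```

With this pattern both .xls and .xlsx files are selected, while .xlsm and other extensions are not. The match is case-sensitive, so a file named SALES.XLSX would be skipped. A temporary owner file such as ~$sales.xlsx, which Excel typically creates next to a workbook that is currently open, would be matched and passed to Read Excel, so it is safer to close the workbooks before running the process.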