Question

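I am considering using RapidMiner to store and analyze a collection of data gathered by scripted processes. Is there a way to import CSV files into a RapidMiner repository from a command-line script?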

Answer 1

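Not directly. However, you can create a process in which a "Read CSV" operator is connected to a "Store" operator, and save that process in the repository. The process can then be invoked from the command line. If the file and repository locations are static and never change, that is all you need to do.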

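To specify the input file and the repository location dynamically, however, you need macros. Macros can be set on the command line, but unfortunately only as of RapidMiner 5.3, which has not been released yet (it should be out in a few weeks). In the meantime, you can use the latest version from the sourceforge SVN repository (Unuk branch).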

A process that stores a CSV file in the repository:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
    <process expanded="true" height="190" width="413">
      <operator activated="true" class="read_csv" compatibility="5.3.000" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="csv_file" value="%{csv-file}"/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <operator activated="true" class="store" compatibility="5.3.000" expanded="true" height="60" name="Store" width="90" x="179" y="30">
        <parameter key="repository_entry" value="%{repository-location}"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Store" to_port="input"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>

Assuming you have saved this process at //home/steve/csv-to-repository and your current directory is the RapidMiner directory, you can invoke it from the command line like this:

./script/rapidminer //home/steve/csv-to-repository "-Mcsv-file=/path/to/your/csv/file" "-Mrepository-location=//repository/path/to/store/csv"
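
Since the data comes from scripted processes, the same call can be wrapped in a small shell helper. The following is only a sketch built on the command above: the function names import_csv and import_dir, the RAPIDMINER and PROCESS variables, and the convention of naming each repository entry after its CSV file are assumptions for illustration, not part of RapidMiner. It relies on the -M macro options described above, so it needs a version that supports them.

```shell
#!/bin/sh
# Sketch only: helper names and variables below are illustrative assumptions.

# Launcher script, relative to the RapidMiner directory (as in the command above).
RAPIDMINER="${RAPIDMINER:-./script/rapidminer}"
# Repository location of the stored "Read CSV" -> "Store" process.
PROCESS="${PROCESS:-//home/steve/csv-to-repository}"

# import_csv CSV_FILE REPOSITORY_ENTRY
# Runs the stored process once, passing both values as macros.
import_csv() {
    "$RAPIDMINER" "$PROCESS" \
        "-Mcsv-file=$1" \
        "-Mrepository-location=$2"
}

# import_dir DIRECTORY REPOSITORY_FOLDER
# Imports every *.csv file in DIRECTORY; each entry is named after the file
# (for example data.csv is stored as REPOSITORY_FOLDER/data).
import_dir() {
    for csv_file in "$1"/*.csv; do
        # Skip the unexpanded pattern when the directory has no CSV files.
        [ -e "$csv_file" ] || continue
        import_csv "$csv_file" "$2/$(basename "$csv_file" .csv)"
    done
}
```

Run from the RapidMiner directory, a call such as `import_dir /path/to/your/csv/dir //repository/path/to/store` would then import each file in turn. Absolute CSV paths are the safer choice here, because the launcher is started relative to the RapidMiner directory.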
