Question

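I am reading a bunch of configuration files in my Google Dataflow program and would like to know the best way to stage them. This is what I do at the moment, and the system cannot find the files.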

FileReader filereader1 = new FileReader("config_1.csv");
FileReader filereader2 = new FileReader("config_2.csv");

config_1.csv and config_2.csv are stored in ./target/classes/org/model/examples/

My run script looks like this:

mvn compile exec:java -Dexec.mainClass=org.model.examples.MyPipeline \
-Dexec.args="--runner=DataflowRunner \
    --project=mortgage-data-warehouse \
    --gcpTempLocation=gs://my-project-bucket/tmp \
    --inputFile=gs://my-project-bucket/Data/input.txt \
    --filesToStage=./target/classes/org/datamodel/examples/config_1.csv, ./target/classes/org/datamodel/examples/config_2.csv" \
-Pdataflow-runner

I get this error:

java.io.FileNotFoundException: config_1.csv (The system cannot find the file specified)

Is this the right way to set --filesToStage?

Answer 1

For small configuration files, it is best to read them from the resources folder, as described in the linked post, and avoid the complexity of --filesToStage.
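As a rough illustration of the resources approach, here is a minimal sketch. It assumes the CSV files sit on the classpath in the same package as the pipeline class (with Maven, typically src/main/resources/org/model/examples/, which is copied to target/classes/org/model/examples/). The class name ConfigResources and its two helper methods are hypothetical, not part of Dataflow or Beam. The point is that new FileReader("config_1.csv") resolves the name against the current working directory, whereas Class.getResourceAsStream looks on the classpath, which should normally travel with the job's jars.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the Dataflow or Beam API.
class ConfigResources {

    // Reads all lines from the stream as UTF-8 and closes the stream afterwards.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Looks the file up on the classpath. A name without a leading slash is
    // resolved relative to the package of the anchor class.
    static List<String> readResource(Class<?> anchor, String name) throws IOException {
        InputStream in = anchor.getResourceAsStream(name);
        if (in == null) {
            throw new FileNotFoundException("Resource not found on classpath: " + name);
        }
        return readLines(in);
    }

    public static void main(String[] args) {
        // Assumption: config_1.csv is packaged next to this class.
        try {
            List<String> rows = readResource(ConfigResources.class, "config_1.csv");
            System.out.println("Read " + rows.size() + " lines");
        } catch (IOException e) {
            System.out.println("Could not read config: " + e.getMessage());
        }
    }
}
```

In the pipeline from the question, the call would presumably look like ConfigResources.readResource(MyPipeline.class, "config_1.csv"), with no --filesToStage flag for the CSV files.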

How do I stage additional files with Google Cloud Dataflow?
