We are trying to run our pipeline from an executable jar file:
mvn package
java -jar DataFlow-jobs-0.1.jar \
  --tempLocation=gs://events-dataflow/tmp \
  --gcpTempLocation=gs://events-dataflow/tmp \
  --project=google-project-id \
  --runner=DataflowRunner \
  --BQQuery='select t1.user_id google-project-id.deve.user_info t1'
Exception in thread "main" java.lang.IllegalArgumentException: Class interface org.apache.beam.sdk.options.PipelineOptions missing a property named 'gcpTempLocation'.
at org.apache.beam.sdk.options.PipelineOptionsFactory.parseObjects(PipelineOptionsFactory.java:1579)
at org.apache.beam.sdk.options.PipelineOptionsFactory.access$400(PipelineOptionsFactory.java:104)
at org.apache.beam.sdk.options.PipelineOptionsFactory$Builder.as(PipelineOptionsFactory.java:291)
at org.apache.beam.sdk.options.PipelineOptionsFactory$Builder.create(PipelineOptionsFactory.java:270)
at org.customerlabs.beam.WriteFromBQtoES.main(WriteFromBQtoES.java:98)
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <appendAssemblyId>false</appendAssemblyId>
    <archive>
      <manifest>
        <mainClass>org.customerlabs.beam.WriteFromBQtoES</mainClass>
      </manifest>
    </archive>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <id>make-executable-jar</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>
WriteFromBQtoES.java
public class WriteFromBQtoES {
    private static DateTimeFormatter fmt =
        DateTimeFormat.forPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
    private static final Logger LOG = LoggerFactory.getLogger(WriteFromBQtoES.class);
    private static final ObjectMapper mapper = new ObjectMapper();

    public interface Options extends PipelineOptions {
        @Description("BigQuery query to fetch data")
        @Required
        String getBQQuery();
        void setBQQuery(String value);
    }

    public static void main(String[] args) throws IOException {
        PipelineOptionsFactory.register(Options.class);
        Options options = PipelineOptionsFactory.fromArgs(args).withValidation().create().as(Options.class);
        Pipeline p = Pipeline.create(options);
        PCollection<TableRow> tableRows = p.apply(BigQueryIO.read().fromQuery(options.getBQQuery()).usingStandardSql());
        tableRows.apply("WriteToCSV", ParDo.of(new DoFn<TableRow, String>() {
            // process WriteToCSV
        }));
        p.run();
    }
}
I'm not sure what we are missing, but we get this error even though we pass the gcpTempLocation argument on the command line. Please help me figure out this issue. Thanks in advance.
Answer 0 (score: 2)

I think PipelineOptions is not what you want:
public interface Options extends DataflowPipelineOptions { ... }
gcpTempLocation is defined in GcpOptions.java, which is extended by DataflowPipelineOptions.java.
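A minimal sketch of the corrected options interface (assuming the Dataflow runner artifact `beam-runners-google-cloud-dataflow-java` is on the classpath; the annotation and getter names follow the Beam SDK, everything else matches the question's code):

```java
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.Validation;

// Extending DataflowPipelineOptions instead of PipelineOptions pulls in
// GcpOptions, which declares getGcpTempLocation()/setGcpTempLocation(),
// so PipelineOptionsFactory can now parse --gcpTempLocation from args.
public interface Options extends DataflowPipelineOptions {
    @Description("BigQuery query to fetch data")
    @Validation.Required
    String getBQQuery();
    void setBQQuery(String value);
}
```

The rest of main() stays unchanged; `PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class)` will then accept the flag.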
Answer 1 (score: 0)

I ran into the same problem, except that I was using the Maven Shade plugin to build an uber jar containing all the dependencies the application needs. Executing the jar with the parameters Apache Beam requires produced the same error about --gcpTempLocation not being found. Adding the following block to pom.xml lets you package the uber jar with Maven Shade and fixes the missing-parameter problem.
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>${maven-shade-plugin.version}</version>
  <executions>
    <!-- Run shade goal on package phase -->
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Required to ensure Beam pipeline options can be passed properly. Without this, pipeline options will not be recognised -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          <!-- add Main-Class to manifest file -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>NAME-OF-YOUR-MAIN-CLASS</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
The transformer entries ensure that Beam pipeline options can be passed as command-line arguments. After adding this to pom.xml, run mvn package, which generates an uber jar file under root/target. You can then execute the jar file with the following command:
java -jar target/[your-jar-name].jar \
--runner=org.apache.beam.runners.dataflow.DataflowRunner \
--tempLocation=[GCS temp folder path] \
--stagingLocation=[GCS staging folder path]
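For a Dataflow run like the one in the question, the same invocation would also carry the question's project, temp-location, and custom flags (bucket and project names below are the question's placeholders, and the query value is illustrative):

```shell
java -jar target/DataFlow-jobs-0.1.jar \
  --runner=DataflowRunner \
  --project=google-project-id \
  --tempLocation=gs://events-dataflow/tmp \
  --gcpTempLocation=gs://events-dataflow/tmp \
  --BQQuery='<your BigQuery SQL here>'
```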