Reading and writing XML files with Apache Beam / Google Cloud Dataflow

Posted: 2017-08-28 11:57:07

Tags: google-cloud-dataflow apache-beam

I am trying to read an XML file from a GCS location by following the documentation here:

  

https://beam.apache.org/documentation/sdks/javadoc/2.0.0/org/apache/beam/sdk/io/xml/XmlIO.html

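There seems to be a problem with my configuration, or I am missing something required to run the code. I saved the XML file in a GCS location and use the code below to read it.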

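import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.xml.XmlIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class XMLReaderWriter {

    private static final Logger LOG = LoggerFactory.getLogger(XMLReaderWriter.class);

    public static void main(String[] args) {
        DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
        options.setTempLocation("gs://xyz_test/staging");
        options.setProject("test-1-160106");
        // No runner is set, so the pipeline runs on the default DirectRunner
        // (as the stack trace below shows), not on Dataflow.

        Pipeline p = Pipeline.create(options);

        // Read each <title> element under the <catalog> root into a Record.
        PCollection<Record> result = p.apply(XmlIO.<Record>read()
                .from("gs://xyz_test/sample.xml")
                .withRootElement("catalog")
                .withRecordElement("title")
                .withRecordClass(Record.class));

        // Print every record; nothing is emitted downstream.
        result.apply(ParDo.of(new DoFn<Record, String>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                System.out.println(c.element().toString());
            }
        }));

        p.run();
    }
}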
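The `withRootElement("catalog")` and `withRecordElement("title")` settings decide which elements of the file become records. The sketch below is a plain-JDK approximation using StAX (`javax.xml.stream`), not XmlIO's actual implementation, meant only to show what those two settings select. The `RecordSplitter` class and the sample catalog are hypothetical, since the contents of `sample.xml` are not shown. With `"title"` as the record element, each `<title>` becomes its own record; with `"book"`, the sketch yields one record per book.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/**
 * Hypothetical plain-JDK approximation (NOT XmlIO's implementation) of how a
 * root element and a record element carve an XML document into records.
 * Each element named recordElement found inside rootElement becomes one
 * record, returned here as its whitespace-normalized text content.
 */
class RecordSplitter {

    static List<String> split(String xml, String rootElement, String recordElement) {
        List<String> records = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            boolean insideRoot = false;
            StringBuilder current = null; // text of the record being read, or null
            int depth = 0;                // nesting depth below the current record element
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (current != null) {
                        depth++;
                        current.append(' '); // element boundaries act as word breaks
                    } else if (name.equals(rootElement)) {
                        insideRoot = true;
                    } else if (insideRoot && name.equals(recordElement)) {
                        current = new StringBuilder();
                        depth = 0;
                    }
                } else if (event == XMLStreamConstants.CHARACTERS && current != null) {
                    current.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (current != null) {
                        if (depth == 0) {
                            // Closing tag of the record element: emit the record.
                            records.add(current.toString().replaceAll("\\s+", " ").trim());
                            current = null;
                        } else {
                            depth--;
                            current.append(' ');
                        }
                    } else if (reader.getLocalName().equals(rootElement)) {
                        insideRoot = false;
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical sample shaped like a typical catalog file.
        String xml = "<catalog>\n"
                + "  <book><title>Beam in Action</title><author>Ada</author></book>\n"
                + "  <book><title>Dataflow Basics</title><author>Grace</author></book>\n"
                + "</catalog>";
        System.out.println(split(xml, "catalog", "title")); // [Beam in Action, Dataflow Basics]
        System.out.println(split(xml, "catalog", "book"));  // [Beam in Action Ada, Dataflow Basics Grace]
    }
}
```

If `sample.xml` follows the common `catalog > book > title` layout, it may be worth checking whether `book`, rather than `title`, is the intended record element.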

The code fails with an exception. Here is part of the stack trace:

Exception in thread "main" java.lang.RuntimeException: java.lang.NoSuchMethodError: org.apache.avro.generic.GenericData$Record.<init>()
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:207)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:63)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:295)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:281)
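The `NoSuchMethodError` names the no-argument constructor `GenericData$Record.<init>()`. One plausible reading, and it is only an assumption since the imports are not shown, is that `Record` in the code resolves to Avro's `org.apache.avro.generic.GenericData.Record`, whose constructors all take arguments such as a `Schema`, rather than to a JAXB-annotated class. XmlIO's `withRecordClass` expects a JAXB-annotated class, and JAXB normally instantiates each record through a no-arg constructor. The hypothetical check below uses plain reflection to test for such a constructor; `TitleRecord` and `SchemaBoundRecord` are illustrative stand-ins. In a real pipeline the record class would also carry JAXB annotations such as `@XmlRootElement(name = "title")`, left out here so the sketch compiles on JDKs that no longer bundle `javax.xml.bind`.

```java
/** Hypothetical record class shaped the way JAXB expects: a no-arg constructor plus accessors. */
class TitleRecord {
    private String value;

    TitleRecord() {} // the no-arg constructor JAXB would use to instantiate each record

    String getValue() { return value; }

    void setValue(String value) { this.value = value; }
}

/** Stand-in for a class like Avro's GenericData.Record, whose constructors all take arguments. */
class SchemaBoundRecord {
    private final String schema;

    SchemaBoundRecord(String schema) { this.schema = schema; }

    String getSchema() { return schema; }
}

class NoArgConstructorCheck {

    /** Returns true if the class declares a constructor with no parameters (any visibility). */
    static boolean hasNoArgConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(TitleRecord.class));       // true
        System.out.println(hasNoArgConstructor(SchemaBoundRecord.class)); // false
    }
}
```

With Avro on the classpath, the same check applied to `GenericData.Record.class` would be expected to return `false`, which is consistent with the error above.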

Has anyone done this before? Please let me know what changes I need to make to my code.

0 answers