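Apache Beam - WordCount example does not work

Time: 2017-11-07 16:36:25

Tags: java apache apache-beam

I am trying to set up my own Dataflow runner for Google Cloud, so as a first step I am trying to run it locally on my computer. I tried using this, but when I try to run WordCount, I get: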

C:\Users\XXX\Documents\Test-Beam-3\word-count-beam>mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building word-count-beam 0.1
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ word-count
-beam ---
[WARNING] Using platform encoding (Cp1252 actually) to copy filtered resources,
i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory C:\Users\XXX\Documents\Test
-Beam-3\word-count-beam\src\main\resources
[INFO]
[INFO] --- maven-compiler-plugin:3.6.1:compile (default-compile) @ word-count-be
am ---
[INFO] Changes detected - recompiling the module!
[WARNING] File encoding has not been set, using platform encoding Cp1252, i.e. b
uild is platform dependent!
[INFO] Compiling 21 source files to C:\Users\XXX\Documents\Test-Beam-3
\word-count-beam\target\classes
[INFO] /C:/Users/XXX/Documents/Test-Beam-3/word-count-beam/src/main/ja
va/org/apache/beam/examples/complete/game/utils/WriteToText.java: C:\Users\aalfe
rezaroca\Documents\Test-Beam-3\word-count-beam\src\main\java\org\apache\beam\exa
mples\complete\game\utils\WriteToText.java uses unchecked or unsafe operations.
[INFO] /C:/Users/XXX/Documents/Test-Beam-3/word-count-beam/src/main/ja
va/org/apache/beam/examples/complete/game/utils/WriteToText.java: Recompile with
 -Xlint:unchecked for details.
[INFO]
[INFO] --- exec-maven-plugin:1.4.0:java (default-cli) @ word-count-beam ---
Nov 07, 2017 10:25:17 AM org.apache.beam.sdk.io.FileBasedSource getEstimatedSize
Bytes
INFO: Filepattern pom.xml matched 1 files with total size 14039
Nov 07, 2017 10:25:17 AM org.apache.beam.sdk.io.FileBasedSource expandFilePatter
n
INFO: Matched 1 files for pattern pom.xml
Nov 07, 2017 10:25:17 AM org.apache.beam.sdk.io.FileBasedSource split
INFO: Splitting filepattern pom.xml into bundles of size 3509 took 11 ms and pro
duced 1 files and 4 bundles
Nov 07, 2017 10:25:20 AM org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles p
rocessElement
INFO: Opening writer for write operation TextWriteOperation{tempDirectory=C:\Use
rs\XXX\Documents\Test-Beam-3\word-count-beam\.temp-beam-2017-11-311_16
-25-17-1\, windowedWrites=false}
Nov 07, 2017 10:25:20 AM org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles p
rocessElement
INFO: Opening writer for write operation TextWriteOperation{tempDirectory=C:\Use
rs\XXX\Documents\Test-Beam-3\word-count-beam\.temp-beam-2017-11-311_16
-25-17-1\, windowedWrites=false}
Nov 07, 2017 10:25:20 AM org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles p
rocessElement
INFO: Opening writer for write operation TextWriteOperation{tempDirectory=C:\Use
rs\XXX\Documents\Test-Beam-3\word-count-beam\.temp-beam-2017-11-311_16
-25-17-1\, windowedWrites=false}
Nov 07, 2017 10:25:20 AM org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles p
rocessElement
INFO: Opening writer for write operation TextWriteOperation{tempDirectory=C:\Use
rs\XXX\Documents\Test-Beam-3\word-count-beam\.temp-beam-2017-11-311_1
6-25-17-1\, windowedWrites=false}
[WARNING]
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.jav
a:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessor
Impl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:293)
    at java.lang.Thread.run (Thread.java:748)
Caused by: org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.Il
legalStateException: Unable to find registrar for c
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUnti
lFinish (DirectRunner.java:331)
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUnti
lFinish (DirectRunner.java:301)
    at org.apache.beam.runners.direct.DirectRunner.run (DirectRunner.java:200)
    at org.apache.beam.runners.direct.DirectRunner.run (DirectRunner.java:63)
    at org.apache.beam.sdk.Pipeline.run (Pipeline.java:297)
    at org.apache.beam.sdk.Pipeline.run (Pipeline.java:283)
    at org.apache.beam.examples.WordCount.main (WordCount.java:185)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.jav
a:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessor
Impl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:293)
    at java.lang.Thread.run (Thread.java:748)
Caused by: java.lang.IllegalStateException: Unable to find registrar for c
    at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal (FileSystems.jav
a:447)
    at org.apache.beam.sdk.io.FileSystems.match (FileSystems.java:111)
    at org.apache.beam.sdk.io.FileSystems.matchResources (FileSystems.java:174)
    at org.apache.beam.sdk.io.FileSystems.delete (FileSystems.java:321)
    at org.apache.beam.sdk.io.FileBasedSink$Writer.cleanup (FileBasedSink.java:9
05)
    at org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles.processElement (Wri
teFiles.java:438)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11.965 s
[INFO] Finished at: 2017-11-07T10:25:21-06:00
[INFO] Final Memory: 36M/647M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:java (d
efault-cli) on project word-count-beam: An exception occured while executing the
 Java class. null: InvocationTargetException: java.lang.IllegalStateException: U
nable to find registrar for c -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e swit
ch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please rea
d the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionE
xception

C:\Users\XXX\Documents\Test-Beam-3\word-count-beam>

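I tried changing the data source from

gs://apache-beam-samples/shakespeare/kinglear.txt

to

C:\ The Hunger Games.txt

but that made no difference. My first thought was that it was some firewall/proxy/network-related problem. The crash happens at line 184 of WordCount.java: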

p.run().waitUntilFinish();

I am surprised this does not work out of the box, since it is supposed to be an example.

Any hints? Has anyone else run into this problem?
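For reference, the stack trace above nests several exceptions, and the innermost "Caused by" entry is the one that names the actual failure. The following is a minimal sketch of walking a cause chain down to its root. The class `RootCauseSketch` and the helper `rootCause` are hypothetical names, and the exceptions built in `main` are plain stand-ins that only mimic the nesting seen in the log; they are not Beam's own exception types.

```java
import java.lang.reflect.InvocationTargetException;

/** Hypothetical helper for illustration; not part of Beam or Maven. */
class RootCauseSketch {

    /** Follows getCause() links until the innermost exception is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in exceptions that mimic the nesting in the log above.
        Throwable inner = new IllegalStateException("Unable to find registrar for c");
        Throwable middle = new RuntimeException(inner);
        Throwable outer = new InvocationTargetException(middle);

        // Prints: java.lang.IllegalStateException: Unable to find registrar for c
        System.out.println(rootCause(outer));
    }
}
```

In the log above, the root cause is raised from `FileSystems.getFileSystemInternal`, which points at how a file path is resolved rather than at the network.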

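Edit:

I found that someone said this is a path-related problem on Windows. I am using Google Cloud Storage (gs), but it seems the code uses some local paths, which causes this crash. That was a while ago, so I refuse to believe the problem is still unresolved.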

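2 Answers:

Answer 0: (score: 0)

I am a bit confused: you say your input comes from gs://apache-beam-samples/shakespeare/kinglear.txt, but your invocation shows that you are running the program with -Dexec.args="--inputFile=pom.xml --output=counts", and according to its log output it is indeed reading your pom.xml file and counting the words in it. Where are you specifying the kinglear.txt path?

Even so, it should at least succeed in counting the words in pom.xml. I think this Windows compatibility issue has been fixed at HEAD; see the corresponding JIRA https://issues.apache.org/jira/browse/BEAM-2298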
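The message "Unable to find registrar for c" is consistent with the drive letter of a Windows path such as `C:\Users\...` being read as a URI scheme. The following is a minimal sketch of that effect, not Beam's actual code: the class `SchemeSketch`, the method `parseScheme`, and both regular expressions are assumptions chosen only to illustrate how a drive letter can be mistaken for a scheme and how a stricter pattern avoids it.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration; not Beam's FileSystems implementation. */
class SchemeSketch {

    // Assumed naive pattern: any "letters:" prefix counts as a scheme.
    static final Pattern NAIVE_SCHEME =
        Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*):.*");

    // Assumed stricter pattern: the colon must be followed by a forward slash.
    static final Pattern STRICT_SCHEME =
        Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*):/.*");

    /** Returns the lower-cased scheme, or "file" when no scheme is matched. */
    static String parseScheme(Pattern pattern, String spec) {
        Matcher m = pattern.matcher(spec);
        if (!m.matches()) {
            return "file";
        }
        return m.group("scheme").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Hypothetical Windows temp path, shaped like the one in the log.
        String windowsPath = "C:\\Users\\XXX\\word-count-beam\\.temp-beam\\part-0";
        String gcsPath = "gs://apache-beam-samples/shakespeare/kinglear.txt";

        System.out.println(parseScheme(NAIVE_SCHEME, windowsPath));  // c
        System.out.println(parseScheme(STRICT_SCHEME, windowsPath)); // file
        System.out.println(parseScheme(STRICT_SCHEME, gcsPath));     // gs
        System.out.println(parseScheme(STRICT_SCHEME, "pom.xml"));   // file
    }
}
```

Under these assumptions, a lookup keyed on the naive result would search for a file system registered under "c" and find none, which matches the shape of the error in the question.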

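Answer 1: (score: 0)

Can we store the output in Cloud SQL? If so, please provide the steps/process.

FYI: I was able to store the output in Cloud Storage by following this link: https://cloud.google.com/dataflow/docs/quickstarts/quickstart-java-maven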

mvn -Pdataflow-runner compile exec:java \
      -Dexec.mainClass=org.apache.beam.examples.WordCount \
      -Dexec.args="--project=<project_id> \
      --stagingLocation=gs://<bucket>/staging/ \
      --output=gs://<bucket>/output \
      --runner=DataflowRunner"