Purpose of fs.hdfs.hadoopconf in flink-conf.yaml

Date: 2016-07-13 01:47:49

Tags: apache-flink

New to Flink.

I can run the example WordCount.jar against a file that lives on a remote HDFS cluster without declaring the fs.hdfs.hadoopconf variable in the Flink conf.

So I'd like to know what exactly the purpose of the variable mentioned above is. Does declaring it change how the example jar runs?

Command:

flink-cluster.vm ~]$ /opt/flink/bin/flink run  /opt/flink/examples/batch/WordCount.jar --input hdfs://hadoop-master:9000/tmp/test-events

Output:

.......
07/13/2016 00:50:13 Job execution switched to status FINISHED.
(foo,1)
.....
(bar,1)
(one,1)

Setup:

  • Remote HDFS cluster at hdfs://hadoop-master.vm:9000
  • Flink cluster running on flink-cluster.vm

Thanks

Update
As pointed out by Serhiy, I declared fs.hdfs.hadoopconf in the conf, but when running the job with the updated argument hdfs:///tmp/test-events.1468374669125 I got the following error:

flink-conf.yaml

# You can also directly specify the paths to hdfs-default.xml and hdfs-site.xml
# via keys 'fs.hdfs.hdfsdefault' and 'fs.hdfs.hdfssite'.
#
fs.hdfs.hadoopconf: hdfs://hadoop-master:9000/
fs.hdfs.hdfsdefault :  hdfs://hadoop-master:9000/

Command:

flink-cluster.vm ~]$ /opt/flink/bin/flink run  /opt/flink/examples/batch/WordCount.jar --input hdfs:///tmp/test-events

Output:

Caused by: org.apache.flink.runtime.JobException: Creating the input splits caused an error: The given HDFS file URI (hdfs:///tmp/test-events.1468374669125) did not describe the HDFS NameNode. The attempt to use a default HDFS configuration, as specified in the 'fs.hdfs.hdfsdefault' or 'fs.hdfs.hdfssite' config parameter failed due to the following problem: Either no default file system was registered, or the provided configuration contains no valid authority component (fs.default.name or fs.defaultFS) describing the (hdfs namenode) host and port.
    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:172)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:679)
    at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1026)
    ... 19 more

1 answer:

Answer 0 (score: 1)

From the documentation:

  fs.hdfs.hadoopconf: The absolute path to the Hadoop File System's (HDFS) configuration directory (OPTIONAL VALUE). Specifying this value allows programs to reference HDFS files using short URIs (hdfs:///path/to/files, without including the address and port of the NameNode in the file URI). Without this option, HDFS files can be accessed, but require fully qualified URIs like hdfs://address:port/path/to/files. This option also causes file writers to pick up the HDFS's default block size and replication factor. Flink will look for the "core-site.xml" and "hdfs-site.xml" files in the specified directory.
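In other words, fs.hdfs.hadoopconf should point to a local filesystem directory that contains Hadoop's configuration files, not to an hdfs:// URI (which is what triggers the "did not describe the HDFS NameNode" error in the update above). A minimal sketch of a working setup; the directory path /opt/hadoop/etc/hadoop is an assumption, substitute whatever directory on flink-cluster.vm actually holds your core-site.xml and hdfs-site.xml:

```yaml
# flink-conf.yaml
# Point to the LOCAL directory holding core-site.xml / hdfs-site.xml,
# not to an hdfs:// URI. (Path below is an assumed example.)
fs.hdfs.hadoopconf: /opt/hadoop/etc/hadoop
```

```xml
<!-- core-site.xml inside that directory: names the default NameNode,
     so short URIs like hdfs:///tmp/... can be resolved -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>
</configuration>
```

With a configuration along those lines in place, the short-URI form of the command (--input hdfs:///tmp/test-events) should resolve against hadoop-master:9000 automatically.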