New to Flink.
I can run the example WordCount.jar on a file that lives on a remote HDFS cluster without declaring the fs.hdfs.hadoopconf variable in the Flink config.
So I would like to know what exactly the purpose of the above-mentioned variable is. Does declaring it change how the example jar runs?
Command:
flink-cluster.vm ~]$ /opt/flink/bin/flink run /opt/flink/examples/batch/WordCount.jar --input hdfs://hadoop-master:9000/tmp/test-events
Output:
.......
07/13/2016 00:50:13 Job execution switched to status FINISHED.
(foo,1)
.....
(bar,1)
(one,1)
Setup:
Thanks.
Update:
As Serhiy pointed out, I declared fs.hdfs.hadoopconf in the config, but when running the job with the updated argument hdfs:///tmp/test-events.1468374669125 I got the following error:
flink-conf.yaml
# You can also directly specify the paths to hdfs-default.xml and hdfs-site.xml
# via keys 'fs.hdfs.hdfsdefault' and 'fs.hdfs.hdfssite'.
#
fs.hdfs.hadoopconf: hdfs://hadoop-master:9000/
fs.hdfs.hdfsdefault : hdfs://hadoop-master:9000/
Command:
flink-cluster.vm ~]$ /opt/flink/bin/flink run /opt/flink/examples/batch/WordCount.jar --input hdfs:///tmp/test-events
Output:
Caused by: org.apache.flink.runtime.JobException: Creating the input splits caused an error: The given HDFS file URI (hdfs:///tmp/test-events.1468374669125) did not describe the HDFS NameNode. The attempt to use a default HDFS configuration, as specified in the 'fs.hdfs.hdfsdefault' or 'fs.hdfs.hdfssite' config parameter failed due to the following problem: Either no default file system was registered, or the provided configuration contains no valid authority component (fs.default.name or fs.defaultFS) describing the (hdfs namenode) host and port.
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:172)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:679)
at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1026)
... 19 more
Answer 0 (score: 1):
fs.hdfs.hadoopconf: The absolute path to the Hadoop File System's (HDFS) configuration directory (optional value). Specifying this value allows programs to reference HDFS files using short URIs (hdfs:///path/to/files, without including the address and port of the NameNode in the file URI). Without this option, HDFS files can still be accessed, but they require fully qualified URIs such as hdfs://address:port/path/to/files. This option also causes file writers to pick up the HDFS block size and default replication factor. Flink will look for the "core-site.xml" and "hdfs-site.xml" files in the specified directory.
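Based on that description, the error in the update is most likely caused by the values in flink-conf.yaml: fs.hdfs.hadoopconf expects a local filesystem path to the directory holding the Hadoop XML files, not an hdfs:// URL. A minimal sketch of a working setup, assuming the Hadoop config directory is /etc/hadoop/conf on the Flink machine (that path is an assumption; substitute your own):

# flink-conf.yaml: point at the local *directory* containing core-site.xml and hdfs-site.xml
# (the fs.hdfs.hdfsdefault / fs.hdfs.hdfssite keys are then unnecessary)
fs.hdfs.hadoopconf: /etc/hadoop/conf

<!-- /etc/hadoop/conf/core-site.xml: declares the default file system, i.e. the
     NameNode host and port that the error message says is missing (fs.defaultFS,
     or the older fs.default.name) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>
</configuration>

With that in place, the short-URI form of the command should resolve against the configured NameNode:

flink-cluster.vm ~]$ /opt/flink/bin/flink run /opt/flink/examples/batch/WordCount.jar --input hdfs:///tmp/test-events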