New to Flink.
I can run the example WordCount.jar on a file that lives on a remote HDFS cluster without declaring the fs.hdfs.hadoopconf variable in the Flink config.
So I would like to know what exactly the purpose of the above-mentioned variable is. Does declaring it change how the example jar runs?
Command:
flink-cluster.vm ~]$ /opt/flink/bin/flink run /opt/flink/examples/batch/WordCount.jar --input hdfs://hadoop-master:9000/tmp/test-events
Output:
.......
07/13/2016 00:50:13 Job execution switched to status FINISHED.
(foo,1)
.....
(bar,1)
(one,1)
Setup:
Thanks.
Update:
As Serhiy pointed out, I declared fs.hdfs.hadoopconf in the config, but when running the job with the updated argument hdfs:///tmp/test-events.1468374669125 I got the following error:
flink-conf.yaml
# You can also directly specify the paths to hdfs-default.xml and hdfs-site.xml
# via keys 'fs.hdfs.hdfsdefault' and 'fs.hdfs.hdfssite'.
#
fs.hdfs.hadoopconf: hdfs://hadoop-master:9000/
fs.hdfs.hdfsdefault : hdfs://hadoop-master:9000/
Command:
flink-cluster.vm ~]$ /opt/flink/bin/flink run /opt/flink/examples/batch/WordCount.jar --input hdfs:///tmp/test-events
Output:
Caused by: org.apache.flink.runtime.JobException: Creating the input splits caused an error: The given HDFS file URI (hdfs:///tmp/test-events.1468374669125) did not describe the HDFS NameNode. The attempt to use a default HDFS configuration, as specified in the 'fs.hdfs.hdfsdefault' or 'fs.hdfs.hdfssite' config parameter failed due to the following problem: Either no default file system was registered, or the provided configuration contains no valid authority component (fs.default.name or fs.defaultFS) describing the (hdfs namenode) host and port.
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:172)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:679)
at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1026)
... 19 more
Answer 0 (score: 1):
fs.hdfs.hadoopconf: The absolute path to the Hadoop File System's (HDFS) configuration directory (optional value). Specifying this value allows programs to reference HDFS files using short URIs (hdfs:///path/to/files, without including the address and port of the NameNode in the file URI). Without this option, HDFS files can still be accessed, but they require fully qualified URIs such as hdfs://address:port/path/to/files. This option also causes file writers to pick up the HDFS block size and default replication factor. Flink will look for the "core-site.xml" and "hdfs-site.xml" files in the specified directory.
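Based on that description, the error in the update is most likely caused by the values in flink-conf.yaml: fs.hdfs.hadoopconf expects a local filesystem path to the directory holding the Hadoop XML files, not an hdfs:// URL. A minimal sketch of a working setup, assuming the Hadoop config directory is /etc/hadoop/conf on the Flink machine (that path is an assumption; substitute your own):

# flink-conf.yaml: point at the local *directory* containing core-site.xml and hdfs-site.xml
# (the fs.hdfs.hdfsdefault / fs.hdfs.hdfssite keys are then unnecessary)
fs.hdfs.hadoopconf: /etc/hadoop/conf

<!-- /etc/hadoop/conf/core-site.xml: declares the default file system, i.e. the
     NameNode host and port that the error message says is missing (fs.defaultFS,
     or the older fs.default.name) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>
</configuration>

With that in place, the short-URI form of the command should resolve against the configured NameNode:

flink-cluster.vm ~]$ /opt/flink/bin/flink run /opt/flink/examples/batch/WordCount.jar --input hdfs:///tmp/test-events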