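I would like to know whether there is any spout implementation for streaming data from HDFS into Storm (something similar to Spark Streaming from HDFS). I know there are bolt implementations that write data to HDFS (https://github.com/ptgoetz/storm-hdfs and http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_user-guide/content/ch_storm-using-hdfs-connector.html), but I could not find anything for the other direction. I would appreciate any suggestions and hints.
Answer 0 (score: 3)
One option is to use the Hadoop HDFS Java API. Assuming you are using Maven, you would include hadoop-common in your pom.xml (resolving hdfs:// URIs at runtime typically also requires the HDFS client classes, e.g. the hadoop-hdfs or hadoop-client artifact, on the classpath):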
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0.2.2.0.0-2041</version>
</dependency>
Then, in your spout implementation, you would use the HDFS FileSystem object. For example, here is some pseudocode that emits each line of a file as a string:
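import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
// Values comes from backtype.storm.tuple (org.apache.storm.tuple in Storm 1.0+)

@Override
public void nextTuple() {
    // Caveat: Storm calls nextTuple() repeatedly, so as written the whole file is re-read on every call.
    Path pt = new Path("hdfs://servername:8020/user/hdfs/file.txt");
    try {
        // Resolve the file system from the path's own URI so that hdfs:// is honored
        FileSystem fs = pt.getFileSystem(new Configuration());
        // try-with-resources closes the reader even if emit() throws
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(fs.open(pt), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                // emit the line which was read from the HDFS file
                // _collector is a private member variable of type SpoutOutputCollector set in the open method
                _collector.emit(new Values(line));
            }
        }
    } catch (Exception e) {
        _collector.reportError(e);
        LOG.error("HDFS spout error", e);
    }
}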
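Because Storm invokes nextTuple() over and over, a spout normally keeps the reader open between calls and emits a single line per call instead of looping over the whole file. The following is only a minimal sketch of that idea, written against plain java.io so that it runs without Storm or Hadoop on the classpath. IncrementalLineReader is a hypothetical helper name, not part of either library, and the Supplier<Reader> stands in for whatever opens the HDFS stream (for example an InputStreamReader around fs.open(path)).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Hypothetical helper (not part of Storm or Hadoop): hands out one line per call,
// mirroring how nextTuple() should do a small unit of work and then return.
class IncrementalLineReader {
    private final Supplier<Reader> source; // assumption: in a spout this would wrap fs.open(path)
    private BufferedReader reader;         // opened lazily on the first call
    private boolean exhausted;             // true once end of input was reached or an error occurred

    IncrementalLineReader(Supplier<Reader> source) {
        this.source = source;
    }

    /** Returns the next line, or null once the input is exhausted. */
    String nextLine() {
        if (exhausted) {
            return null;
        }
        try {
            if (reader == null) {
                reader = new BufferedReader(source.get());
            }
            String line = reader.readLine();
            if (line == null) {
                // End of input: remember that and release the underlying stream.
                exhausted = true;
                reader.close();
            }
            return line;
        } catch (IOException e) {
            exhausted = true;
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A StringReader stands in for the HDFS file in this demo.
        IncrementalLineReader lines =
                new IncrementalLineReader(() -> new StringReader("first\nsecond\n"));
        String line;
        while ((line = lines.nextLine()) != null) {
            // In a spout, this is where _collector.emit(new Values(line)) would go.
            System.out.println(line);
        }
    }
}
```

In a spout, the helper would presumably be created in open() and nextTuple() would call nextLine() once, emitting the result when it is non-null and returning immediately otherwise. What to do after the file is exhausted (stay idle, or look for new files) is a design decision this sketch leaves open.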