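I'm new to spring-hadoop and have a general question. I have files in various formats, and I want to use Apache Tika to extract the useful content and store it in HDFS as text files. I have gone through the spring data-hadoop reference documentation (http://docs.spring.io/spring-hadoop/docs/2.0.0.RELEASE/reference/html/store.html) but could not work out how to do this, and I have not found any other useful resources.
Is there a sample project or source that shows how to write data to HDFS with spring data-hadoop?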
Answer 0 (score: 0)
A useful example, from a comment by Risberg:
https://github.com/trisberg/springone-2015/tree/master/boot-ingest
Another code snippet, using TextFileWriter, which implements the DataWriter interface:
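import java.util.Arrays;
import org.apache.hadoop.fs.Path;
import org.springframework.data.hadoop.store.output.TextFileWriter;
import org.springframework.data.hadoop.store.strategy.naming.*;

// Assumptions: `configuration` is an existing Hadoop Configuration, "/tmp/docs" is a
// placeholder base path, and the null codec argument means no compression.
TextFileWriter textFileWriter = new TextFileWriter(configuration, new Path("/tmp/docs"), null);

// Build the naming strategy: a static name, then a UUID, then a "txt" suffix joined with "."
ChainedFileNamingStrategy namingStrategy =
    new ChainedFileNamingStrategy(
        Arrays.asList(new FileNamingStrategy[] {
            new StaticFileNamingStrategy("document"),
            new UuidFileNamingStrategy(someUUID), // someUUID: a String defined elsewhere
            new StaticFileNamingStrategy("txt", ".") }));
// Set the naming strategy on the writer
textFileWriter.setFileNamingStrategy(namingStrategy);
// Write one line of text (write, flush and close can throw IOException)
textFileWriter.write("this is a test content");
// Flush and close the writer
textFileWriter.flush();
textFileWriter.close();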
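To make it easier to see what file name the chained strategy above should produce, here is a minimal standalone sketch in plain Java. It is not the library's implementation: the class NamingChainSketch, the Part type and the compose method are hypothetical, and it assumes each strategy appends its prefix plus its own text to the name built so far, with "-" as the default prefix.

```java
import java.util.Arrays;
import java.util.List;

public class NamingChainSketch {

    /** One link in the chain: a separator prefix and the text it contributes. */
    public static final class Part {
        final String prefix;
        final String text;

        public Part(String prefix, String text) {
            this.prefix = prefix;
            this.text = text;
        }
    }

    /** Joins the parts in order; the prefix of the first part is not used. */
    public static String compose(List<Part> parts) {
        StringBuilder name = new StringBuilder();
        for (Part part : parts) {
            if (name.length() > 0) {
                name.append(part.prefix);
            }
            name.append(part.text);
        }
        return name.toString();
    }

    public static void main(String[] args) {
        String someUuid = "1234-abcd"; // placeholder value, not a real UUID
        List<Part> chain = Arrays.asList(
                new Part("-", "document"),   // assumed default prefix "-"
                new Part("-", someUuid),     // assumed default prefix "-"
                new Part(".", "txt"));       // explicit "." prefix, as in the snippet
        System.out.println(compose(chain)); // prints "document-1234-abcd.txt"
    }
}
```

Under these assumptions, each document written with a distinct UUID ends up in its own .txt file under the writer's base path, so the text extracted from one source file does not overwrite another's.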