Hadoop - Execute script when data arrives in hdfs

Date: 2015-06-25 19:03:28

Tags: hadoop apache-spark execute

Is there a tool in the Hadoop ecosystem that can detect when new data has been added to HDFS? Specifically, I want to remotely execute a Sqoop import job from an external database (no merge, only new tables). Then, once that data is written to HDFS, a Spark script should run that processes the newly added data and does some work with it. Is there any feature in Hadoop that handles this kind of job? I could simply execute the Spark script after the Sqoop import job finishes, but I would like to know if such a feature exists; I haven't found one yet. Thanks in advance.

1 Answer:

Answer 0: (score: 0)

Yes, there is. The Hadoop ecosystem includes a workflow tool called Oozie that handles exactly this kind of scenario.

Oozie workflows can be triggered either on a fixed schedule or on data availability. Your case is the data-availability one: an Oozie coordinator watches an HDFS dataset and launches a workflow (which can contain your Spark step) once the expected data instance appears. See the Oozie documentation for details: Oozie doc for coordinator job.
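As a rough illustration, a data-availability-triggered coordinator could look like the sketch below. All names, paths, dates, and frequencies are hypothetical placeholders, not from your setup; the key pieces are the `<dataset>` with a `<done-flag>` (Oozie waits until the flag file exists in the dated directory) and the `<input-events>` block that gates the workflow on that dataset instance:

```xml
<!-- Hypothetical sketch: coordinator that waits for a daily Sqoop import
     to land in HDFS, then launches a workflow containing the Spark step.
     Paths, names, and dates below are illustrative assumptions. -->
<coordinator-app name="sqoop-to-spark-coord"
                 frequency="${coord:days(1)}"
                 start="2015-06-25T00:00Z" end="2016-06-25T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <dataset name="imported_table" frequency="${coord:days(1)}"
             initial-instance="2015-06-25T00:00Z" timezone="UTC">
      <!-- Directory your Sqoop import writes into (assumed layout) -->
      <uri-template>hdfs://namenode/data/imported/${YEAR}${MONTH}${DAY}</uri-template>
      <!-- Oozie considers the dataset available once this file exists -->
      <done-flag>_SUCCESS</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="input" dataset="imported_table">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <!-- Workflow app containing the Spark action to run on the new data -->
      <app-path>hdfs://namenode/apps/spark-process-wf</app-path>
      <configuration>
        <property>
          <name>inputDir</name>
          <value>${coord:dataIn('input')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

The coordinator polls for the dataset instance each period; once `_SUCCESS` shows up, it materializes a workflow run and passes the resolved input directory in as `inputDir`, which the Spark action in the workflow can read as a parameter.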