Every activity we create within a pipeline must have an output dataset, which I believe is a purely syntactic rule when the activity is of type 'HDInsightHive', because the actual output destination is determined by the HQL query itself. In our case, for example, the HQL query selects rows from one table and inserts them into another, external table. So ultimately it is the HQL that determines where the output goes; the name of the destination table appears in the HQL itself (INSERT OVERWRITE tablename..). In that case the output dataset defined in the activity seems to act only as syntactic glue that has to be there for its own sake. Is that correct?
Answer 0 (score: 2)
It is true that you can define where the data will land in an HQL query, just as you can in a U-SQL query. The main function of the output dataset, as I see it, is that it lets you pipe the output into another activity. If you did not define an output dataset, or you defined it with a folder that does not match where the HQL script puts its output, you would not be able to use that dataset as input to another activity. If all of your pipelines end with the HQL activity and you never need to do anything after that point, then I can see how it would seem that there is no need for an output dataset.
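To make the chaining idea concrete, here is a minimal Python sketch. It is not Data Factory code or its API: it only models activities as plain dictionaries with `inputs` and `outputs` lists, loosely following the layout of a pipeline definition. The activity and dataset names (`RunHiveScript`, `CopyResultToSql`, `RawLogs`, `HiveResult`, `SqlTable`) and the helper `downstream_of` are hypothetical, made up for illustration.

```python
# Hypothetical activities, modeled as plain dicts. Only the fields needed
# to show dataset-based chaining are included; this is not a full pipeline.
activities = [
    {
        "name": "RunHiveScript",
        "type": "HDInsightHive",
        "inputs": [{"name": "RawLogs"}],
        # The output dataset is the handle other activities refer to.
        "outputs": [{"name": "HiveResult"}],
    },
    {
        "name": "CopyResultToSql",
        "type": "Copy",
        # Consumes the Hive activity's output dataset by name.
        "inputs": [{"name": "HiveResult"}],
        "outputs": [{"name": "SqlTable"}],
    },
]


def downstream_of(activity_name, activities):
    """Return names of activities that take an output dataset of
    `activity_name` as one of their inputs."""
    produced = set()
    for act in activities:
        if act["name"] == activity_name:
            produced = {d["name"] for d in act["outputs"]}
    return [
        act["name"]
        for act in activities
        if act["name"] != activity_name
        and produced & {d["name"] for d in act["inputs"]}
    ]


print(downstream_of("RunHiveScript", activities))
```

In this sketch the only link between the two activities is the shared dataset name. If the Hive activity declared no output dataset, or declared one that nothing else references, `downstream_of` would find no consumer, which mirrors the point above.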
HTH