Trying to run a Hive activity with Data Factory. The pipeline completes fine, and the table is created inside the cluster, but the output dataset never creates a file in Azure Data Lake Store. Is this intentional?
Just trying to learn, so be gentle.
Input dataset:
A standard input CSV file containing the data
{
    "name": "dlsinput",
    "properties": {
        "published": false,
        "type": "AzureDataLakeStore",
        "linkedServiceName": "dls",
        "typeProperties": {
            "fileName": "output.csv",
            "folderPath": "data/output/",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ","
            }
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}
Pipeline:
The pipeline pointing at the HDInsight cluster
{
    "name": "HiveActivitySamplePipeline",
    "properties": {
        "activities": [
            {
                "type": "HDInsightHive",
                "typeProperties": {
                    "scriptPath": "scripts/hive.hql",
                    "scriptLinkedService": "sta"
                },
                "inputs": [
                    {
                        "name": "dlsinput"
                    }
                ],
                "outputs": [
                    {
                        "name": "dlsoutput"
                    }
                ],
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1
                },
                "name": "HiveActivitySample",
                "linkedServiceName": "hdi"
            }
        ],
        "start": "2018-04-05T12:20:00Z",
        "end": "2018-04-10T23:59:59Z",
        "isPaused": false,
        "hubName": "adf",
        "pipelineMode": "Scheduled"
    }
}
Output dataset:
The output with the file I want created
{
    "name": "dlsoutput",
    "properties": {
        "published": false,
        "type": "AzureDataLakeStore",
        "linkedServiceName": "dls",
        "typeProperties": {
            "fileName": "myfile.csv",
            "folderPath": "data/output/",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": ","
            }
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}
Hive.hql
DROP TABLE IF EXISTS temp;
CREATE EXTERNAL TABLE IF NOT EXISTS temp (
    Name STRING,
    Road STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'adl://dls.azuredatalakestore.net/data/output/';
Answer 0 (score: 0)
It looks like this is because Hive.hql only contains the commands that create the table; no data is ever inserted into it, so no data files are generated for you to see.
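As a minimal sketch of the fix (hypothetical; it assumes a staging table named src over your input CSV at data/input/, neither of which appears in the original script), adding an INSERT to Hive.hql would make Hive materialize data files under the external table's LOCATION:

```sql
-- Hypothetical addition to Hive.hql: read the input CSV through a staging
-- external table, then insert rows into `temp` so Hive actually writes
-- files under adl://dls.azuredatalakestore.net/data/output/.
CREATE EXTERNAL TABLE IF NOT EXISTS src (
    Name STRING,
    Road STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'adl://dls.azuredatalakestore.net/data/input/';

INSERT OVERWRITE TABLE temp
SELECT Name, Road FROM src;
```

Note that Hive chooses its own output file names (typically 000000_0 and so on); the fileName "myfile.csv" in the output dataset does not control what the Hive activity writes, since that dataset mainly drives the pipeline's scheduling and dependencies.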