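Is there a way to get the results of a Pig script that runs on a remote cluster directly, without having to store them and retrieve them separately?
Answer 0 (score: 1)
You can pass the input and output paths to the script as Pig parameters. For example: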
example.pig
A = LOAD '$PATH_TO_FOLDER_WITH_DATA' AS (f1:int, f2:int, f3:int);
-- Do something with your data and assign the result to B
STORE B INTO '$OUTPUT_PATH';
Then you can run the script, supplying each parameter as name=value:
pig -p PATH_TO_FOLDER_WITH_DATA=/path/to/local/file -p OUTPUT_PATH=/path/to/the/output example.pig
To automate this in Bash:
storelocal.sh
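#!/bin/bash
# $1 = input path, $2 = HDFS output path, $3 = local output file
pig -p PATH_TO_FOLDER_WITH_DATA="$1" -p OUTPUT_PATH="$2" example.pig
hdfs dfs -getmerge "$2" "$3"
You can then run it with the input path, the HDFS output path, and the local output file:
./storelocal.sh /path/to/local/file /path/to/hdfs/output /path/to/the/local/output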
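A slightly more defensive version of this wrapper might look like the sketch below. It assumes three arguments (input path, HDFS output path, local output file). The function names `run` and `store_local` and the `DRY_RUN` variable are made up for illustration and are not part of Pig or Hadoop.

```shell
#!/bin/bash
# Sketch only: run example.pig, then merge its HDFS output into one local file.
# run, store_local and DRY_RUN are illustrative names, not Pig or Hadoop features.

# Execute a command, or just print it when DRY_RUN is set to a non-empty value.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# store_local INPUT_PATH HDFS_OUT LOCAL_OUT
store_local() {
  if [ "$#" -ne 3 ]; then
    echo "usage: store_local INPUT_PATH HDFS_OUT LOCAL_OUT" >&2
    return 2
  fi
  # Only fetch the output if the Pig job itself succeeded.
  run pig -p "PATH_TO_FOLDER_WITH_DATA=$1" -p "OUTPUT_PATH=$2" example.pig &&
    run hdfs dfs -getmerge "$2" "$3"
}

# Run only when arguments are supplied, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
  store_local "$@"
fi
```

Setting `DRY_RUN=1` before calling the script prints the two commands instead of running them, which is a cheap way to check that the paths end up in the right places.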