How do I run a Pig script from HDFS?

Time: 2017-08-27 19:38:04

Tags: apache hadoop apache-pig

I am trying to run a Pig script stored in HDFS, but it fails with an error saying the file does not exist.

My HDFS root directory:

[cloudera@quickstart ~]$ hdfs dfs -ls /
Found 11 items
drwxrwxrwx   - hdfs     supergroup          0 2016-08-10 14:35 /benchmarks
drwxr-xr-x   - hbase    supergroup          0 2017-08-19 23:51 /hbase
drwxr-xr-x   - cloudera supergroup          0 2017-07-13 04:53 /home
drwxr-xr-x   - cloudera supergroup          0 2017-08-27 07:26 /input
drwxr-xr-x   - cloudera supergroup          0 2017-07-30 14:30 /output
drwxr-xr-x   - solr     solr                0 2016-08-10 14:37 /solr
-rw-r--r--   1 cloudera supergroup        273 2017-08-27 11:59 /success.pig
-rw-r--r--   1 cloudera supergroup        273 2017-08-27 12:04 /success.script
drwxrwxrwt   - hdfs     supergroup          0 2017-08-27 12:07 /tmp
drwxr-xr-x   - hdfs     supergroup          0 2016-09-28 09:00 /user
drwxr-xr-x   - hdfs     supergroup          0 2016-08-10 14:37 /var 

The command I ran:

[cloudera@quickstart ~]$ pig -x mapreduce /success.pig 

The error message:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2017-08-27 12:34:39,160 [main] INFO  org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.8.0 (rexported) compiled Jun 16 2016, 12:40:41
2017-08-27 12:34:39,162 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/cloudera/pig_1503862479069.log
2017-08-27 12:34:47,079 [main] ERROR org.apache.pig.Main - ERROR 2997: Encountered IOException. File /success.pig does not exist
Details at logfile: /home/cloudera/pig_1503862479069.log

What am I missing?

1 answer:

Answer 0 (score: 1)

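You can run a script stored in HDFS by passing its location with the -f <script location> option. The location must be a full URI that includes the filesystem scheme and authority (the value of fs.defaultFS), not just a path, as in the syntax and example that follow. A bare path such as /success.pig is looked up on the local filesystem, which is why Pig reports that the file does not exist.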

Syntax: 
pig -f <fs.defaultFS>/<script path in hdfs>

Example: 
pig -f hdfs://Foton/user/root/script.pig
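
Applied to the question's setup, the same idea could look like the sketch below. It assumes the hdfs and pig clients are on the PATH and that the script sits at /success.pig in HDFS; build_script_uri is a hypothetical helper introduced here for illustration, not part of Pig or Hadoop. The filesystem URI is read from the cluster configuration with hdfs getconf instead of being typed by hand.

```shell
#!/bin/sh
# Hypothetical helper: join a filesystem URI (the fs.defaultFS value)
# and an absolute HDFS path into a single script URI.
build_script_uri() {
    fs="${1%/}"                 # drop one trailing slash, if present
    printf '%s%s\n' "$fs" "$2"
}

# Only touch the cluster when both clients are installed.
if command -v hdfs >/dev/null 2>&1 && command -v pig >/dev/null 2>&1; then
    # Read the default filesystem URI from the Hadoop configuration.
    default_fs="$(hdfs getconf -confKey fs.defaultFS)"
    # /success.pig is the script path from the question (an assumption here).
    script_uri="$(build_script_uri "$default_fs" /success.pig)"
    echo "Running $script_uri"
    pig -x mapreduce -f "$script_uri"
fi
```

If the script still cannot be found, running hdfs dfs -ls against the printed URI is a quick way to confirm that the URI points at the file.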