Question

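I have a Pig script that needs to load files from a local Hadoop cluster. I can list the files with the hadoop command: hadoop fs -ls /repo/mydata, but when I try to load the files in the Pig script, it fails. The load statement looks like this: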

in = LOAD '/repo/mydata/2012/02' USING PigStorage() AS (event:chararray, user:chararray)

The error message is:

Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: file:/repo/mydata/2012/02

Any ideas? Thanks.

Answer 1

My suggestion:

Create a folder in HDFS: hadoop fs -mkdir /pigdata
Load the file into the newly created HDFS folder: hadoop fs -put /opt/pig/tutorial/data/excite-small.log /pigdata

(Or you can do it from the Grunt shell: grunt> copyFromLocal /opt/pig/tutorial/data/excite-small.log /pigdata)

Run the Pig Latin:

   grunt> set debug on

   grunt> set job.name 'first-p2-job'

   grunt> log = LOAD 'hdfs://hostname:54310/pigdata/excite-small.log' AS 
              (user:chararray, time:long, query:chararray); 
   grunt> grpd = GROUP log BY user; 
   grunt> cntd = FOREACH grpd GENERATE group, COUNT(log); 
   grunt> STORE cntd INTO 'output';

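The output files will be stored under hdfs://hostname:54310/pigdata/output, provided /pigdata is the current working directory in Grunt (a relative path such as 'output' is resolved against the working directory).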
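
The `file:` prefix in the error message from the question suggests that the path was resolved against the local filesystem rather than HDFS. Applied to the asker's path, the fully qualified URI used above can be sketched as a small shell wrapper. This is only a sketch under assumptions: `hostname:54310` is the placeholder NameNode address from this answer and must be replaced with the real one, and the alias `events` and the file name `load_events.pig` are made up for illustration.

```shell
#!/bin/sh
# Assumption: hostname:54310 is a placeholder for the real NameNode host:port.
NAMENODE="${NAMENODE:-hostname:54310}"
# Hypothetical file name for the generated Pig script.
SCRIPT="${SCRIPT:-load_events.pig}"

# Write a Pig script whose LOAD path names the HDFS filesystem explicitly,
# so it cannot be resolved against the local filesystem (file:/...).
cat > "$SCRIPT" <<EOF
events = LOAD 'hdfs://${NAMENODE}/repo/mydata/2012/02' USING PigStorage() AS (event:chararray, user:chararray);
DUMP events;
EOF

# Run in mapreduce mode only when the pig launcher is available on PATH.
if command -v pig >/dev/null 2>&1; then
    pig -x mapreduce "$SCRIPT"
else
    echo "pig not found on PATH; wrote $SCRIPT only"
fi
```

If Pig is started in mapreduce mode with the cluster configuration on its classpath, the scheme-less path '/repo/mydata/2012/02' should normally resolve to HDFS as well; spelling out the URI just removes the ambiguity.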

Answer 2

I ran into the same problem. Please find my suggestions below:

To start working with Pig, type: [root@localhost training]# pig -x local
Now type the load statement as in the following example: grunt> a = LOAD '/home/training/pig/TempFile.txt' USING PigStorage(',') AS (c1:chararray, c2:chararray, c3:chararray);

Answer 3

Get rid of the spaces on both sides of "=": in=LOAD '/repo/mydata/2012/02' USING PigStorage() AS (event:chararray, user:chararray)

How do I load a file on a Hadoop cluster using Apache Pig?
