I loaded a file containing about 6000 rows of data using the following commands:
A = load '/home/hduser/hdfsdrive/piginput/data/airlines.dat' using PigStorage(',') as (Airline_ID:int, Name:chararray, Alias:chararray, IATA:chararray, ICAO:chararray, Callsign:chararray, Country:chararray, Active:chararray);
B = foreach airline generate Country,Airline_ID;
C = group B by Country;
D = foreach C generate group,COUNT(B);
In the code above, I can execute the first 3 commands without any problem, but the 4th command runs for a very long time. I tried the following:
dump C;
Even this gets stuck at the same place. Here is the log:
2016-04-20 16:08:16,617 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2016-04-20 16:08:16,898 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2016-04-20 16:08:17,125 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2016-04-20 16:08:17,129 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1da9647b
2016-04-20 16:08:17,190 INFO org.apache.hadoop.mapred.ReduceTask: ShuffleRamManager: MemoryLimit=130652568, MaxSingleShuffleLimit=32663142
2016-04-20 16:08:17,195 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0 Thread started: Thread for merging on-disk files
2016-04-20 16:08:17,195 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0 Thread started: Thread for merging in memory files
2016-04-20 16:08:17,195 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0 Thread waiting: Thread for merging on-disk files
2016-04-20 16:08:17,196 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0 Need another 1 map output(s) where 0 is already in progress
2016-04-20 16:08:17,196 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0 Thread started: Thread for polling Map Completion Events
2016-04-20 16:08:17,196 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0 Scheduled 0 outputs (0 slow hosts and 0 dup hosts)
2016-04-20 16:08:22,197 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
2016-04-20 16:09:18,202 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0 Need another 1 map output(s) where 1 is already in progress
2016-04-20 16:09:18,203 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0 Scheduled 0 outputs (0 slow hosts and 0 dup hosts)
2016-04-20 16:10:18,208 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0 Need another 1 map output(s) where 1 is already in progress
2016-04-20 16:10:18,208 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0 Scheduled 0 outputs (0 slow hosts and 0 dup hosts)
2016-04-20 16:11:18,214 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0 Need another 1 map output(s) where 1 is already in progress
2016-04-20 16:11:18,214 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0 Scheduled 0 outputs (0 slow hosts and 0 dup hosts)
2016-04-20 16:11:22,395 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0 copy failed: attempt_201604201138_0003_m_000000_0 from ubuntu
2016-04-20 16:11:22,396 WARN org.apache.hadoop.mapred.ReduceTask: java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
        at sun.net.www.http.HttpClient.New(HttpClient.java:308)
        at sun.net.www.http.HttpClient.New(HttpClient.java:326)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:998)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:934)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:852)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1636)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1593)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1493)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1401)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1333)
2016-04-20 16:11:22,398 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201604201138_0003_r_000000_0: Failed fetch #1 from attempt_201604201138_0003_m_000000_0
2016-04-20 16:11:22,398 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0 adding host ubuntu to penalty box, next contact in 12 seconds
2016-04-20 16:11:22,398 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0: Got 1 map-outputs from previous failures
2016-04-20 16:11:37,399 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
2016-04-20 16:12:19,403 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0 Need another 1 map output(s) where 1 is already in progress
2016-04-20 16:12:19,403 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201604201138_0003_r_000000_0 Scheduled 0 outputs (0 slow hosts and 0 dup hosts)
I even stopped all the jobs and tried restarting, but it didn't help. My environment is Ubuntu / Hadoop 1.2.1 / Pig 0.15.0.
Please help.
Thanks, Sathish
Answer 0 (score: 1)
I solved this problem. The issue was an incorrect IP address configured in /etc/hosts. I updated it to the IP address assigned to the Ubuntu machine and restarted the Hadoop services. I spotted the mismatch in hadoop-hduser-jobtracker-ubuntu.log.
In hadoop-hduser-datanode-ubuntu.log, it threw the following error:
STARTUP_MSG: host = ubuntu/10.1.0.249
Based on these errors, I was able to trace the problem to the IP address, fix it in the /etc/hosts file, and restart the server. After that, all Hadoop jobs ran without any issue, and I could load the data and run the Pig scripts.
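For illustration, the fix amounts to making the cluster hostname resolve to the machine's real address (10.1.0.249 per the STARTUP_MSG above) instead of the 127.0.1.1 loopback entry that Ubuntu installs by default. The exact before/after lines below are an assumption, not the poster's actual file:

```
# /etc/hosts -- hypothetical sketch; the poster's actual file was not shown.
#
# Before (typical Ubuntu default, which makes Hadoop daemons bind to loopback
# so other tasks cannot fetch map output over the network):
#   127.0.1.1   ubuntu
#
# After (hostname mapped to the machine's real address from the startup log):
127.0.0.1   localhost
10.1.0.249  ubuntu
```

After editing the file, the Hadoop daemons must be restarted for the new resolution to take effect.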
Thanks, Sathish.
Answer 1 (score: 0)
You are loading the data into relation A, but generating B from a relation named airline?
B = foreach airline generate Country,Airline_ID;
should be
B = foreach A generate Country,Airline_ID;
Also, if you want to count the number of airlines per country, you should modify relation D to
D = foreach C generate group as Country,COUNT(B.Airline_ID);
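Putting both corrections together, the whole script would read as follows (a sketch only; the path and schema are taken verbatim from the question, and dump D at the end is added here just to show the per-country counts):

```pig
-- Load the comma-delimited airline data with the schema from the question
A = load '/home/hduser/hdfsdrive/piginput/data/airlines.dat' using PigStorage(',')
    as (Airline_ID:int, Name:chararray, Alias:chararray, IATA:chararray,
        ICAO:chararray, Callsign:chararray, Country:chararray, Active:chararray);

-- Project from A, not from the undefined alias "airline"
B = foreach A generate Country, Airline_ID;

-- Group by country, then count the airlines in each group
C = group B by Country;
D = foreach C generate group as Country, COUNT(B.Airline_ID);

dump D;
```

Note that COUNT ignores null values of Airline_ID, so rows with a missing ID are not counted; COUNT_STAR would count every tuple in the bag.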