Connecting to a remote Hadoop cluster (CDH4) from Java

Time: 2013-07-19 09:35:25

Tags: hadoop bigdata apache-pig cloudera

I have a remote Hadoop cluster (Cloudera CDH4). I am trying to run a Pig script on it from my local machine. Here is my Java code:

import java.io.IOException;
import java.util.Iterator;
import java.util.Properties;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.Tuple;

public class TestPig {

    public static void main(String args[]) {

        PigServer pigServer;
        try {

            /** Set the connection properties */
            Properties props = new Properties();

            props.setProperty("fs.default.name", "hdfs://master.node.ip.adress:8020");
            props.setProperty("mapred.job.tracker", "master.node.ip.adress:8021");

            System.setProperty("javax.xml.parsers.DocumentBuilderFactory", "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl");

            /** mapreduce mode */
            pigServer = new PigServer(ExecType.MAPREDUCE, props);

            /** pig script's path */
            pigServer.registerScript("/user/admin/data/script.pig");

            /** print the pig script's output */
            Iterator<Tuple> results = pigServer.openIterator("A");
            while (results.hasNext())
                System.out.println(results.next().toDelimitedString("\t"));

        }
        catch (ExecException e) { e.printStackTrace(); }
        catch (IOException e) { e.printStackTrace(); }

    }
}

When I run this program, I get the following error:

13/07/19 10:44:02 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: hdfs://master.node.ip.adress:8020
13/07/19 10:44:23 INFO ipc.Client: Retrying connect to server: master.cs236cloud.internal/master.node.ip.adress:8020. Already tried 0 time(s); maxRetries=45

...

13/07/19 10:59:27 INFO ipc.Client: Retrying connect to server: master.cs236cloud.internal/master.node.ip.adress:8020. Already tried 43 time(s); maxRetries=45
13/07/19 10:59:48 INFO ipc.Client: Retrying connect to server: master.cs236cloud.internal/master.node.ip.adress:8020. Already tried 44 time(s); maxRetries=45

Exception in thread "main" java.lang.RuntimeException: Failed to create DataStorage
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:204)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:117)
    at org.apache.pig.impl.PigContext.connect(PigContext.java:240)
    at org.apache.pig.PigServer.<init>(PigServer.java:213)
    at org.apache.pig.PigServer.<init>(PigServer.java:198)
    at org.apache.pig.PigServer.<init>(PigServer.java:194)
    at TestPig.main(TestPig.java:42)
Caused by: java.net.SocketTimeoutException: Call to master.cs236cloud.internal/master.node.ip.adress:8020 failed on socket timeout exception: java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=master.cs236cloud.internal/master.node.ip.adress:8020]
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1140)
    at org.apache.hadoop.ipc.Client.call(Client.java:1112)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
    at org.apache.hadoop.hdfs.DFSClient.createNamenode(DFSClient.java:183)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:281)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1437)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1455)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
    ... 8 more
Caused by: java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=master.cs236cloud.internal/master.node.ip.adress:8020]
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:453)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:579)
    at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:202)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1243)
    at org.apache.hadoop.ipc.Client.call(Client.java:1087)
    ... 28 more
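Since the root cause above is a plain TCP connect timeout, a first sanity check is whether the client machine can open a socket to the NameNode port at all, independently of Hadoop. Below is a minimal sketch (not from the question) that attempts a TCP connect with the same 20000 ms timeout the Hadoop IPC client used; the `PortCheck` class name and the helper method are mine, and `master.node.ip.adress` is the placeholder host from the question:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers both SocketTimeoutException and ConnectException
            return false;
        }
    }

    public static void main(String[] args) {
        // Same host/port the Pig client tried, same 20000 ms timeout as in the log
        String host = args.length > 0 ? args[0] : "master.node.ip.adress";
        System.out.println(host + ":8020 reachable? " + isReachable(host, 8020, 20000));
    }
}
```

If this prints `false` from the client machine while the service is up on the master, the problem is reachability (firewall, routing, or the NameNode bound to an interface the client cannot reach) rather than anything in the Pig code.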

I am sure the master is listening on port 8020. Indeed, when I run this command:

netstat -an | grep 8020

I get this result:

tcp 0 0 master.local.ip.adress:8020 0.0.0.0:* LISTEN

I would also like to show you my core-site.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera CM on 2013-07-12T10:43:15.666Z-->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master.cs236cloud.internal:8020</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>1</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>65536</value>
  </property>
  <property>
    <name>io.compression.codecs</name>
       <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>simple</value>
  </property>
  <property>
    <name>hadoop.rpc.protection</name>
    <value>authentication</value>
  </property>
  <property>
    <name>hadoop.security.auth_to_local</name>
    <value>DEFAULT</value>
  </property>
</configuration>
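Note that `fs.defaultFS` in this file (`hdfs://Master.cs236cloud.internal:8020`) names a host that differs from the address the client code connects to, which is worth double-checking. As a small illustration, the property can be read out of the XML with the JDK's own DOM parser; this `CoreSiteReader` helper is a sketch of mine, not part of the question:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CoreSiteReader {

    /** Returns the value of the named property from core-site.xml content, or null if absent. */
    public static String getProperty(String xml, String name) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            String n = prop.getElementsByTagName("name").item(0).getTextContent().trim();
            if (n.equals(name)) {
                return prop.getElementsByTagName("value").item(0).getTextContent().trim();
            }
        }
        return null;
    }
}
```

Printing `getProperty(xml, "fs.defaultFS")` against the file above yields the URI the cluster actually advertises, so the client can be pointed at exactly that host name (which must also resolve from the client machine).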

Here is the output of the jps command on my master host:

3250 DataNode
4468 Main
2776 HeadlampServer
2541 RunJar
5496 Jps
4467 EventCatcherService
2502 QuorumPeerMain
2650 JobTracker
3082 RunJar
2597 HRegionServer
2594 TaskTracker
2629 HMaster
2520
3003 SecondaryNameNode
4553 Main
3414 Bootstrap
2549 AlertPublisher
3172 NameNode
2127 Main
4583 Main
3350 Bootstrap

I have searched the internet for a solution, but I have not found anything that works.

Do you have any idea how to solve this problem?

Thanks.

0 Answers:

No answers yet.