Connecting to Cloudera Impala / Hive from outside the cluster

Date: 2016-04-14 06:30:08

Tags: java hive cloudera impala

I am using Cloudera Impala server version 5.4.7. The first thing was to make sure the port is open, which I have verified with telnet.

        // Load the Hive JDBC driver and cap the login wait at 30 seconds
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        DriverManager.setLoginTimeout(30);
        try (java.sql.Connection connection = DriverManager.getConnection(
                "jdbc:hive2://12.23.56.789:123456/someName;auth=noSasl")) {
            System.out.println("connected");
        }

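But I never manage to connect.

What could be the problem? I am using exactly the same Hive version as the one in the Cloudera release.

All I get is a timeout error: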

  [14 Apr 2016 06:27:26,797] [ERROR] [main] [org.apache.hive.jdbc.HiveConnection] - Error opening session
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:156)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.OpenSession(TCLIService.java:143)
    at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:475)
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:181)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(Unknown Source)
    at java.sql.DriverManager.getConnection(Unknown Source)
    at com.datorama.core.service.delivery.providers.DatabaseProvider.main(DatabaseProvider.java:330)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    ... 13 more
Exception in thread "main" java.sql.SQLException: Could not establish connection to jdbc:hive2://54.69.2.250:21050/sage_global;auth=noSasl: java.net.SocketTimeoutException: Read timed out
    at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:486)
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:181)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(Unknown Source)
    at java.sql.DriverManager.getConnection(Unknown Source)
    at com.datorama.core.service.delivery.providers.DatabaseProvider.main(DatabaseProvider.java:330)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:156)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.OpenSession(TCLIService.java:143)
    at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:475)
    ... 5 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    ... 13 more
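A note on the telnet check mentioned in the question: telnet only shows that a TCP connection can be opened. In the trace above, the failure is a `Read timed out` inside `recv_OpenSession`, which suggests the socket was opened and the client was then left waiting for a reply to the session request. A server that expects a different authentication or transport setting is one possible cause, though the trace alone does not prove it. The sketch below reproduces just the TCP-level check in Java so it can be compared with the JDBC result. The class name `PortProbe` and the default host are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortProbe {
    /** Returns true if a TCP connection to host:port can be opened within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, host unreachable or unresolved, or connect timed out
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults; pass the real host and port as arguments
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 21050;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 5000));
    }
}
```

If this reports the port as reachable while the JDBC call still times out, the problem is more likely in what happens after the connection is opened than in the network path.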

1 answer:

Answer 0 (score: 1)

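We run a lot of queries from outside the cluster over JDBC. While I believe it may be possible to use the Hive JDBC driver, you definitely need to set the appropriate port in the JDBC connection string, which for Impala is probably 21050. You also need to make sure that your hostname (or IP address) points to an instance that is running the Impala daemon (for Hive, you would probably point to the namenode). My guess is that the port number is wrong, since the error seems to be simply that the top-level connection cannot be established.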
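As a rough illustration of the connection string described here, the following sketch builds a `jdbc:hive2://` URL for port 21050 and rejects port numbers outside the valid TCP range (1 to 65535), which would catch a value like the `123456` placeholder in the question. The class name `ImpalaJdbcUrl`, the host `impalad-host.example.com` and the database `default` are assumptions made for the example, and `auth=noSasl` is simply carried over from the question; whether it is right depends on how the cluster is secured.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class ImpalaJdbcUrl {
    /** Builds a hive2-style JDBC URL; rejects ports outside the valid TCP range. */
    static String build(String host, int port, String database) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Invalid TCP port: " + port);
        }
        return "jdbc:hive2://" + host + ":" + port + "/" + database + ";auth=noSasl";
    }

    public static void main(String[] args) {
        // Hypothetical host and database; 21050 is the port suggested in the answer
        String url = build("impalad-host.example.com", 21050, "default");
        System.out.println("Connecting to " + url);
        try {
            // Needs the Hive JDBC driver jar on the classpath
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            DriverManager.setLoginTimeout(30);
            try (Connection connection = DriverManager.getConnection(url)) {
                System.out.println("connected");
            }
        } catch (ClassNotFoundException | SQLException e) {
            System.out.println("connection failed: " + e.getMessage());
        }
    }
}
```

The host passed to `build` should be a machine that actually runs the Impala daemon, as noted above.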

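We decided to use the specific driver that Cloudera provides for Impala, although that may not be necessary. We also set up a load balancer, so there is one stable address to query instead of requiring callers to pick a particular Impala instance. This also spreads the load evenly and lets us make changes in the cluster without any changes on the side of the external callers.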
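One way to picture the load balancer setup on the caller side is sketched below: every caller resolves a single endpoint, with an optional override from the environment, and never names an individual Impala instance. All names here are hypothetical (`ImpalaEndpoint`, `impala-lb.example.com`, and the `IMPALA_LB_HOST` / `IMPALA_LB_PORT` variables), and the URL uses the Hive driver's `jdbc:hive2://` form from the question; the Cloudera driver has its own URL format, described in its documentation.

```java
import java.util.Map;

class ImpalaEndpoint {
    // Hypothetical load balancer address; callers never name an individual impalad
    static final String DEFAULT_HOST = "impala-lb.example.com";
    static final int DEFAULT_PORT = 21050;

    /** Resolves the single address callers should use, allowing an override from env. */
    static String jdbcUrl(Map<String, String> env, String database) {
        String host = env.getOrDefault("IMPALA_LB_HOST", DEFAULT_HOST);
        int port = Integer.parseInt(
                env.getOrDefault("IMPALA_LB_PORT", String.valueOf(DEFAULT_PORT)));
        return "jdbc:hive2://" + host + ":" + port + "/" + database + ";auth=noSasl";
    }

    public static void main(String[] args) {
        // Prints the URL that every external caller would share
        System.out.println(jdbcUrl(System.getenv(), "default"));
    }
}
```

With the address kept in one place, moving or replacing Impala instances behind the load balancer needs no change in the callers.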