Reading a huge Hive table over JDBC causes java.lang.OutOfMemoryError: Java heap space

Time: 2018-12-14 08:36:51

Tags: java jdbc groovy hive

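I am trying to use Groovy to read a huge Hive table with 13 million records; the data is stored in Parquet format. With the code below I get an OutOfMemoryError (Java heap space). I have allowed up to 32 GB of heap and set setFetchSize(5000), but the error still occurs.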

JAVA_OPTS="-Xms1024M"
JAVA_OPTS="-Xmx32556M"
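
If the two lines above are plain shell assignments in the same script, the second one replaces the first, so only -Xmx would be passed on. Whether that is the case here is an assumption. One way to confirm which heap limit actually reaches the JVM that runs the script is to print Runtime.maxMemory(). A minimal sketch (the class name HeapCheck is hypothetical):

```java
public class HeapCheck {

    /** Returns the maximum heap the running JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // With -Xmx32556M in effect this should print a value close to 32556
        // (the exact figure depends on the JVM and garbage collector).
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

The same call can be made from the Groovy script itself, since Groovy runs on the JVM.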

Any help would be greatly appreciated.

Code:

    String contSql = "select * from staging.cont_staging";
    ResultSet resRateRecords = stmt.executeQuery(contSql);
    // One map entry is kept for every row read from the table.
    Map<String, Map<String, String>> masterRecords = new HashMap<String, Map<String, String>>();
    Map<String, String> existingRecords = null;
    int count = 0;
    resRateRecords.setFetchSize(5000);
    while (resRateRecords.next()) {
        try {
            existingRecords = new HashMap<String, String>();
            // Key is the contract id plus a running counter, so keys never collide.
            masterRecords.put(resRateRecords.getString("contract_id") + "#" + count++, existingRecords);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
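
For comparison, here is a minimal sketch of a streaming variant of the loop above. It assumes the per-row work can be done without keeping every row in a map, which the question does not confirm. The class name StreamingHiveRead, the method streamContractIds and the handler parameter are hypothetical; only standard java.sql calls are used. Because the loop only reads contract_id, the caller could also pass a query that selects just that column instead of select *, which should shrink each fetched batch.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.function.Consumer;

public class StreamingHiveRead {

    /**
     * Runs the query, hands each contract_id to the handler as it is read,
     * and returns the number of rows seen. No row is retained by this method.
     */
    public static long streamContractIds(Statement stmt, String sql, int fetchSize,
                                         Consumer<String> handler) throws SQLException {
        // Standard JDBC hint for how many rows the driver fetches per round trip.
        stmt.setFetchSize(fetchSize);
        long count = 0;
        // try-with-resources closes the ResultSet even if the handler throws.
        try (ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                handler.accept(rs.getString("contract_id"));
                count++;
            }
        }
        return count;
    }
}
```

A call might look like streamContractIds(stmt, "select contract_id from staging.cont_staging", 5000, id -> process(id)), where process is whatever per-row step is needed. Whether this removes the OutOfMemoryError depends on where the heap is actually being consumed, which the stack trace alone does not settle.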

Error:

java.lang.OutOfMemoryError: Java heap space
        at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:355)
        at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:347)
        at org.apache.hive.service.cli.thrift.TStringColumn$TStringColumnStandardScheme.read(TStringColumn.java:453)
        at org.apache.hive.service.cli.thrift.TStringColumn$TStringColumnStandardScheme.read(TStringColumn.java:433)
        at org.apache.hive.service.cli.thrift.TStringColumn.read(TStringColumn.java:367)
        at org.apache.hive.service.cli.thrift.TColumn.standardSchemeReadValue(TColumn.java:328)
        at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:224)
        at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:213)
        at org.apache.thrift.TUnion.read(TUnion.java:138)
        at org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.read(TRowSet.java:573)
        at org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.read(TRowSet.java:525)
        at org.apache.hive.service.cli.thrift.TRowSet.read(TRowSet.java:451)
        at org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.read(TFetchResultsResp.java:518)
        at org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.read(TFetchResultsResp.java:486)
        at org.apache.hive.service.cli.thrift.TFetchResultsResp.read(TFetchResultsResp.java:408)
        at org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.read(TCLIService.java:13251)
        at org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.read(TCLIService.java:13236)
        at org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result.read(TCLIService.java:13183)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
        at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_FetchResults(TCLIService.java:505)
        at org.apache.hive.service.cli.thrift.TCLIService$Client.FetchResults(TCLIService.java:492)
        at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:335)
        at java_sql_ResultSet$next.call(Unknown Source)
        at BEContractRateLoad.fetchContractRateRecords(DestRateLoad.groovy:300)
        at BEContractRateLoad.processContractRecords(DestRateLoad.groovy:397)
        at BEContractRateLoad$processContractRecords$1.call(Unknown Source)
Groovy has reported an error, terminating

0 answers:

No answers yet.