java.lang.OutOfMemoryError: unable to create new native thread in Hive

Date: 2015-07-31 07:16:10

Tags: java hadoop hive

I am trying to load an 8 GB CSV file (about 100 million records) into HDFS through Hive. I am doing this in batches, reading 10,000 records at a time and loading them. In HDFS it takes up about 10 GB.
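Roughly, the batching amounts to something like the sketch below (the table name, staging paths, and the split/hive -e combination are illustrative assumptions, not the exact script used):

    #!/usr/bin/env bash
    # Sketch of a chunked CSV load into Hive. my_table and the paths are
    # placeholders; the 10,000-line chunk size matches the batching above.
    TABLE=my_table
    CHUNK_DIR=/tmp/csv_chunks

    mkdir -p "$CHUNK_DIR"
    hdfs dfs -mkdir -p /tmp/hive_staging

    # Split the 8 GB CSV into chunks of 10,000 records each.
    split -l 10000 bigfile.csv "$CHUNK_DIR/chunk_"

    for f in "$CHUNK_DIR"/chunk_*; do
        # Stage each chunk in HDFS, then move it into the Hive table.
        hdfs dfs -put "$f" /tmp/hive_staging/
        hive -e "LOAD DATA INPATH '/tmp/hive_staging/$(basename "$f")' INTO TABLE $TABLE;"
    done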

The process did not complete successfully. It threw java.lang.OutOfMemoryError while loading the last CSV file.

Since that failed load, I have tried to load new data many times, but I keep getting different exceptions, all with java.lang.OutOfMemoryError as the underlying cause. Some of them are:

2015-07-30 18:57:11,887 ERROR [Thread-11]: server.TThreadPoolServer (TThreadPoolServer.java:serve(194)) - ExecutorService threw error: java.lang.OutOfMemoryError: unable to create new native thread
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:691)
                   :
                   :
        at java.lang.Thread.run(Thread.java:722)

2015-07-30 18:57:11,888 FATAL [Thread-11]: thrift.ThriftCLIService (ThriftBinaryCLIService.java:run(101)) - Error starting HiveServer2: could not start ThriftBinaryCLIService
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
                   :
                   :
        at org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.run(ThriftBinaryCLIService.java:99)
        at java.lang.Thread.run(Thread.java:722)

2015-07-30 18:57:11,891 WARN  [HiveServer2-Handler-Pool: Thread-3652]: thrift.ThriftCLIService (ThriftCLIService.java:FetchResults(681)) - Error fetching results:
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.ipc.RemoteException(java.lang.OutOfMemoryError): unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:691)
                   :
                   :       
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: org.apache.hadoop.ipc.RemoteException(java.lang.OutOfMemoryError): unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:691)
                   :
                   :
        at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(JniBasedUnixGroupsMappingWithFallback.java:50)
        at org.apache.hadoop.security.Groups.getGroups(Groups.java:139)

I googled and found this post.

I added ulimit -n 8192 -u 8192 to .bashrc and restarted hiveserver and the metastore.
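Written out as two separate calls (the unambiguous form), the change looks like this:

    # Added to ~/.bashrc of the user that runs hiveserver and the metastore:
    ulimit -n 8192   # max open file descriptors
    ulimit -u 8192   # max user processes (threads count against this limit)

    # Verify from a fresh login shell:
    ulimit -n
    ulimit -u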

Now I tried loading the data again. This time it gave me a new error, but with the same underlying java.lang.OutOfMemoryError:

2015-07-31 11:46:31,133 ERROR [HiveServer2-Handler-Pool: Thread-6368]: ql.Driver (SessionState.java:printError(960)) - FAILED: SemanticException Error creating temporary folder on: hdfs://mycluster/tmp/hive/anonymous/a3a45571-f359-4747-aaac-fad9e8f9c6ae/hive_2015-07-31_11-46-31_007_6758184846137099807-272/-mr-10000
org.apache.hadoop.hive.ql.parse.SemanticException: Error creating temporary folder on: hdfs://mycluster/tmp/hive/anonymous/a3a45571-f359-4747-aaac-fad9e8f9c6ae/hive_2015-07-31_11-46-31_007_6758184846137099807-272/-mr-10000
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6467)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:9020)
                   :
                   :
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.RuntimeException: Cannot create staging directory 'hdfs://mycluster/tmp/hive/anonymous/a3a45571-f359-4747-aaac-fad9e8f9c6ae/hive_2015-07-31_11-46-31_007_6758184846137099807-272/-mr-10000/.hive-staging_hive_2015-07-31_11-46-31_007_6758184846137099807-272': unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:691)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:521)
                   :
                   :
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6465)
        ... 40 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.OutOfMemoryError): unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:691)

It says Error creating temporary folder. We normally copy CSV files from the local machine into the current user's temporary folder in HDFS and then load those files into the Hive table. But I checked through the web browser, and no files had been created in HDFS, not even in the current user's temporary folder. No records were populated into the Hive table.
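For reference, the same check from the command line, against the scratch path that appears in the stack trace above:

    # List the per-user Hive scratch directory on HDFS; the path matches
    # the /tmp/hive/anonymous prefix in the SemanticException above.
    hdfs dfs -ls -R /tmp/hive/anonymous

    # Count directories, files, and bytes under it.
    hdfs dfs -count /tmp/hive/anonymous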

So I am guessing something else may be wrong. Does the limit on open files and processes set by the ulimit command really help here? We only restarted hiveserver and the metastore. Do we have to restart the whole cluster?
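To check whether the new limits actually took effect for the restarted daemons, the effective limits of the running processes can be read from /proc (the pgrep patterns below are assumptions about the service process names):

    # Print the effective limits of the running HiveServer2 and metastore
    # JVMs; /proc/<pid>/limits shows what each process actually got.
    for pid in $(pgrep -f HiveServer2) $(pgrep -f HiveMetaStore); do
        echo "== PID $pid =="
        grep -E 'Max (open files|processes)' "/proc/$pid/limits"
    done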

Or is it that the number of directories created on the datanodes (HDFS) has hit some limit (since we have already loaded 100 million records)?

We have four datanodes in total, each with 4 GB of memory. The cluster has 30 GB of free HDFS space in total.
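(The node count and free-space figures can be confirmed with:)

    # Per-datanode capacity, remaining space, and live node count.
    hdfs dfsadmin -report

    # Overall filesystem usage in human-readable form.
    hdfs dfs -df -h /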

0 Answers:

There are no answers yet.