java.lang.OutOfMemoryError: Java heap space error when executing a Hive query

How to resolve the java.lang.OutOfMemoryError: Java heap space error when executing a Hive query

While running a Hive query from the Hive shell with the TEZ execution engine, I am getting a java.lang.OutOfMemoryError: Java heap space error in the logs, but the query eventually completes.

I want to understand why this error appears in the logs; this query used to run fine without any problems.

Does anyone have any clues or documentation that could help me understand this issue? I have tried googling it, but that did not help much.

Thanks in advance for any help!

ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 3, vertexId=vertex_1622153507491_0145_1_02, diagnostics=[Task failed, taskId=task_1622153507491_0145_1_02_000006, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:361)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Async Initialization failed. abortRequested=false
        at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:465)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:399)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:572)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:524)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:342)
        ... 17 more
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.hive.serde2.WriteBuffers.nextBufferToWrite(WriteBuffers.java:261)
        at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:237)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashMapStore.addMore(VectorMapJoinFastBytesHashMapStore.java:539)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashMap.add(VectorMapJoinFastBytesHashMap.java:101)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastStringCommon.adaptPutRow(VectorMapJoinFastStringCommon.java:59)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastStringHashMap.putRow(VectorMapJoinFastStringHashMap.java:37)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastTableContainer.putRow(VectorMapJoinFastTableContainer.java:183)
        at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTableLoader.load(VectorMapJoinFastHashTableLoader.java:130)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTableInternal(MapJoinOperator.java:344)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:413)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.lambda$initializeOp$0(MapJoinOperator.java:215)
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator$$Lambda$27/55723736.call(Unknown Source)
        at org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:96)
        at org.apache.hadoop.hive.ql.exec.tez.ObjectCache$1.call(ObjectCache.java:113)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        ... 3 more

Solution

The OOM exception occurs in the MapJoin operator while it is loading the hash table. Perhaps a fallback path without the map join succeeded afterwards, which is why the query eventually completed.
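
To confirm that the map join is the culprit, a minimal diagnostic sketch (assuming automatic map-join conversion is enabled, which is the default) is to force the plain shuffle join and rerun the query:

    set hive.auto.convert.join=false;                    -- do not convert common joins to in-memory map joins
    set hive.auto.convert.join.noconditionaltask=false;  -- also disable the unconditional map-join rewrite

If the error disappears with these off, the hash-table load of the map join is what is exhausting the heap, and the steps below address it directly.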

What you can do: try increasing mapper parallelism, and if having more mappers does not help, increase the mapper memory. Check your current settings and change them accordingly.
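
Before changing anything, you can print the current value of each parameter from the Hive shell; running set <name>; without a value echoes whatever is currently in effect, for example:

    set hive.tez.container.size;   -- prints the current container size, e.g. hive.tez.container.size=1024
    set tez.grouping.max-size;     -- prints the current split-grouping upper bound, if set
    set hive.map.aggr;             -- prints whether map-side aggregation is enabled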

  1. Increase mapper parallelism (this may not help if the real cause is that the table loaded into memory for the map join is too large):
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set tez.grouping.max-size=32000000;  -- decreasing max-size increases parallelism
set tez.grouping.min-size=32000;     -- if you have small files below min-size, a mapper will additionally process other files
  2. Increase the mapper container size (check your current setting and increase it accordingly). These values are only an example:
    set hive.tez.container.size=2048;  -- container size in megabytes
    set hive.tez.java.opts=-Xmx1700m;  -- set this to about 80% of hive.tez.container.size
  3. Map-side aggregation can also cause OOM; try disabling it:
    set hive.map.aggr=false; 
  4. Check your map-join settings: the small-table size threshold may be set too large compared with the container size you configured; see the sketch after this list and the question "Hive Map-Join configuration mystery".
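
As referenced in item 4, these are the parameters that usually decide whether Hive converts a join to a map join; the values shown are the common defaults, purely for illustration, and should be sized well below the container heap configured above:

    set hive.auto.convert.join.noconditionaltask.size=10000000;  -- max combined size (bytes) of small tables loaded into memory for a map join
    set hive.mapjoin.smalltable.filesize=25000000;               -- file-size threshold (bytes) below which a table counts as the small side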
