Application error collection

Time: 2016-01-16 03:28:39

Tags: hadoop apache-spark hive hiveql

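In my Spark job, I got this error:

org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0

Hive converts the SQL into a Hive on Spark job, so I don't know how to configure Hive so that its Hive on Spark job uses StorageLevel.MEMORY_AND_DISK instead of StorageLevel.MEMORY_ONLY.

Thanks for your help!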

Answer 0: (score: 1)

You can use CACHE [LAZY] TABLE <table_name> and UNCACHE TABLE <table_name> to manage caching. See the Spark SQL documentation for more details.
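As a minimal sketch of those statements, assuming they are run through Spark SQL: my_table is a hypothetical table name, and the OPTIONS clause that selects the storage level is only available in Spark 3.0 and later, so it may not apply to the version in use here.

```sql
-- my_table is a hypothetical table name used only for illustration.

-- Cache lazily: the table is materialized on first use, not immediately.
CACHE LAZY TABLE my_table;

-- Drop the cached data before caching again with different settings.
UNCACHE TABLE my_table;

-- Assumes Spark 3.0 or later: pick the storage level explicitly.
CACHE TABLE my_table OPTIONS ('storageLevel' 'MEMORY_AND_DISK');

-- Release the cache when it is no longer needed.
UNCACHE TABLE my_table;
```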

If you are using a DataFrame, you can call persist(...) to specify the StorageLevel. See the DataFrame API documentation.

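Besides setting the storage level, there are other things you can tune. Spark SQL uses a different caching mechanism called columnar storage, which caches data more efficiently because Spark SQL is schema-aware. A separate set of configuration properties controls this cache, as described in detail in the Spark SQL documentation. (The link points to the latest version of the documentation; refer to the documentation for the version you are using.)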
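For illustration, the columnar cache properties documented for Spark SQL can be set per session as below. This is a sketch, not a recommendation: the values are illustrative only, and property names and defaults should be checked against the documentation for the Spark version in use.

```sql
-- Compress each cached column using a codec chosen from the data statistics.
SET spark.sql.inMemoryColumnarStorage.compressed=true;

-- Number of rows per columnar batch. Larger batches can improve memory
-- utilization and compression, but increase the risk of out-of-memory errors.
SET spark.sql.inMemoryColumnarStorage.batchSize=10000;
```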