Question

I am trying to join two ORC tables in Hive, but I am getting an error. Here is the query:

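select t1.num as num, t1.product as Product, t2.value as OldValue, t1.value as NewValue
from test_new t1
LEFT OUTER JOIN test_old t2
  ON t1.num=t2.num and t1.product=t2.product
where t2.value is NULL and t1.value is not NULL or t1.value<>t2.value;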

Error:

Hive Runtime Error: Map local work exhausted memory
I tried setting the map memory and reduce memory to 22000, but still no luck. After searching the internet, I found someone suggesting a Hive property setting to overcome the above error, and once I applied it my query started running.

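I am not sure whether running my query this way gains any performance. Is the performance still the same? Is there another way to solve the problem? Please suggest some ways to improve the query's performance.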

Answer 1

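Your first and safest option is to set hive.auto.convert.join = false. You sacrifice some performance this way, because you no longer benefit from map joins, but how large that compromise is depends entirely on your use case and the size of your data. Another option is to tune hive.auto.convert.join.noconditionaltask.size, which according to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization "allows the user to control what size table can fit in memory". Finding the right threshold can be a challenge.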

P.S. Keep in mind that for hive.auto.convert.join.noconditionaltask.size to take effect, hive.auto.convert.join.noconditionaltask must be true (which it is by default).
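As a rough sketch, the two options above could be applied as session-level settings before running the join. The threshold value below is purely illustrative (an assumption, not a recommendation); the right number depends on the size of the smaller table and on the memory available to the local map-join task.

```sql
-- Option 1: disable automatic map-join conversion.
-- The join then runs as a regular shuffle join, which is slower
-- but does not need to load a table into local memory.
SET hive.auto.convert.join=false;

-- Option 2: keep automatic map joins, but cap the size of tables
-- that may be loaded into memory.
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask=true;
-- Threshold in bytes; 5000000 is an illustrative value only.
SET hive.auto.convert.join.noconditionaltask.size=5000000;
```

Only one of the two options should be active for a given session. With option 2, a table larger than the threshold is not converted to a map join, so lowering the value trades map-join speed for a lower risk of exhausting local memory.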

Hive Runtime Error: Map local work exhausted memory
