Sorting a Spark DataFrame / Hive result set

Time: 2016-11-08 11:50:07

Tags: scala apache-spark hive

I am trying to retrieve the list of columns from a Hive table and store the result in a Spark DataFrame.

var my_column_list = hiveContext.sql(s""" SHOW COLUMNS IN $my_hive_table""")

However, I can't sort the DataFrame alphabetically, or even sort the result of the SHOW COLUMNS query. I tried both sort and orderBy().

How can I sort the result alphabetically?

Update: added a sample of my code

import org.apache.spark.{ SparkConf, SparkContext }
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
hiveContext.sql("USE my_test_db")

var lv_column_list = hiveContext.sql(s""" SHOW COLUMNS IN MYTABLE""")
//WARN LazyStruct: Extra bytes detected at the end of the row! Ignoring similar problems

lv_column_list.show //Works fine
lv_column_list.orderBy("result").show //Error arises

3 answers:

Answer 0 (score: 3)

The SHOW COLUMNS query produces a DataFrame with a column named result. If you order by that column, you get what you want:

val df = hiveContext.sql(s""" SHOW COLUMNS IN $my_hive_table """)
df.orderBy("result").show
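A related option, if you only need the names as a sorted list on the driver: collect the result column (for example with df.collect().map(_.getString(0))) and sort it in plain Scala. The sketch below assumes the names have already been collected; sortColumnNames and the sample names are hypothetical, and the trimming step is only a precaution in case the raw strings carry padding whitespace.

```scala
// Hypothetical helper: sorts column names that were already collected to the
// driver, e.g. via df.collect().map(_.getString(0)) on the SHOW COLUMNS result.
def sortColumnNames(names: Seq[String]): Seq[String] =
  names
    .map(_.trim)           // strip any padding around each name
    .filter(_.nonEmpty)    // ignore empty entries
    .sortBy(_.toLowerCase) // case-insensitive alphabetical order

// Made-up stand-in for the collected SHOW COLUMNS output.
val collected = Seq("zip_code ", "Id", "  age", "name")
println(sortColumnNames(collected).mkString(", "))
```

Sorting locally is reasonable here because a table's column list is small; for large data sets, orderBy on the DataFrame itself is still the right tool.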

Answer 1 (score: 0)

Not sure how you used the sort or orderBy methods.

Try the following approach:

/**
 * Sort a List.js list by up to four sort keys
 *
 * @param  {Object} list        [List.js instance]
 * @param  {String} first       [First field to sort]
 * @param  {String} second      [Second field to sort]
 * @param  {String} third       [Third field to sort]
 * @param  {String} fourth      [Fourth field to sort]
 * @param  {String} firstOrder  [Order type: asc || desc]
 * @param  {String} secondOrder [Order type: asc || desc]
 * @param  {String} thirdOrder  [Order type: asc || desc]
 * @param  {String} fourthOrder [Order type: asc || desc]
 * @return {Boolean|undefined}  [false if the list is undefined]
 */
var sortList = function(list,
    first, second, third, fourth,
    firstOrder, secondOrder, thirdOrder, fourthOrder,
) {
    console.log("ListManager sort:", list, first, second, third, fourth);
    // If list not defined
    if (Utility.empty(list)) {
        console.error("ListManager: can't sort, list is undefined!");
        // Error no list!
        return false;
    }

    // If first order id not defined
    if (Utility.empty(first)) {
        // Set default
        first = "name";
    }

    // If second order id not defined
    if (Utility.empty(second)) {
        // Set default
        second = "born";
    }

    // If third order id not defined
    if (Utility.empty(third)) {
        // Set default
        third = "color";
    }

    // If fourth order id not defined
    if (Utility.empty(fourth)) {
        // Set default
        fourth = "lastName";
    }

    // Check order if asc or desc
    if (Utility.empty(firstOrder)) {
        // Set order
        firstOrder = "desc";
    }

    // Check order if asc or desc
    if (Utility.empty(secondOrder)) {
        // Set order
        secondOrder = "desc";
    }

    // Check order if asc or desc
    if (Utility.empty(thirdOrder)) {
        // Set order
        thirdOrder = "desc";
    }

    // Check order if asc or desc
    if (Utility.empty(fourthOrder)) {
        // Set order
        fourthOrder = "desc";
    }

    // Make list compatible
    first = "order_" + first;
    second = "order_" + second;
    third = "order_" + third;
    fourth = "order_" + fourth;
    // Convert "asc"/"desc" into booleans (declared with var to avoid implicit globals)
    var firstAsc = firstOrder === "asc";
    var secondAsc = secondOrder === "asc";
    var thirdAsc = thirdOrder === "asc";
    var fourthAsc = fourthOrder === "asc";

    console.log("List Sort: ", first, second, third, fourth);
    console.log("List Asc: ", firstAsc, secondAsc, thirdAsc, fourthAsc);

    // Call sort method of List.js
    list.sort('', {order: '',
        sortFunction:
            function(a, b) {
                // Compare values with field requested
                return _compareIntegerStringDate(a, b, first, firstAsc)
                    || _compareIntegerStringDate(a, b, second, secondAsc)
                    || _compareIntegerStringDate(a, b, third, thirdAsc)
                    || _compareIntegerStringDate(a, b, fourth, fourthAsc);
            }
        }
    );
}

/**
 * Compare the values of two List.js items on the requested field.
 * Get correct comparison between integers, strings, or dates.
 *
 * @param  {Object}  a           [List.js item instance]
 * @param  {Object}  b           [List.js item instance]
 * @param  {String}  field       [Field to compare]
 * @param  {Boolean} isAscending [Whether to sort in ascending order]
 * @return {Integer}             [-X || 0 || +X]
 */
var _compareIntegerStringDate = function(a, b, field, isAscending) {
    console.log(field + " isAscending " + isAscending);
    if (Utility.isInt(a.values()[field])) {
        // Compare integer
        return isAscending
            ? a.values()[field] - b.values()[field]
            : b.values()[field] - a.values()[field];
    }
    else if(Utility.isDate(a.values()[field], "YYYY-MM-DD HH:mm:ss")){
        // Compare Date
        return isAscending
            ? Date.parse(a.values()[field]) - Date.parse(b.values()[field])
            : Date.parse(b.values()[field]) - Date.parse(a.values()[field]);
    }
    else {
        // Compare strings
        return isAscending
            ? a.values()[field].localeCompare(b.values()[field])
            : b.values()[field].localeCompare(a.values()[field]);
    }
}

Answer 2 (score: 0)

I used 'DESC' instead of 'SHOW COLUMNS' and retrieved the column list from the "col_name" column.
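To illustrate this answer: DESC returns rows with col_name, data_type and comment columns, so once the col_name values are collected (for example with hiveContext.sql("DESC MYTABLE").select("col_name").collect()), they can be cleaned and sorted locally. The sketch below simulates those rows with made-up tuples. It assumes that, as Hive typically does for partitioned tables, the output may contain a blank row, "#"-prefixed header rows, and a second listing of the partition columns; columnNamesFromDesc is a hypothetical helper.

```scala
// Each tuple mimics one DESC output row: (col_name, data_type, comment).
// These rows are invented; real output depends on the table definition.
val descRows: Seq[(String, String, String)] = Seq(
  ("name", "string", ""),
  ("age", "int", ""),
  ("dt", "string", ""),
  ("", "", ""),                            // assumed blank separator row
  ("# Partition Information", "", ""),     // assumed header rows
  ("# col_name", "data_type", "comment"),
  ("dt", "string", "")                     // partition column listed again
)

// Hypothetical helper: keeps real column names and sorts them alphabetically.
def columnNamesFromDesc(rows: Seq[(String, String, String)]): Seq[String] =
  rows
    .map(_._1.trim)                                // keep only col_name
    .filter(n => n.nonEmpty && !n.startsWith("#")) // drop blank and header rows
    .distinct                                      // remove repeated partition columns
    .sorted                                        // alphabetical order

println(columnNamesFromDesc(descRows))
```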
