PySpark: swapping keys and values

Time: 2017-01-23 10:39:35

Tags: apache-spark pyspark

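I am trying to swap the keys and values in a dataset so that I can sort it. However, the second map call fails with an "invalid syntax" error: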

rdd = clean_headers_rdd.rdd\
        .filter(lambda x: x['date'].year == 2016)\
        .map(lambda x: (x['user_id'], 1)).reduceByKey(lambda x, y: x + y)\
        .map(lambda (x, y): (y, x)).sortByKey(ascending = False)


1 answer:

Answer 0 (score: 1)

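Tuple parameter unpacking such as lambda (x, y): (y, x) was removed in Python 3 by PEP 3113 -- Removal of Tuple Parameter Unpacking, so that lambda is a syntax error there. A lambda now takes a single tuple argument, which you can handle in several ways: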
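To see the PEP 3113 change directly, you can compile the lambda from the question under Python 3 without Spark at all. This is a minimal sketch using only the built-in compile; the helper name tuple_param_lambda_compiles and the sample pair are hypothetical, introduced just for illustration.

```python
def tuple_param_lambda_compiles() -> bool:
    """Hypothetical helper: does the lambda from the question compile?"""
    try:
        # Compile (but do not run) the original tuple-unpacking lambda.
        compile("lambda (x, y): (y, x)", "<question>", "eval")
        return True
    except SyntaxError:
        # Python 3 rejects tuple parameter unpacking (PEP 3113).
        return False


# Single-argument replacement: index into the tuple instead of unpacking it.
swap = lambda x_y: (x_y[1], x_y[0])

print(tuple_param_lambda_compiles())  # False on Python 3
print(swap(("user_a", 3)))            # (3, 'user_a')
```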

  • The approach recommended by the transition plan:

    rdd.map(lambda x_y: (x_y[1], x_y[0]))
    
  • A shortcut using the operator module:

    from operator import itemgetter
    
    rdd.map(itemgetter(1, 0))
    
  • Slicing:

    rdd.map(lambda x: x[::-1])
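All three variants turn a (key, value) pair into (value, key). The sketch below checks that on plain Python tuples, using the built-in map as a local stand-in for rdd.map so it runs without a Spark cluster. The sample (user_id, count) pairs are made up, and sorted(..., reverse=True) only imitates what sortByKey(ascending = False) does on an RDD.

```python
from operator import itemgetter

# Made-up (user_id, count) pairs, shaped like the reduceByKey output in the question.
counts = [("alice", 3), ("bob", 7), ("carol", 1)]

swaps = {
    "index": lambda x_y: (x_y[1], x_y[0]),  # transition-plan style
    "itemgetter": itemgetter(1, 0),         # operator module shortcut
    "slice": lambda x: x[::-1],             # reverse the whole tuple
}

# Apply each swap function to every pair, as rdd.map would.
results = {name: list(map(f, counts)) for name, f in swaps.items()}

# Local stand-in for .sortByKey(ascending=False): sort by the new key (the count), descending.
by_count = sorted(results["index"], key=itemgetter(0), reverse=True)
print(by_count)  # [(7, 'bob'), (3, 'alice'), (1, 'carol')]
```

Note that slicing reverses the entire tuple, so it is only a key/value swap when each record is exactly a 2-tuple; indexing and itemgetter(1, 0) pick out the two positions explicitly.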