Saving a PairedRDD as a text file

Time: 2015-07-19 00:31:02

Tags: python apache-spark pyspark

users_grpd = pairs.groupByKey()

users_grpd_flattened = users_grpd.map(
    lambda keyValue: (keyValue[0], ' '.join(map(str, keyValue[1]))))

users_grpd_flattened.saveAsTextFile('pairedrddresults.txt')

Output:

(u'3300975212', '120818 120519 120850 120521')

(u'3200272220', '120036 105037')

(u'13101231222', '2024574 12024')

Is there a way to save this paired RDD as a text file with the leading u and the quotes omitted?

1 answer:

Answer 0 (score: 1)

If you need a specific format, you can map each pair directly to a string:

users_grpd_flattened = (pairs.groupByKey().
    map(lambda kv: "{0}, {1}".format(kv[0], ' '.join(str(x) for x in kv[1]))))

If you need the parentheses, simply replace the format string with "({0}, {1})".
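
The u prefix and the quotes appear because each element is a tuple, and a tuple's text form uses the repr of its items; once every element is already a plain string, that should no longer happen. As a rough sketch of the same idea, the formatting can be pulled out into a small function and checked without a Spark cluster. The name format_pair and the with_parens flag are made up for illustration and are not part of PySpark:

```python
def format_pair(key_value, with_parens=False):
    """Render one (key, values) pair as a single plain line of text."""
    key, values = key_value
    # Hypothetical switch between the two format strings from the answer.
    template = "({0}, {1})" if with_parens else "{0}, {1}"
    return template.format(key, " ".join(str(v) for v in values))


# Sample pair shaped like one element of pairs.groupByKey().
sample = (u"3200272220", [120036, 105037])
print(format_pair(sample))                    # 3200272220, 120036 105037
print(format_pair(sample, with_parens=True))  # (3200272220, 120036 105037)
```

Assuming pairs is the RDD from the question, the function could then be passed to map, as in pairs.groupByKey().map(format_pair), before calling saveAsTextFile.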