Error when writing a pyspark map to a txt file

Date: 2016-04-20 14:48:00

Tags: python numpy apache-spark pyspark

I am performing block multiplication on the contents of two files. At the end, when I try to write the result to a text file, I get the following errors:

Py4JJavaError: An error occurred while calling o426.saveAsTextFile

ValueError: could not convert string to float: 

The program:

    import numpy as np
    from pyspark import SparkContext, SparkConf
    sc = SparkContext("local", "Simple App")
    mat = sc.textFile("mat1.txt")
    mat2 = sc.textFile("mat2.txt")

    # parse each line "row col value" into a list of floats
    matFilter = mat.map(lambda x: [float(i) for i in x.split(" ")])
    matFilter2 = mat2.map(lambda x: [float(i) for i in x.split(" ")])

    # group the values of mat1 by row index and of mat2 by column index
    matgroupp = matFilter.map(lambda x: (x[0], [x[2]])).reduceByKey(lambda p,q: p+q)
    matgroup2 = matFilter2.map(lambda x: (x[1], [x[2]])).reduceByKey(lambda p,q: p+q)

    # pair every row of mat1 with every column of mat2
    matInter = matgroupp.cartesian(matgroup2)

    # dot product of each (row, column) pair, sorted by (row, col) key
    matmul = matInter.map(lambda x: ((x[0][0], x[1][0]), np.dot(x[0][1], x[1][1]))).sortByKey(True)
    matmul.saveAsTextFile("results/res.txt")

Contents of mat1.txt:

0 0 10.0
1 0 10.0

Contents of mat2.txt:

0 0 20.0
0 1 10.0

0 Answers:

There are no answers yet.
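Although no answer was recorded, one plausible culprit for `ValueError: could not convert string to float:` (note the empty string after the colon) is a blank or whitespace-only line in `mat1.txt` or `mat2.txt`, since `float("")` raises exactly this error. A minimal sketch of a guarded parser, shown in plain Python for illustration (the function name `parse_line` and the sample data are assumptions, not part of the original program):

```python
def parse_line(line):
    """Split a whitespace-separated "row col value" line into floats.

    Returns None for blank or whitespace-only lines, which
    float("") would otherwise reject with a ValueError.
    """
    tokens = line.split()  # split() with no argument drops empty tokens
    if not tokens:
        return None
    return [float(t) for t in tokens]

# hypothetical file contents, including a trailing blank line
lines = ["0 0 10.0", "1 0 10.0", "", "  "]
parsed = [row for row in (parse_line(l) for l in lines) if row is not None]
print(parsed)  # only the two real rows survive
```

In the Spark program above, the equivalent guard would be something like `mat.filter(lambda x: x.strip()).map(lambda x: [float(i) for i in x.split()])`; using `x.split()` rather than `x.split(" ")` also tolerates repeated spaces between fields. Note also that `saveAsTextFile("results/res.txt")` creates a directory of that name containing `part-*` files, not a single text file.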