Date: 2018-07-10 15:45:31

Tags: python pyspark apache-spark-sql

I am trying to convert a DataFrame to a JSON string. I am using PySpark.

Here is the code I am using:

from pyspark.sql.functions import col, lit

def produceTrainData(self, csvData):

    # Add constant columns to the input data
    trainData = csvData.withColumn("therapyClass", lit("REMODULIN"))\
                .withColumn("patientAge", lit(52))\
                .withColumn("patientSex", lit("M"))\
                .withColumn("serviceType", lit("PHARMACY"))\
                .withColumn("npiId", lit("27"))\
                .withColumn("requestID", lit(419568891))\
                .withColumn("requestDateTime", lit("20171909 21:30:55"))

    selectData = trainData.select("payorId", "patientId", "therapyType", "therapyClass", "ndcNumber", "procedureCode", "patientAge", "patientSex",
                        "placeOfService", "serviceDuration", "daysOrUnits", "charges", "serviceDate", "serviceType", "serviceBranchId",
                        "npiId", "diagnosisCode", "authNbr", "requestID", "requestDateTime")

    # Keep only rows that carry an authorization number
    authNbrFilter = col("authNbr") != "-"
    filterData = selectData.where(authNbrFilter)  # .limit(20)
    print(filterData)

    filterData.show(20, False)

    jsons = filterData.toJSON

    print(jsons)

There are two errors:

  1. When I print the jsons variable (print(jsons)), it does not return an RDD as expected. Instead it prints: bound method DataFrame.toJSON of DataFrame

I would be glad to know the reason for this error.
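
This behavior comes from Python itself rather than from Spark: toJSON is a method, and referencing it without parentheses yields the bound method object instead of calling it. A minimal standalone sketch (the Example class here is hypothetical, purely for illustration):

    # Referencing a method without parentheses yields the bound
    # method object; adding () actually calls it.
    class Example:
        def to_json(self):
            return '{"ok": true}'

    e = Example()
    print(e.to_json)    # <bound method Example.to_json of <...>>
    print(e.to_json())  # {"ok": true}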

  2. When I try to collect the jsons variable, the following error is shown: AttributeError: 'function' object has no attribute 'collect'

Do you know what causes this error?
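
Both symptoms point to the same cause: jsons holds the method itself, so it has no collect attribute. A minimal sketch of the likely fix, assuming filterData is the DataFrame built above: call toJSON() with parentheses so it returns an RDD of JSON strings, then collect that RDD.

    # toJSON() (with parentheses) returns an RDD with one JSON
    # string per row of the DataFrame.
    jsons = filterData.toJSON()

    # collect() materializes the RDD as a Python list of strings.
    for row in jsons.collect():
        print(row)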

0 Answers:

No answers