TypeError: Can not merge type <class 'pyspark.sql.types.DoubleType'> and <class 'pyspark.sql.types.StringType'>

Asked: 2017-12-19 14:49:08

Tags: pandas pyspark databricks

I am using Spark 1.6 and running the following code:

def load(self, filename):        
    print "Loading input file " + filename
    inputpd = pd.read_csv('input/'+filename , dtype=str)

    inputpd = inputpd.round(4)
    inputpd = inputpd.drop(inputpd.columns[[0]], axis=1)
    df_input = self.sqlContext.createDataFrame(inputpd)
    return df_input

After running the code, I get this type error:


Can not merge type <class 'pyspark.sql.types.DoubleType'> and <class 'pyspark.sql.types.StringType'>
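From the error it looks like createDataFrame infers one Spark type per column from the pandas data and fails when the same column maps to both DoubleType and StringType. This is only a sketch of what I understand an explicit schema would look like, with placeholder column names instead of my real ones:

from pyspark.sql.types import StructType, StructField, StringType

# Sketch: placeholder column names; declaring every column as StringType
# so Spark does not have to merge conflicting inferred types.
schema = StructType([
    StructField("col_a", StringType(), True),
    StructField("col_b", StringType(), True),
])
df_input = self.sqlContext.createDataFrame(inputpd, schema=schema)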

To fix this, I have tried:

inputpd = (spark.read.format("csv").options(header="true").load('input/'+filename))

And also:

inputpd = sqlContext.read.format('com.databricks.spark.csv').options(header='true').load('input/'+filename)

But both times I get an error that spark or sqlContext is not defined. Please tell me how to define them, since I have already tried the import statements.
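From what I have read, these are context objects that have to be constructed rather than imported, and the spark entry point (SparkSession) only exists from Spark 2.0 onward, so on 1.6 I assume only sqlContext applies. A minimal setup sketch, where the app name is just a placeholder:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Sketch for Spark 1.6: build a SQLContext on top of a SparkContext.
conf = SparkConf().setAppName("csv-load")   # placeholder app name
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)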

1 Answer:

Answer 0 (score: 0):

I tried it in pyspark and it worked...

pyspark --packages com.databricks:spark-csv_2.11:1.5.0

followed by this:

sqlContext.read.format('com.databricks.spark.csv').options(header='true').load("file:/home/..." + filename).collect()
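If everything comes back as strings, the same reader can also be asked to infer the column types, and printSchema shows what was actually read; a quick sketch (the path is just an example):

df = sqlContext.read.format('com.databricks.spark.csv') \
    .options(header='true', inferSchema='true') \
    .load("file:/home/..." + filename)
df.printSchema()   # shows which columns came back as double vs string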