Scala header problem with a CSV file

Time: 2018-10-08 16:23:26

Tags: scala apache-spark

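I'm trying to load a CSV file with Scala and Apache Spark. However, as soon as I specify the schema with a Spark StructType, which I use to describe the CSV file's header, I run into this problem: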

scala> import org.apache.spark
import org.apache.spark

scala> import org.apache.spark.sql
import org.apache.spark.sql

scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext

scala> import org.apache.spark.sql.types
import org.apache.spark.sql.types

scala> import org.apache.spark.sql.functions
import org.apache.spark.sql.functions

scala> import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.clustering.KMeans

scala> import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.ml.evaluation.ClusteringEvaluator

scala> import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.feature.VectorAssembler

scala> val sqlContext = new SQLContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@f24a84

scala> import sqlContext.implicits
import sqlContext.implicits

scala> import sqlContext
| val schema = StructType(Array(StructField("ID_CALLE",IntegerType,true),StructField("TIPO", IntegerType, true),StructField("CALLE",IntegerType,true),StructField("NUMERO",IntegerType,true), StructField("LONGITUD",DoubleType,true),StructField("LATITUD",DoubleType,true),StructField("TITULO",IntegerType,true)))
<console>:2: error: '.' expected but ';' found.
val schema = StructType(Array(StructField("ID_CALLE",IntegerType,true),StructField("TIPO", IntegerType, true),StructField("CALLE",IntegerType,true),StructField("NUMERO",IntegerType,true), StructField("LONGITUD",DoubleType,true),StructField("LATITUD",DoubleType,true),StructField("TITULO",IntegerType,true)))

1 answer:

Answer 0 (score: 0)

There is a small typo in your code. If you look at it closely, you will see the problem here:

scala> import sqlContext
| val schema = StructType(Array(StructField("ID_CALLE",IntegerType,true),StructField("TIPO", IntegerType, true),StructField("CALLE",IntegerType,true),StructField("NUMERO",IntegerType,true), StructField("LONGITUD",DoubleType,true),StructField("LATITUD",DoubleType,true),StructField("TITULO",IntegerType,true)))

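Everywhere else you typed each new line right after the scala> prompt, but here the second line starts after |. That | is the REPL's continuation prompt. "import sqlContext" is incomplete, because an import has to name a member (import sqlContext.implicits) or use a wildcard (import sqlContext._). The REPL therefore waited for more input and read the "val schema = ..." line as part of the same import, which is what produces the error "'.' expected but ';' found".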
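To make the import rule concrete, here is a minimal standalone sketch in plain Scala (no Spark needed). Ctx is a hypothetical stand-in for sqlContext and ImportForms is an illustrative name; the bare "import Ctx" form is only mentioned in a comment because it does not compile.

```scala
// Hypothetical example: `Ctx` plays the role of `sqlContext`.
object ImportForms {
  object Ctx {
    val answer: Int = 42
    object implicits { val tag: String = "implicits" }
  }

  // `import Ctx` on its own would not compile: an import clause needs a
  // qualifier followed by `.name`, `._` or `.{...}`. In the REPL the missing
  // part leaves the line incomplete, so the next line is read as a continuation.

  def viaWildcard: Int = {
    import Ctx._            // every member of Ctx
    answer
  }

  def viaName: String = {
    import Ctx.implicits    // only the name `implicits`, not its members
    implicits.tag
  }

  def viaNestedWildcard: String = {
    import Ctx.implicits._  // the members of `implicits` themselves
    tag
  }

  def main(args: Array[String]): Unit = {
    println(viaWildcard)
    println(viaName)
    println(viaNestedWildcard)
  }
}
```

The viaName case mirrors "import sqlContext.implicits" in the transcript: it compiles, but it brings in only the object's name, which is why Spark code normally writes the wildcard form.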

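The fix is to complete the import and type each statement at its own scala> prompt. StructType, StructField, IntegerType and DoubleType also need a wildcard import, because "import org.apache.spark.sql.types" only brings the package name into scope:

scala> import org.apache.spark.sql.types._   // brings StructType, StructField, IntegerType and DoubleType into scope
scala> import sqlContext._
scala> val schema = StructType(Array(StructField("ID_CALLE",IntegerType,true),StructField("TIPO", IntegerType, true),StructField("CALLE",IntegerType,true),StructField("NUMERO",IntegerType,true), StructField("LONGITUD",DoubleType,true),StructField("LATITUD",DoubleType,true),StructField("TITULO",IntegerType,true)))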
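Once the schema compiles, it still has to be passed to the CSV reader. The following is a sketch of a compiled application, not the asker's code. It assumes Spark 2.x or later, where SparkSession replaces the deprecated "new SQLContext(sc)" from the transcript. The object name LoadCalles and the file name calles.csv are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

object LoadCalles {
  // Schema taken from the question; names and types come from here, not from inference.
  val schema: StructType = StructType(Array(
    StructField("ID_CALLE", IntegerType, nullable = true),
    StructField("TIPO", IntegerType, nullable = true),
    StructField("CALLE", IntegerType, nullable = true),
    StructField("NUMERO", IntegerType, nullable = true),
    StructField("LONGITUD", DoubleType, nullable = true),
    StructField("LATITUD", DoubleType, nullable = true),
    StructField("TITULO", IntegerType, nullable = true)
  ))

  def load(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true") // treat the first line as a header, not as data
      .schema(schema)           // apply the explicit schema above
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("load-calles")
      .master("local[*]")
      .getOrCreate()
    // "calles.csv" is a hypothetical default; pass the real path as the first argument.
    val path = args.headOption.getOrElse("calles.csv")
    val df = load(spark, path)
    df.printSchema()
    df.show(5)
    spark.stop()
  }
}
```

In spark-shell the same reader chain works directly on the predefined spark session, so the schema value from the fixed REPL lines can be passed to spark.read.option("header", "true").schema(schema).csv(...).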