Question

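The use case is to read a file and create a DataFrame on top of it, then get the schema of that file and store it in a database table.

For example purposes, I am just creating a case class and calling printSchema, but I am unable to create a DataFrame out of the schema.

Here is the sample code: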

import org.apache.spark.sql.SparkSession

case class Employee(Name: String, Age: Int, Designation: String, Salary: Int, ZipCode: Int)

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.master", "local")
  .getOrCreate()

import spark.implicits._
val EmployeesData = Seq(Employee("Anto", 21, "Software Engineer", 2000, 56798))
val Employee_DataFrame = EmployeesData.toDF
val dfschema = Employee_DataFrame.schema

Now dfschema is a StructType, and I want to convert it into a DataFrame with two columns. How can I achieve that?

Answer 1

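Spark >= 2.4.0

To save the schema in string format, you can use the toDDL method of StructType. In your case the DDL format should be:

`Name` STRING, `Age` INT, `Designation` STRING, `Salary` INT, `ZipCode` INT

After saving the schema, you can load it from the database and use it as StructType.fromDDL(my_schema). This returns an instance of StructType, which you can use to create the new DataFrame with spark.createDataFrame, as @Ajay already mentioned.

It is also useful to remember that you can always extract the schema from a case class with:

import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType
val empSchema = ScalaReflection.schemaFor[Employee].dataType.asInstanceOf[StructType]

You can then get the DDL representation with empSchema.toDDL.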
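As a rough illustration of the toDDL / fromDDL round trip described above, here is a minimal sketch meant to be pasted into spark-shell on Spark 2.4 or later. The schema is written out by hand to mirror the Employee case class from the question; the names ddl and restored are just illustrative, and the actual database read and write are left out because the question gives no connection details.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Schema equivalent to the Employee case class from the question.
val empSchema = StructType(Seq(
  StructField("Name", StringType),
  StructField("Age", IntegerType),
  StructField("Designation", StringType),
  StructField("Salary", IntegerType),
  StructField("ZipCode", IntegerType)
))

// Serialize the schema to a DDL string; this is the value that would be stored in the database.
val ddl: String = empSchema.toDDL

// Later, rebuild a StructType from the stored string.
val restored: StructType = StructType.fromDDL(ddl)
```

The exact text of the DDL string (quoting and spacing) may differ between Spark versions, so it is probably safer to compare schemas after parsing than to compare the stored strings directly.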

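Spark < 2.4

For Spark < 2.4, use DataType.fromDDL and schema.simpleString respectively. Also, instead of returning a StructType, you should use a DataType instance, omitting the cast to StructType, as follows:

val empSchema = ScalaReflection.schemaFor[Employee].dataType

Sample output for empSchema.simpleString:

struct<Name:string,Age:int,Designation:string,Salary:int,ZipCode:int>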
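If the goal is literally a two-column DataFrame, as the question asks, rather than a single schema string, one possible approach is to map over the fields of the StructType and turn each field into a (name, type) pair. The sketch below assumes spark-shell style execution; the column names column_name and data_type and the app name are arbitrary choices, not something from the question.

```scala
import org.apache.spark.sql.SparkSession

case class Employee(Name: String, Age: Int, Designation: String, Salary: Int, ZipCode: Int)

val spark = SparkSession.builder()
  .appName("schema-to-dataframe") // hypothetical app name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val employeeDf = Seq(Employee("Anto", 21, "Software Engineer", 2000, 56798)).toDF()

// One row per field: (column name, Spark SQL type rendered as a string).
val schemaDf = employeeDf.schema.fields
  .map(f => (f.name, f.dataType.simpleString))
  .toSeq
  .toDF("column_name", "data_type")

schemaDf.show(false)
```

This yields one row per column of the original DataFrame, for example Name paired with string and Age paired with int. From there the result can be written to a database table like any other DataFrame, for instance through the writer's jdbc method, with connection settings that depend on your environment.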

Answer 2

Try this:

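import org.apache.spark.sql.types._

val schema = StructType(Seq(StructField("Name", StringType, true),
                            StructField("Age", IntegerType, true),
                            StructField("Designation", StringType, true),
                            StructField("Salary", IntegerType, true),
                            StructField("ZipCode", IntegerType, true)))

//-- For local file
// Apply the schema while reading: without it every CSV column is read as a string,
// so the rows would not match the IntegerType fields at runtime.
// "multiLine" is the released name of the option that was called "wholeFile" during development.
val df = spark.read.option("multiLine", true).option("delimiter", ",").schema(schema).csv("file:///file/path/file.csv")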

Get DataFrame schema and load it into a metadata table
