How do I create data with complex struct and array-of-struct types in Scala, Python, and HiveQL?

Time: 2019-06-10 14:07:54

Tags: scala apache-spark struct pyspark rdd

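How do I create a schema and a DataFrame, and load 2-3 rows of sample data that fit that schema? I am looking for Scala, Java, and Python solutions, as well as a HiveQL-based solution.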

https://www.cloudera.com/documentation/enterprise/5-7x/topics/impala_struct.html shows how to define the schema for a Parquet table, but what is the best way to insert an array of structs?

   |-- f2: string (nullable = true)
   |-- f3: array (nullable = true)
   |    |-- element: struct (containsNull = true)
   |    |    |-- f3_1: string (nullable = true)
   |    |    |-- f3_2: string (nullable = true)
   |    |    |-- f3_3: string (nullable = true)

Scala code

val schemaStruct = StructType(
    StructField("f1", StringType, true) ::
    StructField("f2", StringType, true) ::
    StructField("f3", ArrayType(StructType(
    StructField("f3_1", StringType, true) ::
    StructField("f3_2", StringType, true) ::
    StructField("f3_3", StringType, true) :: Nil), true)) ::Nil)

val smallRow = Row("f1","f2","f3", <Some Thing >)
val dfsmall = spark.createDataFrame(sc.parallelize(smallRow :: Nil, 1), schemaStruct)
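
One way to fill in the placeholder above is sketched below. It assumes Spark 2.x or later with a local SparkSession, and the sample values ("a1", "x1", and so on) and the object name StructArrayExample are made up for illustration. The idea is that each struct element is written as a nested Row and the array as a Seq of those Rows, so every top-level Row has exactly three values to match f1, f2, and f3.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Hypothetical object name; the schema mirrors schemaStruct from the question.
object StructArrayExample {
  val schema: StructType = StructType(Seq(
    StructField("f1", StringType, nullable = true),
    StructField("f2", StringType, nullable = true),
    StructField("f3", ArrayType(StructType(Seq(
      StructField("f3_1", StringType, nullable = true),
      StructField("f3_2", StringType, nullable = true),
      StructField("f3_3", StringType, nullable = true)
    )), containsNull = true), nullable = true)
  ))

  // Illustrative sample data: a struct is a nested Row, an array is a Seq.
  val rows: Seq[Row] = Seq(
    Row("a1", "b1", Seq(Row("x1", "y1", "z1"), Row("x2", "y2", "z2"))),
    Row("a2", "b2", Seq(Row("x3", "y3", "z3"))),
    Row("a3", "b3", Seq.empty[Row]) // an empty array is also valid
  )

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimenting; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .appName("struct-array-example")
      .master("local[1]")
      .getOrCreate()

    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows, 1), schema)
    df.printSchema()
    df.show(truncate = false)

    spark.stop()
  }
}
```

The same nesting pattern should carry over to PySpark, where tuples or Row objects inside a Python list play the role of the nested Rows, though that variant is not shown here.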

HiveQL

    CREATE TABLE struct_demo(
      id BIGINT,
      name STRING,
    -- A STRUCT as a top-level column. Demonstrates how the table ID column
    -- and the ID field within the STRUCT can coexist without a name conflict.
      employee_info STRUCT < employer: STRING, id: BIGINT, address: STRING >,

    -- A STRUCT as the element type of an ARRAY.
      places_lived ARRAY < STRUCT <street: STRING, city: STRING, country: STRING >>,

    -- A STRUCT as the value portion of the key-value pairs in a MAP.
      memorable_moments MAP < STRING, STRUCT < year: INT, place: STRING, details: STRING >>,

    -- A STRUCT where one of the fields is another STRUCT.
      current_address STRUCT < street_address: STRUCT <street_number: INT, street_name: STRING, street_type: STRING>, country: STRING, postal_code: STRING >
    )
    STORED AS PARQUET;

How do I insert something like this?

insert into struct_demo 12, "myName",  [ ["Street_1","NY_CITY","USA"],["Street_3","Chicago","USA"] ]
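
A possible approach to the insert above, sketched under several assumptions: Hive does not accept complex-type literals in INSERT ... VALUES, so the statement is written as INSERT ... SELECT using the named_struct, array, and map constructor functions. The sketch runs the statement through Spark SQL with Hive support enabled and assumes the struct_demo table already exists. Values such as 'someEmployer' and 'somePlace' are placeholders invented for the columns the question did not specify, and StructDemoInsert is a hypothetical name.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name; struct_demo is the table defined in the question.
object StructDemoInsert {
  // One expression per column, in table order:
  // id, name, employee_info, places_lived, memorable_moments, current_address.
  val insertSql: String =
    """INSERT INTO TABLE struct_demo
      |SELECT
      |  CAST(12 AS BIGINT),
      |  'myName',
      |  named_struct('employer', 'someEmployer', 'id', CAST(1 AS BIGINT), 'address', 'someAddress'),
      |  array(
      |    named_struct('street', 'Street_1', 'city', 'NY_CITY', 'country', 'USA'),
      |    named_struct('street', 'Street_3', 'city', 'Chicago', 'country', 'USA')),
      |  map('moment_1', named_struct('year', 2019, 'place', 'somePlace', 'details', 'someDetails')),
      |  named_struct(
      |    'street_address',
      |    named_struct('street_number', 1, 'street_name', 'Main', 'street_type', 'St'),
      |    'country', 'USA',
      |    'postal_code', '00000')
      |""".stripMargin

  def main(args: Array[String]): Unit = {
    // Assumption: the session can reach the Hive metastore that holds struct_demo.
    val spark = SparkSession.builder()
      .appName("struct-demo-insert")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql(insertSql)
    spark.sql("SELECT id, name, places_lived FROM struct_demo").show(truncate = false)

    spark.stop()
  }
}
```

The SQL text in insertSql is intended to work in the Hive CLI or Beeline as well, although that has not been verified against a specific Hive version.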

0 answers:

No answers