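How do I create a schema and a DataFrame, and then load 2-3 rows of sample data that fit that schema? I'm looking for Scala, Java and Python solutions, as well as a HiveQL-based one.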
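https://www.cloudera.com/documentation/enterprise/5-7x/topics/impala_struct.html shows how to create a schema for Parquet files, but what is the best way to insert an array of structs, for a schema like this one?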
root
 |-- f1: string (nullable = true)
 |-- f2: string (nullable = true)
 |-- f3: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- f3_1: string (nullable = true)
 |    |    |-- f3_2: string (nullable = true)
 |    |    |-- f3_3: string (nullable = true)
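If the data can be modelled with Scala case classes, Spark can derive this schema instead of it being written out by hand. The sketch below assumes Spark 2.x or later (`SparkSession`); the names `F3Elem`, `Record`, `CaseClassExample` and `buildDf`, and all sample values, are hypothetical. Because `String` fields and `Seq` fields of a case class map to nullable columns, `printSchema()` should print a tree equivalent to the one above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical case classes mirroring the schema above.
// They are top-level so Spark can derive an encoder for them.
case class F3Elem(f3_1: String, f3_2: String, f3_3: String)
case class Record(f1: String, f2: String, f3: Seq[F3Elem])

object CaseClassExample {
  def buildDf(spark: SparkSession): DataFrame = {
    import spark.implicits._ // provides toDF() for Seq[Record]
    Seq(
      Record("a", "b", Seq(F3Elem("x1", "y1", "z1"), F3Elem("x2", "y2", "z2"))),
      Record("c", "d", Seq.empty) // an empty array of structs is also valid
    ).toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("case-class-schema").getOrCreate()
    val df = buildDf(spark)
    df.printSchema() // f3 should appear as array<struct<f3_1,f3_2,f3_3>>
    df.show(false)
    spark.stop()
  }
}
```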
Scala code
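import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// f1, f2: strings; f3: array of structs with three string fields
val schemaStruct = StructType(
  StructField("f1", StringType, true) ::
  StructField("f2", StringType, true) ::
  StructField("f3", ArrayType(StructType(
    StructField("f3_1", StringType, true) ::
    StructField("f3_2", StringType, true) ::
    StructField("f3_3", StringType, true) :: Nil), true)) :: Nil)
// <Some Thing > is the open question: what value to pass for f3 (the array of structs)
val smallRow = Row("f1", "f2", <Some Thing >)
// createDataFrame is defined on SparkSession (SQLContext in Spark 1.x), not on SparkContext
val dfsmall = spark.createDataFrame(sc.parallelize(smallRow :: Nil, 1), schemaStruct)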
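One way to fill in the `<Some Thing >` placeholder: with an explicit `StructType`, a struct value is passed as a nested `Row` and an array value as a Scala `Seq`, so an array of structs becomes a `Seq[Row]`. This is a minimal sketch assuming Spark 2.x or later; `RowSchemaExample`, `buildDf` and the sample values are illustrative, not taken from any particular source.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

object RowSchemaExample {
  // Same schema as in the question: f1, f2 strings and f3 an array of structs.
  val schemaStruct: StructType = StructType(
    StructField("f1", StringType, nullable = true) ::
    StructField("f2", StringType, nullable = true) ::
    StructField("f3", ArrayType(StructType(
      StructField("f3_1", StringType, nullable = true) ::
      StructField("f3_2", StringType, nullable = true) ::
      StructField("f3_3", StringType, nullable = true) :: Nil), containsNull = true), nullable = true) :: Nil)

  def buildDf(spark: SparkSession): DataFrame = {
    // A struct value is a nested Row; an array value is a Seq.
    val rows = Seq(
      Row("f1_a", "f2_a", Seq(Row("Street_1", "NY_CITY", "USA"), Row("Street_3", "Chicago", "USA"))),
      Row("f1_b", "f2_b", Seq(Row("Street_2", "Boston", "USA"))),
      Row("f1_c", "f2_c", null) // f3 is nullable, so a null array is allowed
    )
    spark.createDataFrame(spark.sparkContext.parallelize(rows, 1), schemaStruct)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("row-schema").getOrCreate()
    val df = buildDf(spark)
    df.printSchema()
    df.show(false)
    spark.stop()
  }
}
```

An `Array[Row]` should work in place of `Seq[Row]` as well; `Seq` is used here because it reads naturally with the `::`-style schema definition.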
HiveQL
CREATE TABLE struct_demo(
id BIGINT,
name STRING,
-- A STRUCT as a top-level column. Demonstrates how the table ID column
-- and the ID field within the STRUCT can coexist without a name conflict.
employee_info STRUCT < employer: STRING, id: BIGINT, address: STRING >,
-- A STRUCT as the element type of an ARRAY.
places_lived ARRAY < STRUCT <street: STRING, city: STRING, country: STRING >>,
-- A STRUCT as the value portion of the key-value pairs in a MAP.
memorable_moments MAP < STRING, STRUCT < year: INT, place: STRING, details: STRING >>,
-- A STRUCT where one of the fields is another STRUCT.
current_address STRUCT < street_address: STRUCT <street_number: INT, street_name: STRING, street_type: STRING>, country: STRING, postal_code: STRING >
)
STORED AS PARQUET;
How do I insert something like this:
insert into struct_demo 12, "myName", [ ["Street_1","NY_CITY","USA"],["Street_3","Chicago","USA"] ]
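Hive's language manual states that complex-type literals (array, map, struct) cannot be used in `INSERT INTO ... VALUES`, so a common workaround is `INSERT INTO ... SELECT`, building each value with `array()`, `named_struct()` and `map()`. These functions exist in both Hive and Spark SQL. The sketch below runs the statement through Spark SQL so it can be tried without a Hive installation. It assumes Spark 2.x or later, creates the table with `USING parquet` instead of `STORED AS PARQUET` (the latter needs Hive support enabled), and every value not taken from the question (employer, address, moments, postal code) is hypothetical. As far as I know, Impala in CDH 5.x can query complex-type columns but cannot write them, so the insert has to go through Hive or Spark.

```scala
import org.apache.spark.sql.SparkSession

object InsertSelectExample {
  // Same columns as struct_demo; USING parquet lets this run without a Hive metastore.
  val createSql: String =
    """CREATE TABLE IF NOT EXISTS struct_demo (
      |  id BIGINT,
      |  name STRING,
      |  employee_info STRUCT<employer: STRING, id: BIGINT, address: STRING>,
      |  places_lived ARRAY<STRUCT<street: STRING, city: STRING, country: STRING>>,
      |  memorable_moments MAP<STRING, STRUCT<year: INT, place: STRING, details: STRING>>,
      |  current_address STRUCT<street_address: STRUCT<street_number: INT, street_name: STRING, street_type: STRING>, country: STRING, postal_code: STRING>
      |) USING parquet""".stripMargin

  // Complex values are built in a SELECT list, not with VALUES.
  // Values other than id, name and places_lived are hypothetical sample data.
  val insertSql: String =
    """INSERT INTO struct_demo
      |SELECT
      |  12,
      |  'myName',
      |  named_struct('employer', 'SomeCorp', 'id', CAST(1001 AS BIGINT), 'address', '1 Main St'),
      |  array(
      |    named_struct('street', 'Street_1', 'city', 'NY_CITY', 'country', 'USA'),
      |    named_struct('street', 'Street_3', 'city', 'Chicago', 'country', 'USA')),
      |  map('first_job', named_struct('year', 2010, 'place', 'NY_CITY', 'details', 'sample')),
      |  named_struct(
      |    'street_address', named_struct('street_number', 1, 'street_name', 'Main', 'street_type', 'St'),
      |    'country', 'USA',
      |    'postal_code', '10001')""".stripMargin

  def run(spark: SparkSession): Unit = {
    spark.sql(createSql)
    spark.sql(insertSql)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("insert-select").getOrCreate()
    run(spark)
    spark.sql("SELECT id, name, places_lived FROM struct_demo").show(false)
    spark.stop()
  }
}
```

The `INSERT ... SELECT` statement itself should also work from the Hive CLI or Beeline against the original `STORED AS PARQUET` table, since Hive accepts a `SELECT` without a `FROM` clause from version 0.13 on.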