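I am trying to split the masterDF column into several child columns, but after "splitting" I still get only the single masterDF column instead of the multiple columns I expect.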
ChildDF (a single string column, masterDF):
K0059122016022YU165754000000 000100000 L0000026009011 00020000 00007020149600001050000000N
K0059122016022YU100000000000 000200000 90800035433174 00010000 00009390150200001410000000N
K0059122016022YU160000000000 000100000 90800034921015 000100000000000000000014600000000000000000000N
K0059122016022YU165752000000 000100000 90800028370118 00020000 00011110000000000000000000N
K0059122016022YU100000000000 920161206083824VS122400000000000000000000000000000000000000020161206083824
K0059122016022YU165000000000 0001IVASQ S0000931025555 00020000 00004460000000000000000000N
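import org.apache.spark.sql.functions.substring

// One entry per output column: value1 is the 1-based start, value2 the length
val listIs = List(
  Map("type" -> "A", "value1" -> 1, "value2" -> 1),
  Map("type" -> "B", "value1" -> 2, "value2" -> 6),
  Map("type" -> "C", "value1" -> 8, "value2" -> 7),
  Map("type" -> "D", "value1" -> 15, "value2" -> 2),
  Map("type" -> "E", "value1" -> 17, "value2" -> 8),
  Map("type" -> "F", "value1" -> 25, "value2" -> 8))
listIs.foreach(iteam =>
  ChildDF.withColumn(iteam("type").toString, substring(ChildDF("masterDF"), iteam("value1").asInstanceOf[Int], iteam("value2").asInstanceOf[Int]))
)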
ChildDF.createOrReplaceTempView("ChildTable")
val queryDF = "SELECT * from ChildTable"
sparkSession.sql(queryDF).cache().toDF().show()
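The reason only masterDF survives is that withColumn returns a new DataFrame instead of modifying ChildDF, and foreach throws away whatever its function returns. The minimal sketch below shows the effect without Spark. It assumes a hypothetical immutable Frame class as a stand-in for a DataFrame (it is not a Spark type) and contrasts foreach with foldLeft:

```scala
// Hypothetical immutable stand-in for a DataFrame: like Spark's
// withColumn, it returns a NEW value and never mutates the receiver.
final case class Frame(columns: Vector[String]) {
  def withColumn(name: String): Frame = Frame(columns :+ name)
}

object ForeachVsFold {
  val base: Frame = Frame(Vector("masterDF"))
  val names: List[String] = List("A", "B", "C")

  // foreach calls withColumn only for its side effects; each returned
  // Frame is discarded, so base is left unchanged.
  def withForeach(): Frame = {
    names.foreach(n => base.withColumn(n))
    base
  }

  // foldLeft passes each returned Frame into the next call,
  // so the columns accumulate.
  def withFold(): Frame =
    names.foldLeft(base)((frame, n) => frame.withColumn(n))
}
```

withForeach() still has only masterDF, while withFold() ends up with masterDF, A, B and C. The same reasoning applies to the real DataFrame.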
Output:
masterDF
K0059122016022YU165754....
K0059122016022YU100000....
K0059122016022YU160000....
K0059122016022YU165752....
K0059122016022YU100000....
K0059122016022YU165000....
Expected output (XXXXXX stands for the extracted substring):
masterDF A B C
K0059122016022YU165754.... XXXXXX XXXXXX XXXXXX
K0059122016022YU100000.... XXXXXX XXXXXX XXXXXX
K0059122016022YU160000.... XXXXXX XXXXXX XXXXXX
K0059122016022YU165752.... XXXXXX XXXXXX XXXXXX
K0059122016022YU100000.... XXXXXX XXXXXX XXXXXX
K0059122016022YU165000.... XXXXXX XXXXXX XXXXXX
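The plain-Scala sketch below shows what the XXXXXX values would contain. It approximates Spark SQL's substring(str, pos, len) for pos >= 1: the start is 1-based, len is a length, and the result is clipped at the end of the string. Spark's handling of pos <= 0 is not reproduced. The helper names are hypothetical, and the spec tuples restate listIs, treating value1 as the start position and value2 as the length, which is how the substring call uses them.

```scala
object SubstringPreview {
  // Approximates Spark SQL substring(str, pos, len) for pos >= 1 only:
  // 1-based start, result clipped at the end of the string.
  def sparkSubstring(s: String, pos: Int, len: Int): String = {
    require(pos >= 1 && len >= 0, "sketch only covers pos >= 1, len >= 0")
    val start = math.min(pos - 1, s.length)
    val end = math.min(start + len, s.length)
    s.substring(start, end)
  }

  // (column name, start position, length), copied from listIs
  val specs: List[(String, Int, Int)] = List(
    ("A", 1, 1), ("B", 2, 6), ("C", 8, 7),
    ("D", 15, 2), ("E", 17, 8), ("F", 25, 8))

  // Pair each column name with its extracted value for one row
  def split(row: String): List[(String, String)] =
    specs.map { case (name, pos, len) => name -> sparkSubstring(row, pos, len) }
}
```

For the first 28 characters of the first sample row, "K0059122016022YU165754000000", this yields A = K, B = 005912, C = 2016022, D = YU and E = 16575400. F is clipped to the 4 characters that remain.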
Answer (score: 0)
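Use foldLeft instead of foreach. withColumn does not change ChildDF in place. It returns a new DataFrame, and foreach discards that return value, so ChildDF keeps only masterDF. Switching to map would keep the results, but as a List of six separate DataFrames that each have just one extra column, and a List has no createOrReplaceTempView. foldLeft feeds each new DataFrame into the next withColumn call, so all the columns accumulate on one DataFrame:
import org.apache.spark.sql.functions.substring

// Start from ChildDF and add one substring column per entry in listIs
val newChildDF = listIs.foldLeft(ChildDF)((df, iteam) =>
  df.withColumn(iteam("type").toString, substring(df("masterDF"), iteam("value1").asInstanceOf[Int], iteam("value2").asInstanceOf[Int]))
)
newChildDF.createOrReplaceTempView("ChildTable")
val queryDF = "SELECT * FROM ChildTable"
// show(false) prints full values instead of truncating those longer than 20 characters
sparkSession.sql(queryDF).cache().show(false)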
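Another option is to skip chained withColumn calls and build every column in a single select. This produces one projection instead of one per column. The Spark documentation for withColumn notes that calling it many times, for example in a loop, can produce large query plans. The sketch below assumes Spark's Scala API (functions.col, functions.substring, Column.as). SplitMasterColumn, ColumnSpec, fromMap and selectSplitColumns are hypothetical names, not part of Spark.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, substring}

object SplitMasterColumn {
  // Hypothetical typed spec: pos is the 1-based start and len the length,
  // mirroring value1 and value2 in listIs.
  final case class ColumnSpec(name: String, pos: Int, len: Int)

  // Convert one untyped listIs entry; throws if a key is missing
  // or a value has the wrong type.
  def fromMap(m: Map[String, Any]): ColumnSpec =
    ColumnSpec(m("type").toString, m("value1").asInstanceOf[Int], m("value2").asInstanceOf[Int])

  // Keep masterDF and add every substring column in one select.
  def selectSplitColumns(df: DataFrame, specs: Seq[ColumnSpec]): DataFrame = {
    val parts = specs.map(s => substring(col("masterDF"), s.pos, s.len).as(s.name))
    df.select((col("masterDF") +: parts): _*)
  }
}
```

Usage would look like SplitMasterColumn.selectSplitColumns(ChildDF, listIs.map(SplitMasterColumn.fromMap)). With a typed spec, the asInstanceOf casts live in one place instead of in every column expression.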