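I'm not very good with Scala (I'm more of an R addict). Using Scala in spark-shell, I would like to display the contents of the WrappedArray elements on two rows (see the sqlDf.show() output below). I have tried the explode() function but couldn't get any further...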
scala> val sqlDf = spark.sql("select t.articles.donneesComptablesArticle.taxes from dau_temp t")
sqlDf: org.apache.spark.sql.DataFrame = [taxes: array<array<struct<baseImposition:bigint,codeCommunautaire:string,codeNatureTaxe:string,codeTaxe:string,droitCautionnable:boolean,droitPercu:boolean,imputationCreditCautionne:boolean,montantLiquidation:bigint,quotite:double,statutAi2:boolean,statutDeLiquidation:string,statutRessourcesPropres:boolean,typeTaxe:string>>>]
scala> sqlDf.show
16/12/21 15:13:21 WARN util.Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
+--------------------+
| taxes|
+--------------------+
|[WrappedArray([12...|
+--------------------+
scala> sqlDf.printSchema
root
|-- taxes: array (nullable = true)
| |-- element: array (containsNull = true)
| | |-- element: struct (containsNull = true)
| | | |-- baseImposition: long (nullable = true)
| | | |-- codeCommunautaire: string (nullable = true)
| | | |-- codeNatureTaxe: string (nullable = true)
| | | |-- codeTaxe: string (nullable = true)
| | | |-- droitCautionnable: boolean (nullable = true)
| | | |-- droitPercu: boolean (nullable = true)
| | | |-- imputationCreditCautionne: boolean (nullable = true)
| | | |-- montantLiquidation: long (nullable = true)
| | | |-- quotite: double (nullable = true)
| | | |-- statutAi2: boolean (nullable = true)
| | | |-- statutDeLiquidation: string (nullable = true)
| | | |-- statutRessourcesPropres: boolean (nullable = true)
| | | |-- typeTaxe: string (nullable = true)
scala> val sqlDfTaxes = sqlDf.select(explode(sqlDf("taxes")))
sqlDfTaxes: org.apache.spark.sql.DataFrame = [col: array<struct<baseImposition:bigint,codeCommunautaire:string,codeNatureTaxe:string,codeTaxe:string,droitCautionnable:boolean,droitPercu:boolean,imputationCreditCautionne:boolean,montantLiquidation:bigint,quotite:double,statutAi2:boolean,statutDeLiquidation:string,statutRessourcesPropres:boolean,typeTaxe:string>>]
scala> sqlDfTaxes.show()
16/12/21 15:22:28 WARN util.Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
+--------------------+
| col|
+--------------------+
|[[12564,B00,TVA,A...|
+--------------------+
The "readable" content looks like this (this is my GOAL: a classic row x column display with headers):
codeTaxe codeCommunautaire baseImposition quotite montantLiquidation statutDeLiquidation
A445 B00 12564 20.0 2513 C
U165 A00 12000 4.7 564 C
codeNatureTaxe typeTaxe statutRessourcesPropres statutAi2 imputationCreditCautionne
TVA ADVAL FALSE TRUE FALSE
DD ADVAL TRUE FALSE TRUE
droitCautionnable droitPercu
FALSE TRUE
FALSE TRUE
And the class of each row is (found using the R package sparklyr):
<jobj[100]>
class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
[12564,B00,TVA,A445,false,true,false,2513,20.0,true,C,false,ADVAL]
[[1]][[1]][[2]]
<jobj[101]>
class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
[12000,A00,DD,U165,false,true,true,564,4.7,false,C,true,ADVAL]
Answer 0 (score: 2)
You can explode on each column:
val flattenedtaxes = sqlDf.withColumn("codeCommunautaire", org.apache.spark.sql.functions.explode($"taxes.codeCommunautaire"))
After this, your flattenedtaxes will have 2 columns: taxes (with all its fields unchanged) and the new column codeCommunautaire.
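
If the goal is the full row x column view from the question, another option is to explode twice and then expand the struct. This is only a minimal sketch under assumptions: the case class Tax (reduced to 3 of the 13 fields), the sample rows, the local SparkSession and the column aliases taxGroup and tax are all made up for illustration. As far as I can tell, dot access to a struct field works on array<struct> but not directly on array<array<struct>>, which is why the outer array is exploded first.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical local session; inside spark-shell, `spark` already exists
// and these two lines can be dropped.
val spark = SparkSession.builder().master("local[1]").appName("flatten-taxes").getOrCreate()
import spark.implicits._

// Hypothetical, reduced stand-in for one tax struct (3 of the 13 fields).
case class Tax(codeTaxe: String, codeCommunautaire: String, baseImposition: Long)

// One row whose `taxes` column is array<array<struct>>, like sqlDf in the question.
val sqlDf = Seq(
  Tuple1(Seq(Seq(Tax("A445", "B00", 12564L), Tax("U165", "A00", 12000L))))
).toDF("taxes")

val flat = sqlDf
  .select(explode(col("taxes")).as("taxGroup")) // array<array<struct>> -> array<struct>
  .select(explode(col("taxGroup")).as("tax"))   // array<struct> -> one struct per row
  .select("tax.*")                              // struct fields -> top-level columns

// Pass false so long cell values are not truncated in the console output.
flat.show(false)
```

On the real data, the same three select calls applied to the original sqlDf should give one row per tax and one column per struct field, which can then be narrowed with an ordinary select on the field names.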