I have a dataframe with the following schema:
root
|-- _1: struct (nullable = true)
| |-- key: string (nullable = true)
|-- _2: struct (nullable = true)
| |-- value: long (nullable = true)
I want to convert the dataframe to the following schema:
root
|-- _1: struct (nullable = true)
| |-- key: string (nullable = true)
| |-- value: long (nullable = true)
Answer 0: (score: 2)
Use struct:
pyspark.sql.functions.struct(*cols)
Creates a new struct column.
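from pyspark.sql import Row, SparkSession
from pyspark.sql.functions import struct, col
# Reuse the active session (already available as `spark` in the pyspark shell)
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([Row(_1=Row(key="a"), _2=Row(value=1))])
# Pull the nested fields out of both structs and repack them into one struct named _1
result = df.select(struct(col("_1.key"), col("_2.value")).alias("_1"))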
This gives:
result.printSchema()
# root
# |-- _1: struct (nullable = false)
# | |-- key: string (nullable = true)
# | |-- value: long (nullable = true)
and
result.show()
# +-----+
# | _1|
# +-----+
# |[a,1]|
# +-----+
Answer 1: (score: 2)
If your dataframe has the following schema:
root
|-- _1: struct (nullable = true)
| |-- key: string (nullable = true)
|-- _2: struct (nullable = true)
| |-- value: long (nullable = true)
then you can use * to select all the elements of the struct columns into separate columns, and then use the struct built-in function to combine them back into a single struct field:
from pyspark.sql import functions as F
df.select(F.struct("_1.*", "_2.*").alias("_1"))
You should get the desired output dataframe:
root
|-- _1: struct (nullable = false)
| |-- key: string (nullable = true)
| |-- value: long (nullable = true)
Update
If all the columns in the original dataframe are structs, the code above can be written in a more general form.
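A more general form of the struct("_1.*", "_2.*") call might look roughly like the sketch below. It assumes that every top-level column of df is a struct and that the nested field names do not collide across columns. The helper names star_fields and merge_struct_columns are hypothetical and are not part of PySpark.

```python
def star_fields(column_names):
    """Turn ["_1", "_2"] into ["_1.*", "_2.*"] (one star expression per struct column)."""
    return [f"{name}.*" for name in column_names]


def merge_struct_columns(df, target="_1"):
    """Pack the fields of all struct columns of df into one struct column.

    Assumes every top-level column of df is a struct with distinct field names.
    """
    # Imported here so that star_fields stays usable without a Spark installation.
    from pyspark.sql import functions as F

    # Same call as F.struct("_1.*", "_2.*"), with the arguments built from df.columns.
    return df.select(F.struct(*star_fields(df.columns)).alias(target))
```

With the dataframe from the question, merge_struct_columns(df).printSchema() should then show a single _1 struct containing key and value.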