我尝试为我的新DataFrame创建架构,并尝试了各种括号和关键字组合,但却无法弄清楚如何使其工作。我目前的尝试:
from pyspark.sql.types import *
schema = StructType([
StructField("User", IntegerType()),
ArrayType(StructType([
StructField("user", StringType()),
StructField("product", StringType()),
StructField("rating", DoubleType())]))
])
回过头来看错误:
elementType should be DataType
Traceback (most recent call last):
File "/usr/hdp/current/spark2-client/python/pyspark/sql/types.py", line 290, in __init__
assert isinstance(elementType, DataType), "elementType should be DataType"
AssertionError: elementType should be DataType
我用google搜索,但到目前为止还没有一个对象数组的好例子。
答案 0 :(得分:5)
StructField
属性需要额外ArrayType
。这应该工作:
from pyspark.sql.types import *
schema = StructType([
StructField("User", IntegerType()),
StructField("My_array", ArrayType(
StructType([
StructField("user", StringType()),
StructField("product", StringType()),
StructField("rating", DoubleType())
])
)
])
有关详细信息,请访问以下链接:http://nadbordrozd.github.io/blog/2016/05/22/one-weird-trick-that-will-fix-your-pyspark-schemas/