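I used df.printSchema() in PySpark, and it gave me the schema as a tree structure. Now I need to save it in a variable or a text file.
I tried the following ways of saving it, but they did not work.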
v = str(df.printSchema())
print(v)
#and
df.printSchema().saveAsTextFile(<path>)
I need the schema saved in the following format:
|-- COVERSHEET: struct (nullable = true)
| |-- ADDRESSES: struct (nullable = true)
| | |-- ADDRESS: struct (nullable = true)
| | | |-- _VALUE: string (nullable = true)
| | | |-- _city: string (nullable = true)
| | | |-- _primary: long (nullable = true)
| | | |-- _state: string (nullable = true)
| | | |-- _street: string (nullable = true)
| | | |-- _type: string (nullable = true)
| | | |-- _zip: long (nullable = true)
| |-- CONTACTS: struct (nullable = true)
| | |-- CONTACT: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- _VALUE: string (nullable = true)
| | | | |-- _name: string (nullable = true)
| | | | |-- _type: string (nullable = true)
Answer 0 (score: 2)
You need treeString (for some reason, I couldn't find it in the Python API):
#v will be a string
v = df._jdf.schema().treeString()
You can convert it to an RDD and use saveAsTextFile, or use Python's own file API to write the string to a file.
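The attempts in the question fail because printSchema() prints the tree and returns None, so there is nothing to convert or save. A minimal sketch of the "write the string to a file" route follows. It assumes the classic (non-Connect) PySpark session, where the private `_jdf` handle used in this answer is available; the helper name `save_schema_tree` and the output path are hypothetical.

```python
from pathlib import Path


def save_schema_tree(df, path):
    """Write the schema tree of `df` to a local text file and return it.

    Assumption: `df` is a PySpark DataFrame exposing the private `_jdf`
    attribute, as in the answer above. `save_schema_tree` is a
    hypothetical helper name, not part of PySpark.
    """
    # Same text that df.printSchema() would print, but as a Python string.
    tree = df._jdf.schema().treeString()
    # Plain Python file I/O on the driver; no RDD is needed for one small string.
    Path(path).write_text(tree, encoding="utf-8")
    return tree
```

Usage would look like `v = save_schema_tree(df, "schema.txt")`. If the file has to land on distributed storage instead, wrapping the string in a one-element list, as in `sc.parallelize([v]).saveAsTextFile(<path>)`, is one way to follow the RDD route mentioned above.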
Answer 1 (score: 1)
You can also use the following:
# `schema` is not defined in the original answer; df.schema is assumed here
schema = df.schema
temp_rdd = sc.parallelize(schema)
temp_rdd.coalesce(1).saveAsPickleFile("s3a://path/to/destination_schema.pickle")
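A pickle file keeps the schema machine-readable but not in the tree layout the question asks for. If a machine-readable copy is what is wanted, a text-based alternative is the JSON form of the schema. The sketch below is an assumption layered on this answer, not the answerer's method: it relies on `df.schema.json()`, and the helper names and path are hypothetical.

```python
import json
from pathlib import Path


def save_schema_json(df, path):
    """Write df.schema as JSON text to a local file (hypothetical helper)."""
    # StructType.json() returns the schema serialized as a JSON string.
    text = df.schema.json()
    Path(path).write_text(text, encoding="utf-8")
    return text


def load_schema_dict(path):
    """Read the saved schema back as a plain dict (hypothetical helper)."""
    # With PySpark available, StructType.fromJson(<this dict>) should
    # rebuild the schema object from the parsed JSON.
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Unlike the pickle output, the resulting file can be opened in any text editor and compared between runs.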