I have a DataFrame with the following schema:
root
|-- Id: integer (nullable = true)
|-- Id_FK: integer (nullable = true)
|-- Foo: integer (nullable = true)
|-- Bar: string (nullable = true)
|-- XPTO: string (nullable = true)
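From this DataFrame, I want to create a JSON file that maps each column name to its type, like this:
{
"Id": "integer",
"Id_FK": "integer",
"Foo": "integer",
"Bar": "string",
"XPTO": "string"
}
I am trying to do this with PySpark, but I can't find a way to do it. Can anyone help me?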
Answer 0 (score: 1)
Here is a solution that first fills a dictionary by iterating over the columns of the schema. Then we use json.dumps
to convert the dictionary to a string:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
import json
# sample schema (with a real DataFrame, use df.schema instead)
schema = StructType(
    [
        StructField("Id", IntegerType()),
        StructField("Id_FK", IntegerType()),
        StructField("Foo", IntegerType()),
        StructField("Bar", StringType()),
        StructField("XPTO", StringType())
    ])
# build a dictionary where each item is a pair of col_name: col_type
# typeName() returns names such as "integer" and "string"
col_types = {}
for c in schema:
    col_types[c.name] = c.dataType.typeName()
# convert to a JSON string
data = json.dumps(col_types)
# save to file
with open("output.txt", "w") as text_file:
    text_file.write(data)
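As an alternative sketch, you could also start from the JSON string that `df.schema.json()` returns and flatten it into the name-to-type mapping from the question. The helper name `column_types_from_schema_json` and the hand-written `sample_schema_json` are only illustrative assumptions; the sample stands in for the real `df.schema.json()` output so the snippet runs without a Spark session.

```python
import json


def column_types_from_schema_json(schema_json):
    """Map each top-level column name to its type name.

    `schema_json` is assumed to be the string returned by df.schema.json().
    """
    fields = json.loads(schema_json)["fields"]
    result = {}
    for field in fields:
        field_type = field["type"]
        # Simple types are plain strings such as "integer". Nested types
        # (struct, array, map) are dicts, so fall back to their "type" key.
        if isinstance(field_type, dict):
            field_type = field_type["type"]
        result[field["name"]] = field_type
    return result


# Hand-written stand-in for df.schema.json() (an assumption for illustration).
sample_schema_json = json.dumps({
    "type": "struct",
    "fields": [
        {"name": "Id", "type": "integer", "nullable": True, "metadata": {}},
        {"name": "Id_FK", "type": "integer", "nullable": True, "metadata": {}},
        {"name": "Foo", "type": "integer", "nullable": True, "metadata": {}},
        {"name": "Bar", "type": "string", "nullable": True, "metadata": {}},
        {"name": "XPTO", "type": "string", "nullable": True, "metadata": {}},
    ],
})

column_types = column_types_from_schema_json(sample_schema_json)
print(json.dumps(column_types, indent=2))
```

With a real DataFrame you would pass `df.schema.json()` instead of the sample string and write the result to a file with `json.dump`. Nested columns are reduced to their outer kind ("struct", "array", "map") here; if you need the inner fields as well, the helper would have to recurse.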