How do I access a schema's metadata in pyspark?

Asked: 2018-09-18 23:27:07

Tags: apache-spark pyspark

Suppose you have a schema set up like this:

from pyspark.sql.types import StructField, StructType, IntegerType, StringType

schema = StructType([
    StructField(name='a_field', dataType=IntegerType(), nullable=False, metadata={'a': 'b'}),
    StructField(name='b_field', dataType=StringType(), nullable=True, metadata={'c': 'd'})
])

How would you access the metadata?

1 answer:

Answer 0 (score: 0)

You can inspect the schema structure like this:

>>> schema.json()
'{"fields":[{"metadata":{"a":"b"},"name":"a_field","nullable":false,"type":"integer"},
            {"metadata":{"c":"d"},"name":"b_field","nullable":true,"type":"string"}],
  "type":"struct"}'

To access the metadata, just walk the fields and read each field's metadata attribute (a dict):

>>> schema.fields[0].metadata['a']
'b'

>>> schema.fields[1].metadata['c']
'd'