Replacing invalid characters in Spark nested attribute names

Date: 2019-06-28 21:49:50

Tags: apache-spark schema parquet

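There are several posts about handling invalid characters in first-level attribute names, but none that cover attributes nested several levels deep.

My multi-level nested schema hits this error: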

org.apache.spark.sql.AnalysisException: Attribute name "Foo Bar" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;

1 answer:

Answer 0: (score: 0)

Here is my solution in Scala:

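import org.apache.spark.sql.types.{StructField, StructType}

// Characters Spark rejects in attribute names (the same set listed in the error message).
private val INVALID_ATTRIBUTE_CHARS = "[ ,;{}()\n\t=]"

def replaceBadAttriName(structType: StructType): StructType =
  StructType(structType.fields.map(cleanStructFld))

private def cleanStructFld(fld: StructField): StructField = {
  // Rename every field, including struct-typed ones such as "Foo Bar".
  val newName = fld.name.replaceAll(INVALID_ATTRIBUTE_CHARS, "_")
  fld.dataType match {
    case struct: StructType =>
      // Recurse so that the fields nested inside this struct are renamed as well.
      StructField(newName, StructType(struct.fields.map(cleanStructFld)), fld.nullable, fld.metadata)
    case _ =>
      StructField(newName, fld.dataType, fld.nullable, fld.metadata)
  }
}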
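
The function above only descends into fields whose type is directly a StructType, and it returns a schema rather than a DataFrame. The following is a minimal sketch, not part of the original answer, of how the same idea might be extended to structs nested inside arrays and maps and then applied to existing data. The names SchemaCleaner, cleanName, cleanType, cleanSchema and withCleanSchema are hypothetical, and re-applying the schema through the RDD assumes that only names change while field order and types stay the same.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

// Hypothetical helper object; none of these names come from the original answer.
object SchemaCleaner {
  // Same character set that the AnalysisException lists.
  private val InvalidAttributeChars = "[ ,;{}()\n\t=]"

  private def cleanName(name: String): String =
    name.replaceAll(InvalidAttributeChars, "_")

  // Rebuild a data type recursively, renaming the fields of every struct
  // found inside structs, arrays, and maps. Other types are returned as-is.
  def cleanType(dataType: DataType): DataType = dataType match {
    case struct: StructType =>
      StructType(struct.fields.map { f =>
        f.copy(name = cleanName(f.name), dataType = cleanType(f.dataType))
      })
    case ArrayType(elementType, containsNull) =>
      ArrayType(cleanType(elementType), containsNull)
    case MapType(keyType, valueType, valueContainsNull) =>
      MapType(cleanType(keyType), cleanType(valueType), valueContainsNull)
    case other => other
  }

  def cleanSchema(schema: StructType): StructType =
    cleanType(schema).asInstanceOf[StructType]

  // Assumption: rows are matched to the schema by position, so a schema that
  // differs only in field names can be re-applied to the existing rows.
  def withCleanSchema(spark: SparkSession, df: DataFrame): DataFrame =
    spark.createDataFrame(df.rdd, cleanSchema(df.schema))
}
```

With a DataFrame df that contains a column such as "Foo Bar", something like SchemaCleaner.withCleanSchema(spark, df).write.parquet(path) should then get past the attribute-name check, although going through df.rdd adds a conversion step that may be costly on large data.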