Question

I want to add a new map-type column to a DataFrame, like this:

|-- cMap: map (nullable = true)
|    |-- key: string
|    |-- value: string (valueContainsNull = true)

I tried this code:

df.withColumn("cMap", lit(null).cast(MapType)).printSchema

The error is:

<console>:132: error: overloaded method value cast with alternatives:
(to: String)org.apache.spark.sql.Column <and>
(to: org.apache.spark.sql.types.DataType)org.apache.spark.sql.Column
cannot be applied to (org.apache.spark.sql.types.MapType.type)

Is there another way to cast the new column to Map or MapType? Thanks.

Answer 1

I ran into the same problem and eventually found a solution:

df.withColumn("cMap", typedLit(Map.empty[String, String]))

From the ScalaDocs for typedLit:

The difference between this function and [[lit]] is that this function can handle parameterized scala types e.g.: List, Seq and Map.
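A minimal sketch of this approach, as it might be run in spark-shell or a Scala script. The local SparkSession and the toy DataFrame with a single id column are stand-ins, not part of the original question. Note that typedLit(Map.empty[String, String]) fills the column with empty maps rather than nulls.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.typedLit

// Assumption: a local session, used for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("typedLit-sketch")
  .getOrCreate()

// Hypothetical input: a DataFrame with one long column named "id".
val df = spark.range(3).toDF("id")

// typedLit keeps the key and value types of the Scala Map,
// so the new column is typed as map<string,string>.
val withMapCol = df.withColumn("cMap", typedLit(Map.empty[String, String]))

withMapCol.printSchema()
```

If the column has to hold nulls instead of empty maps, a cast of lit(null) to a map type is the alternative.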

Answer 2

Unlike other types, MapType is not an object you can use as-is (the MapType companion object does not extend DataType). You have to call MapType.apply(...), which expects the key and value types as arguments (and returns an instance of the MapType class, which is a DataType):

MapType.apply(...)
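A minimal sketch of how that call could be used in the cast from the question, assuming string keys and string values as in the desired schema. The local SparkSession and the toy DataFrame are stand-ins for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{MapType, StringType}

// Assumption: a local session, used for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("maptype-cast-sketch")
  .getOrCreate()

// Hypothetical input: a DataFrame with one long column named "id".
val df = spark.range(3).toDF("id")

// MapType(StringType, StringType) is shorthand for MapType.apply(keyType, valueType);
// it returns a MapType instance, which is a DataType that cast accepts.
val mapType = MapType(StringType, StringType)

// Every row gets a null value typed as map<string,string>.
val withMapCol = df.withColumn("cMap", lit(null).cast(mapType))

withMapCol.printSchema()
```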

Answer 3

You can do it the Scala way as in the other answers, or use a little trick with a stringified type.

val withMapCol = df.withColumn("cMap", lit(null) cast "map<string, string>")
scala> withMapCol.printSchema
root
 |-- id: long (nullable = false)
 |-- cMap: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

You can use any type that Spark SQL supports (the dataType rule in Spark's SQL grammar lists them):

dataType
    : complex=ARRAY '<' dataType '>'                            #complexDataType
    | complex=MAP '<' dataType ',' dataType '>'                 #complexDataType
    | complex=STRUCT ('<' complexColTypeList? '>' | NEQ)        #complexDataType
    | identifier ('(' INTEGER_VALUE (',' INTEGER_VALUE)* ')')?  #primitiveDataType
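As a sketch of how the grammar above maps to cast strings, the same trick should also work for the array and struct forms. The column names cArray and cStruct and the field names a and b are hypothetical, as are the local SparkSession and the toy DataFrame.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// Assumption: a local session, used for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("string-cast-sketch")
  .getOrCreate()

// Hypothetical input: a DataFrame with one long column named "id".
val df = spark.range(3).toDF("id")

// The type strings follow the ARRAY<...> and STRUCT<...> alternatives of dataType.
val withNullCols = df
  .withColumn("cArray", lit(null).cast("array<int>"))
  .withColumn("cStruct", lit(null).cast("struct<a:int,b:string>"))

withNullCols.printSchema()
```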

How to add an empty map type column to DataFrame?
