Question

I am using GraphFrames and am currently working with aggregateMessages. The vertex schema is:

 |-- id: long (nullable = false)
 |-- company: string (nullable = true)
 |-- money: integer (nullable = false)
 |-- memoryLearned: map (nullable = true)
 |    |-- key: string
 |    |-- value: integer (valueContainsNull = false)

If I try:

  ...
 def createMessage(memory: org.apache.spark.sql.Column): org.apache.spark.sql.Column = {
    memory + 10
  }

...

val msgToSrc: org.apache.spark.sql.Column = this.createMessage(AM.dst("id"))

val aggregates = gx
        .aggregateMessages
        .sendToSrc(msgToSrc)
        .agg(sum(AM.msg).as("aggMess"))
aggregates.show()

It works! But I need to get the keys and values from memoryLearned, so I thought this would work:

...
     def createMessage(memory: org.apache.spark.sql.Column): org.apache.spark.sql.Column = {
        for((k,v) <- memory)
           ...
      }


...

val msgToSrc: org.apache.spark.sql.Column = this.createMessage(AM.dst("memoryLearned"))

val aggregates = gx
        .aggregateMessages
        .sendToSrc(msgToSrc)
        .agg(myUDFA(AM.msg).as("aggMess"))
aggregates.show()

I get this error: "value filter is not a member of org.apache.spark.sql.Column"

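I tried searching for how to cast to or extract the MapType, but I only found approaches that use DataFrame functions such as explode, and I don't have a DataFrame, I only have a Column...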

If I put memory.getItem("aKeyFromMap") in place of the for(..., I get the correct value from the map...

I also tried creating an "aux" DataFrame (one row, one column) inside createMessage using the DataFrame functions, but it fails when I call .withColumn("newColumn", memory)...

I'm stuck... any ideas?

Thanks a lot!! Regards

Answer 1

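If you want to iterate over a MapType Column and you don't know the keys up front, you have to use a UDF or some other operation on the external type (like map):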

import org.apache.spark.sql.functions.udf

def createMessage = udf( (memory: Map[String, Integer]) => {
  for( (k,v) <- memory )
  ...
} )
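
To make the UDF approach above a bit more concrete, here is a minimal sketch, assuming the message should simply be the sum of the map's values. The names MapUdfSketch, sumValues and sumValuesUdf, the sample data, and the local SparkSession are all made up for illustration and are not from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object MapUdfSketch {
  // Hypothetical message logic (an assumption): add up every value in the map.
  // This runs on a plain Scala Map, so a for loop over (key, value) pairs compiles.
  def sumValues(memory: Map[String, Int]): Int = {
    var total = 0
    // The schema marks memoryLearned as nullable, so guard against null.
    if (memory != null) {
      for ((_, v) <- memory) total += v
    }
    total
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("map-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Wrap the plain function so it can be applied to a Column.
    val sumValuesUdf = udf(sumValues _)

    // Tiny stand-in for the vertex data, keeping only the columns needed here.
    val vertices = Seq(
      (1L, Map("a" -> 1, "b" -> 2)),
      (2L, Map("c" -> 10))
    ).toDF("id", "memoryLearned")

    // Applying the UDF to a Column yields a new Column.
    vertices.withColumn("msg", sumValuesUdf(col("memoryLearned"))).show()

    spark.stop()
  }
}
```

Since applying a UDF to a Column returns another Column, something like sumValuesUdf(AM.dst("memoryLearned")) should be usable as msgToSrc in the question's code; that part is an assumption and is not exercised by the sketch.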

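You get:

"value filter is not a member of org.apache.spark.sql.Column"

because a for comprehension is syntactic sugar for map / flatMap / filter, and a Column is an expression rather than a collection, so it has no such method to call.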
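
The desugaring can be seen on an ordinary Scala Map, with no Spark involved. This is only a rough sketch of what the compiler does; ForSugarSketch and the sample map are invented for illustration.

```scala
object ForSugarSketch {
  val memory: Map[String, Int] = Map("a" -> 1, "b" -> 2)

  // A for comprehension with a pattern on the left of the generator...
  val viaFor: Map[String, Int] =
    for ((k, v) <- memory) yield (k, v + 10)

  // ...is rewritten by the compiler into roughly this chain of method calls.
  // The (k, v) pattern is what introduces the filtering step.
  val desugared: Map[String, Int] =
    memory
      .withFilter { case (_, _) => true }
      .map { case (k, v) => (k, v + 10) }

  def main(args: Array[String]): Unit = {
    println(viaFor)
    println(desugared)
  }
}
```

A Map provides these methods, so both forms compile and give the same result. A Column does not, which is why the same for loop over AM.dst("memoryLearned") is rejected at compile time.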
