Spark groupBy not getting expected type, mismatch error

Date: 2017-08-30 20:06:31

Tags: scala apache-spark spark-dataframe

I am trying to get the variable GroupsByP to have a certain type. GroupsByP is defined from a DB connection select/collect statement whose result has 3 fields: 2 strings (p and Id) and an int (Order).

Expected result should be of the form p -> Set[(id, order)].

My desired type for this variable is Map[String, Set[(String, Int)]], but the actual is Map[p, Set[(Id, Order)]].
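
For context, a minimal setup that reproduces this situation could look like the sketch below. The case class name Record, the local SparkSession, and the sample rows are all assumptions added for illustration; only the field names and types (p, Id, Order) come from the question and the answer's code.

    // Hypothetical record shape matching the three fields described above
    case class Record(p: String, Id: String, Order: Int)

    val spark = org.apache.spark.sql.SparkSession.builder()
        .master("local[*]")
        .appName("groupby-example")
        .getOrCreate()
    import spark.implicits._

    // Stand-in for the Dataset produced by the DB select, built locally here
    val input = Seq(
        Record("p1", "a", 1),
        Record("p1", "b", 2),
        Record("p2", "c", 3)
    ).toDS()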

1 Answer:

Answer 0 (score: 1)

If I understood your question correctly, this should do it:

    val GroupsByP: Map[String, Set[(String, Int)]] = input.collect()
        .groupBy(_.p)  // group the collected records by the String field p
        .map(group => group._1 -> group._2.map(record => (record.Id, record.Order)).toSet)
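
Note that collect() brings every row back to the driver as a plain Scala Array, so the groupBy and map above are ordinary Scala collection operations rather than Spark transformations. With the hypothetical sample rows sketched earlier, the result would be:

    GroupsByP("p1")  // Set(("a", 1), ("b", 2))
    GroupsByP("p2")  // Set(("c", 3))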

You should be mapping each record into an (Id, Order) tuple.

A very similar but perhaps more readable implementation would be:

    val GroupsByP: Map[String, Set[(String, Int)]] = input.collect()
        .groupBy(_.p)
        .mapValues(_.map(record => (record.Id, record.Order)).toSet)  // keep the keys, transform only the grouped values
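
One caveat, depending on your Scala version: before Scala 2.13, Map.mapValues returns a lazy view that re-applies the function on every lookup, and the method is deprecated in 2.13. If a strict Map is needed, forcing it with toMap is a small variation on the code above:

    // Variation: force the lazy mapValues view into a strict Map
    val GroupsByP: Map[String, Set[(String, Int)]] = input.collect()
        .groupBy(_.p)
        .mapValues(_.map(record => (record.Id, record.Order)).toSet)
        .toMap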