Question

In the SQL below, I need the right syntax to access a nested struct.

Specifically, it is the one built on the third line:

  collect_list(struct( .. ) )

I put in rec.*, but that is of course not the right way.

select matchMethod, rec.* from
                              (select first(matchMethod) matchMethod,
                                collect_list(struct(rawTp,tp,fp,fn,
                                        precision,recall,weight,F1,
                                        truthGrpId,entityId,
                                        tpIds,fpIds, fnIds,truthIds,actuals)) rec
                                   from scoring5
                                      where entityId is not null and truthGrpId is not null
                                  group by truthGrpId
                              ) order by rec.truthGrpId, rec.recall desc

The result is:

org.apache.spark.sql.AnalysisException: 
Can only star expand struct data types. Attribute: `ArrayBuffer(rec)`;

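I tried many other approaches. I also read carefully through ten other Stack Overflow questions, but none of them addresses this directly for SQL rather than the DSL. Is this possible at all?

I am not sure whether the message "Can only star expand struct data types" implies that a different syntax might achieve this, or whether Spark SQL is simply deficient here.

We are using Spark 2.3.x.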

Answer 1

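After a good deal of research and experimentation with various syntax combinations, I am inclined to agree with @user6910411 that the above is not currently supported. Some help appears to be arriving in Spark 2.4; see Jacek Laskowski's answer:

In any case, I found a simpler approach using windowing functions, as follows: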

select * from
  (select row_number() over (partition by truthGrpId order by recall desc) rownum,*
    from
    (select matchMethod, rawTp,tp,fp,fn,
        precision,recall,weight,F1,
        truthGrpId,entityId,
        tpIds,fpIds, fnIds,truthIds,actuals
      from scoring5
      where entityId is not null and truthGrpId is not null
    ) order by truthGrpId, recall desc
  ) where rownum=1 order by truthGrpId
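
To see what the row_number() filter does, here is a small sketch of the same pattern on an inline dataset. The four rows and their values are made up for illustration, and only a few of the scoring5 columns are included; the `ranked` alias is likewise just a name chosen for this example.

```sql
-- Hypothetical miniature of scoring5; the values are invented for illustration.
with scoring5 as (
  select * from values
    ('exact', 1, 10, 0.9),
    ('fuzzy', 1, 11, 0.7),
    ('exact', 2, 12, 0.5),
    ('fuzzy', 2, 13, 0.8)
  as t(matchMethod, truthGrpId, entityId, recall)
)
select matchMethod, truthGrpId, entityId, recall
from (
  -- Number the rows inside each truthGrpId, best recall first.
  select row_number() over (partition by truthGrpId order by recall desc) as rownum, *
  from scoring5
  where entityId is not null and truthGrpId is not null
) ranked
where rownum = 1          -- keep only the top-recall row per group
order by truthGrpId
```

With these rows, the query should keep the ('exact', 1, 10, 0.9) row for group 1 and the ('fuzzy', 2, 13, 0.8) row for group 2. The ordering is done inside the window specification, so a separate order by in the inner subquery is not needed for the ranking itself.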

The obvious follow-up here is to dig deeper into windowing functions and make them first-class citizens in my exploratory work.
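
As for the error itself: collect_list(struct(...)) produces an array of structs, and star expansion only applies to a struct, which would explain the message. One possible workaround, sketched here on made-up rows and not verified on every 2.3.x release, is to explode the array back into one struct per row and then star-expand that. The names `grouped`, `exploded` and `r` are placeholders chosen for this sketch.

```sql
-- Hypothetical miniature of scoring5; the values are invented for illustration.
with scoring5 as (
  select * from values
    ('exact', 1, 10, 0.9),
    ('fuzzy', 1, 11, 0.7),
    ('exact', 2, 12, 0.5)
  as t(matchMethod, truthGrpId, entityId, recall)
),
grouped as (
  -- rec has type array<struct<truthGrpId, entityId, recall>>, not struct
  select first(matchMethod) as matchMethod,
         collect_list(struct(truthGrpId, entityId, recall)) as rec
  from scoring5
  where entityId is not null and truthGrpId is not null
  group by truthGrpId
)
select matchMethod, r.*   -- r is a single struct, so star expansion is allowed
from (
  select matchMethod, explode(rec) as r
  from grouped
) exploded
order by truthGrpId, recall desc
```

Note that exploding undoes the grouping, so every output row carries the first(matchMethod) of its group rather than its own matchMethod. If only one element is needed, indexing such as rec[0].recall reads a field without exploding.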

How to access Spark nested struct fields from SQL (not the DSL)
