Question

Given:

Dataset:

+--------------------+
|               count|
+--------------------+
|                 1.0|
|                 2.0|
|                 3.0|
+--------------------+

Code:

String field = "count";    

Dataset<Row> histogram = dataset
    .groupBy(field)
    .count()
    .persist(StorageLevel.MEMORY_ONLY_SER());

Column cnt = histogram.col("count"); // trying to get .count() result

Histogram schema:

root
 |-- count: double (nullable = true) // input field `count`
 |-- count: long (nullable = false)  // .count() result

Exception:

org.apache.spark.sql.AnalysisException: Reference 'count' is ambiguous, could be: count#101, count#108L.;

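Problem:

I understand why this happens, but I have no idea how to work around it. The dataset is created from a database table and can contain any number of columns with any names, including count, avg, and other "reserved" words.

Any help would be appreciated.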

Answer 1

dataset.createOrReplaceTempView("V1");
dataset = spark.sql("select count as count_O from v1");
Dataset<Row> histogram = dataset.groupBy("count_O").count().persist(StorageLevel.MEMORY_ONLY_SER());
Column cnt = histogram.col("count");
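
The rename above hard-codes count_O, which could itself clash with a column in a table whose column names are arbitrary. One possible way around that is to pick an alias that is guaranteed not to collide with any existing column name. The sketch below is only an illustration: ColumnAliases and uniqueAlias are hypothetical names, not part of Spark, and the case-insensitive comparison assumes Spark's default setting for column name resolution.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Spark: picks an alias that does not
// collide with any of the existing column names.
final class ColumnAliases {
    private ColumnAliases() {}

    static String uniqueAlias(String[] existingColumns, String base) {
        // Compare in lower case, on the assumption that column names are
        // resolved case-insensitively (Spark's default behaviour).
        Set<String> taken = new HashSet<>();
        for (String name : existingColumns) {
            taken.add(name.toLowerCase(Locale.ROOT));
        }
        // Append _1, _2, ... until the candidate is not already taken.
        String candidate = base;
        int suffix = 0;
        while (taken.contains(candidate.toLowerCase(Locale.ROOT))) {
            suffix++;
            candidate = base + "_" + suffix;
        }
        return candidate;
    }
}
```

With Spark, the alias could come from ColumnAliases.uniqueAlias(dataset.columns(), "count") and the aggregate could then be named explicitly, for example dataset.groupBy(field).agg(functions.count(functions.lit(1)).alias(alias)). After that, histogram.col(alias) should refer to the aggregate column only, without renaming the input column.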
