Ambiguous Spark DataFrame schema - a non-JOIN scenario

Date: 2018-11-17 14:18:43

Tags: apache-spark

Given the duplicate column names visible in the DataFrame schema below - as I have noted elsewhere:

root
  |-- week: string (nullable = true)
  |-- dim1: integer (nullable = false)
  |-- dim2: string (nullable = true)
  |-- t1: integer (nullable = false)
  |-- t2: integer (nullable = false)
  |-- t3: integer (nullable = false)
  |-- t1: integer (nullable = false)
  |-- t2: integer (nullable = false)
  |-- t3: integer (nullable = false)
  |-- t1_diff: integer (nullable = false)
  |-- t2_diff: integer (nullable = false)

then:

df.select("t1").show(false) 

fails with an ambiguous-reference error, so how do I indicate which of the two `t1` columns I want to select?

This is not the result of a JOIN; the DataFrame is simply built from a Seq with .toDF(...), like so:

val df = Seq(
         ("2016-04-02",14, null, 9784, 880, 23, 9789, 820, 45, -5, 60),
         ("2016-04-30",14, "FR", 9785,  13, 34, 9785,   9, 67, 90, 4),
         ("2016-04-16",14, "FR", 9785,  13, 34, 9785,   9, 67, -100, -123)
            ).toDF("week", "dim1", "dim2", "t1", "t2", "t3", "t1", "t2", "t3", "t1_diff", "t2_diff")

This is somewhat contrived on my part - not something I would normally do - but I did notice the behavior, so I am asking out of curiosity. Is it an oversight that Spark allows it?

1 answer:

Answer 0 (score: 0):

The only way to disambiguate is to rename the DataFrame's columns so that every name is unique, for example by calling .toDF again with a fresh list of names.
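A minimal sketch of that rename, assuming the `df` from the question; the `_a`/`_b` suffixes are hypothetical placeholders, not anything Spark prescribes:

```scala
// Re-apply toDF with unique names so every column reference is unambiguous.
val renamed = df.toDF(
  "week", "dim1", "dim2",
  "t1_a", "t2_a", "t3_a",
  "t1_b", "t2_b", "t3_b",
  "t1_diff", "t2_diff")

renamed.select("t1_a").show(false) // no longer ambiguous

// Alternatively, deduplicate names programmatically by suffixing repeats
// with an occurrence index (t1, t1_1, t2, t2_1, ...):
val seen = scala.collection.mutable.Map.empty[String, Int]
val unique = df.columns.map { c =>
  val n = seen.getOrElse(c, 0)
  seen(c) = n + 1
  if (n == 0) c else s"${c}_$n"
}
val deduped = df.toDF(unique: _*)
```

Positional selection (e.g. via `df.columns(3)`) does not help here, because `select` still resolves the name against the schema and hits the same ambiguity, so renaming first is the practical fix.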