My DataFrame has the following structure:
root
 |-- author: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- client: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- outbound_link: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- url: string (nullable = true)
I run this code:
val sourceField = "outbound_link" // set automatically
val targetField = "url" // set automatically
val nodeId = "client" // set automatically
val result = df.as("df1").join(df.as("df2"),
    $"df1."+sourceField === $"df2."+targetField
  ).groupBy(
    ($"df1."+nodeId).as("nodeId_1"),
    ($"df2."+nodeId).as("nodeId_2")
  )
  .agg(
    count("*") as "value", max($"df1."+timestampField) as "timestamp"
  )
  .toDF("source", "target", "value", "timestamp")
But I get this error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: syntax error in attribute name: df1.;
For some reason, the variables sourceField and targetField are not visible in the join operation. These variables are not empty; they contain the field names. I have to use variables because they are set automatically in an earlier step of the code.
Answer 0 (score: 2)
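An interesting case indeed. Look at $"df1."+sourceField and consider when $"df1." is converted to a Column versus when the concatenation "df1."+sourceField happens. The $ interpolator is applied first, so Spark tries to parse the incomplete attribute name "df1." before + sourceField is ever evaluated.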
scala> val sourceField = "id"
sourceField: String = id
scala> $"df1."+sourceField
org.apache.spark.sql.AnalysisException: syntax error in attribute name: df1.;
at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:151)
at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:170)
at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.quotedString(unresolved.scala:142)
at org.apache.spark.sql.Column.<init>(Column.scala:137)
at org.apache.spark.sql.ColumnName.<init>(Column.scala:1203)
at org.apache.spark.sql.SQLImplicits$StringToColumn.$(SQLImplicits.scala:45)
... 55 elided
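The order of evaluation can be reproduced without Spark. The following is only a sketch in plain Scala: `FakeColumn` and `StringToFakeColumn` are hypothetical stand-ins for Spark's `Column` and its `$` interpolator, not the real classes, and the name check is a simplified assumption about what the parser rejects.

```scala
object InterpolatorPrecedence {
  // Hypothetical stand-in for a Spark column: rejects names that end in a dot,
  // loosely imitating the "syntax error in attribute name" check.
  final case class FakeColumn(name: String) {
    require(!name.endsWith("."), s"syntax error in attribute name: $name")

    // Imitates an arithmetic + on columns; it is NOT string concatenation.
    def +(other: Any): FakeColumn = FakeColumn(s"($name + $other)")
  }

  // Hypothetical stand-in for the `$` string interpolator.
  implicit class StringToFakeColumn(val sc: StringContext) {
    def $(args: Any*): FakeColumn = FakeColumn(sc.s(args: _*))
  }

  def main(args: Array[String]): Unit = {
    val sourceField = "id"

    // Parsed as ($"df1.") + sourceField: the interpolator only sees "df1."
    // and fails before + is applied.
    val broken = scala.util.Try($"df1." + sourceField)
    println(broken.isFailure) // prints true

    // Building the complete name inside the string avoids the problem.
    val fixed = $"df1.$sourceField"
    println(fixed.name) // prints df1.id
  }
}
```

In this mock, the failure comes from constructing the column with the name "df1." alone, which mirrors the stack trace above, where the exception is raised from the `$` call itself.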
Replace $"df1."+sourceField with a call to the col or column function and you should be fine.
scala> col(s"df1.$sourceField")
res7: org.apache.spark.sql.Column = df1.id
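Applied to the whole query, the same idea might look like the sketch below. It assumes Spark is on the classpath and uses hypothetical toy data with scalar columns instead of the array columns from the question, purely to keep the example small; the `buildEdges` helper and the value of `timestampField` are also assumptions, since the question does not show them.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, max}

object JoinWithVariableColumns {
  // Hypothetical helper: every column reference is built from a complete
  // name string before it is turned into a Column with col(...).
  def buildEdges(df: DataFrame, sourceField: String, targetField: String,
                 nodeId: String, timestampField: String): DataFrame =
    df.as("df1")
      .join(df.as("df2"), col(s"df1.$sourceField") === col(s"df2.$targetField"))
      .groupBy(col(s"df1.$nodeId").as("nodeId_1"), col(s"df2.$nodeId").as("nodeId_2"))
      .agg(count("*").as("value"), max(col(s"df1.$timestampField")).as("timestamp"))
      .toDF("source", "target", "value", "timestamp")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-with-variable-columns")
      .getOrCreate()
    import spark.implicits._

    // Toy data (assumption): scalar columns rather than arrays.
    val df = Seq(
      ("a.com", "b.com", 1, 100L),
      ("b.com", "a.com", 2, 200L)
    ).toDF("url", "outbound_link", "client", "timestamp")

    buildEdges(df, "outbound_link", "url", "client", "timestamp").show()
    spark.stop()
  }
}
```

Using `col(s"...")` keeps the string interpolation and the conversion to a Column as two visible steps, which makes it harder to run into the precedence problem again.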