I have read records from a Kafka source into a Spark DataFrame.
I want to select some columns from mydataframe and do some operations on them. So, to check whether I am getting the correct index, I tried to print the index of the desired column in the process method, as shown below:

val pathtoDesiredColumnFromSchema = "data.root.column1.column2.field"
val myQuery = mydataframe.writeStream.foreach(new ForeachWriter[Row]() {
  override def open(partitionId: Long, version: Long): Boolean = true
  override def process(row: Row): Unit = {
    // Row.fieldIndex is the Row API for name lookup (getFieldIndex does not exist)
    println(row.fieldIndex(pathtoDesiredColumnFromSchema))
  }
  override def close(errorOrNull: Throwable): Unit = {}
}).outputMode("append").start()

But the code above reports that the row has only one name and no nested column names.
What is the right way to get a column value from a Spark SQL Row by its name path?
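The reported error is expected: Row.fieldIndex matches only whole top-level field names, so a dotted path names a column that does not exist at the top level. The sketch below illustrates the lookup behavior in plain Scala (fieldIndex here is a stand-in for the real Row.fieldIndex, and the field names are assumptions, not taken from the actual schema):

```scala
// Stand-in for Row.fieldIndex: an exact match against top-level
// field names only -- a dotted path is never split on '.'.
def fieldIndex(fieldNames: Seq[String], name: String): Int = {
  val i = fieldNames.indexOf(name)
  if (i < 0) throw new IllegalArgumentException(
    s"""Field "$name" does not exist. Available: ${fieldNames.mkString(", ")}""")
  i
}

val topLevel = Seq("data") // the row has a single top-level struct column
fieldIndex(topLevel, "data") // 0: the top-level name is found
// fieldIndex(topLevel, "data.root.column1.column2.field") // throws IllegalArgumentException
```

This is why the error message lists only one name: the nested fields live inside the struct and are not part of the top-level schema.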
Answer 0 (score: 2)
You can use a chain of getAs calls for struct types, for example:
import org.apache.spark.sql.functions.{current_timestamp, window}
import spark.implicits._

val df = spark.range(1,5).toDF.withColumn("time", current_timestamp())
  .union(spark.range(5,10).toDF.withColumn("time", current_timestamp()))
  .groupBy(window($"time", "1 millisecond")).count
df.printSchema

root
 |-- window: struct (nullable = true)
 |    |-- start: timestamp (nullable = true)
 |    |-- end: timestamp (nullable = true)
 |-- count: long (nullable = false)
df.take(1).head
.getAs[org.apache.spark.sql.Row]("window")
.getAs[java.sql.Timestamp]("start")
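To generalize the chained getAs idea to an arbitrary dotted path like the asker's pathtoDesiredColumnFromSchema, you can fold over the path segments, descending one struct level per segment. This is a sketch, not part of the original answer: the name getByPath is made up, and Map[String, Any] stands in for nested Rows so the example runs without a Spark dependency. With real Rows, the first case would read `case (r: Row, field) => r.getAs[Any](field)`.

```scala
// Walk a dot-separated path through nested structures, one level
// per path segment. Throws if an intermediate value is not a struct.
def getByPath(root: Any, path: String): Any =
  path.split('.').foldLeft(root) {
    case (m: Map[_, _], field) => m.asInstanceOf[Map[String, Any]](field)
    case (other, field) =>
      throw new IllegalArgumentException(
        s"'$field' requested on non-struct value: $other")
  }

// Mirrors the window/start example above:
val row = Map("window" -> Map("start" -> "2017-01-01 00:00:00"))
getByPath(row, "window.start") // returns "2017-01-01 00:00:00"
```

The fold makes the depth of nesting irrelevant, so the same call handles a five-level path such as "data.root.column1.column2.field".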
Hope this helps!
Answer 1 (score: 0)
If you just want to print the fields of the DataFrame, you can use:

mydataframe.select(pathtoDesiredColumnFromSchema).foreach(row => println(row.get(0)))