I have a DataFrame containing a sequence of rows. I want to iterate over it row by row without changing the order.
I tried the code below.
scala> val df = Seq(
| (0,"Load","employeeview", "employee.empdetails", null ),
| (1,"Query","employeecountview",null,"select count(*) from employeeview"),
| (2,"store", "employeecountview",null,null)
| ).toDF("id", "Operation","ViewName","DiectoryName","Query")
df: org.apache.spark.sql.DataFrame = [id: int, Operation: string ... 3 more fields]
scala> df.show()
+---+---------+-----------------+-------------------+--------------------+
| id|Operation| ViewName| DiectoryName| Query|
+---+---------+-----------------+-------------------+--------------------+
| 0| Load| employeeview|employee.empdetails| null|
| 1| Query|employeecountview| null|select count(*) f...|
| 2| store|employeecountview| null| null|
+---+---------+-----------------+-------------------+--------------------+
scala> val dfcount = df.count().toInt
dfcount: Int = 3
scala> for( a <- 0 to dfcount-1){
// first iteration I want id=0, Operation="Load", ViewName="employeeview", DiectoryName="employee.empdetails", Query=null
// second iteration I want id=1, Operation="Query", ViewName="employeecountview", DiectoryName=null, Query="select count(*) from employeeview"
// third iteration I want id=2, Operation="store", ViewName="employeecountview", DiectoryName=null, Query=null
// ignore the sample code below
// val operation = get(Operation(i))
// if (operation == "Load") {
//   // based on the operation type I call the appropriate function, passing the entire row as a parameter
// } else if (operation == "Query") {
//
// } else if (operation == "store") {
// }
}
Note: the processing order must not change. (The only unique key here is id, so the rows must be processed as 0, 1, 2, and so on.)
Thanks in advance.
Answer 0 (score: 1)
Check this out:
scala> val df = Seq(
| (0,"Load","employeeview", "employee.empdetails", null ),
| (1,"Query","employeecountview",null,"select count(*) from employeeview"),
| (2,"store", "employeecountview",null,null)
| ).toDF("id", "Operation","ViewName","DiectoryName","Query")
df: org.apache.spark.sql.DataFrame = [id: int, Operation: string ... 3 more fields]
scala> df.show()
+---+---------+-----------------+-------------------+--------------------+
| id|Operation| ViewName| DiectoryName| Query|
+---+---------+-----------------+-------------------+--------------------+
| 0| Load| employeeview|employee.empdetails| null|
| 1| Query|employeecountview| null|select count(*) f...|
| 2| store|employeecountview| null| null|
+---+---------+-----------------+-------------------+--------------------+
scala> val dfcount = df.count().toInt
dfcount: Int = 3
scala> :paste
// Entering paste mode (ctrl-D to finish)
for( a <- 0 to dfcount-1){
val operation = df.filter(s"id=${a}").select("Operation").as[String].first
operation match {
case "Query" => println("matching Query") // or call a function here for Query()
case "Load" => println("matching Load") // or call a function here for Load()
case "store" => println("matching store") //
case x => println("matched " + x )
}
}
// Exiting paste mode, now interpreting.
matching Load
matching Query
matching store
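Note that `df.filter(s"id=${a}")` rescans the DataFrame on every loop iteration. If the table is small enough to fit on the driver, another option is to pull the rows down once, in id order, and dispatch locally. The sketch below is an illustration of that dispatch step only: plain Scala tuples stand in for the `Row` objects you would get from a Spark-side call such as `df.orderBy("id").collect()` (that call is assumed, not shown), and the handler calls in the comments are hypothetical.

```scala
// Stand-ins for the collected rows, already in id order:
// (id, Operation, ViewName, DiectoryName, Query)
val rows = Seq(
  (0, "Load", "employeeview", "employee.empdetails", null: String),
  (1, "Query", "employeecountview", null: String, "select count(*) from employeeview"),
  (2, "store", "employeecountview", null: String, null: String)
)

// Dispatch on the Operation column, row by row, preserving order.
val handled = rows.map { case (id, operation, viewName, dirName, query) =>
  operation match {
    case "Load"  => s"Load: $viewName from $dirName" // call a Load handler here
    case "Query" => s"Query: $query"                 // call a Query handler here
    case "store" => s"store: $viewName"              // call a store handler here
    case other   => s"unknown operation: $other"
  }
}

handled.foreach(println)
```

Collecting is only safe when the row count is small (here it is 3); for large tables, stay with a distributed approach.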
scala>
Answer 1 (score: 0)
Here is my solution using a Dataset. It gives type safety and cleaner code, although the performance should be benchmarked; it should not differ much.
import org.apache.spark.sql.{Dataset, Encoders}
import spark.implicits._

case class EmployeeOperations(id: Int, operation: String, viewName: String, DiectoryName: String, query: String)

val data = Seq(
  EmployeeOperations(0, "Load", "employeeview", "employee.empdetails", ""),
  EmployeeOperations(1, "Query", "employeecountview", "", "select count(*) from employeeview"),
  EmployeeOperations(2, "store", "employeecountview", "", "")
)

val ds: Dataset[EmployeeOperations] = spark.createDataset(data)(Encoders.product[EmployeeOperations])

def printOperation(ds: Dataset[EmployeeOperations]) = {
  ds.map(x => x.operation match {
    case "Query" => println("matching Query"); "Query"   // or call a function here for Query()
    case "Load"  => println("matching Load"); "Load"     // or call a function here for Load()
    case "store" => println("matching store"); "store"   // or call a function here for store()
    case _       => println("Found something else"); "Nothing"
  })
}

printOperation(ds).show
For test purposes I only return a string here; you could return any primitive type. This returns:
scala> printOperation(ds).show
matching Load
matching Query
matching store
+-----+
|value|
+-----+
| Load|
|Query|
|store|
+-----+
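In both answers, each match arm ends up calling one dedicated function per operation. That mapping can also be kept in a lookup table, so adding a new operation means adding one entry rather than another match case or else-if branch. A minimal plain-Scala sketch; the handler names and their signatures are hypothetical stand-ins for whatever your real functions take (e.g. a whole row):

```scala
// Hypothetical handlers, one per operation type.
def loadHandler(view: String): String  = s"loading $view"
def queryHandler(sql: String): String  = s"running $sql"
def storeHandler(view: String): String = s"storing $view"

// Dispatch table: operation name -> handler to invoke.
val dispatch: Map[String, String => String] = Map(
  "Load"  -> loadHandler,
  "Query" -> queryHandler,
  "store" -> storeHandler
)

// Look up the handler for an operation; fall back for unknown names.
val result = dispatch.get("Query")
  .map(_("select count(*) from employeeview"))
  .getOrElse("unknown operation")
```

Using `dispatch.get` (rather than `dispatch(...)`) makes the unknown-operation case explicit instead of throwing a `NoSuchElementException`.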