Spark Scala: a for loop nested inside another for loop

Time: 2020-04-23 14:16:03

Tags: scala apache-spark

I have a count whose contents are as follows:

   (Focus,37)
   (Test,26)

My code is as follows.

    for (i <- count) {
      for (x <- i) {
        if (x == "Focus") {
          Focus_cnt = i(x)
        } else if (x == "Test") {
          Test_cnt = i(x)
        } else {
          pass
        }
      }
    }

The error I get is on the line for(x <- i, and the error is i: (Any, Any).

Is there a better way to get these counts in Spark Scala?

1 Answer:

Answer 0: (score: 0)

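Could you check this? Let me know if I have misunderstood. Here I group by status to get the counts, then apply a filter to read out the count for each status.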

scala> Seq(("fa","fb","fc",5,"fe","ff","Focus"),("fba","fbb","fbc",16,"bd","be","Focus"),("fba","fbb","fbc",54,"bd","be","Focus"),("fca","fcb","fcc",135,"fcd","fef","Focus"),("a","b","c",5,"e","f","Test"),("aa","ba","ca",56,"ea","fa","Test"),("ab","cb","cc",35,"de","df","Test")).toDF("a","b","c","d","e","f","status")
res29: org.apache.spark.sql.DataFrame = [a: string, b: string ... 5 more fields]

scala> val df = Seq(("fa","fb","fc",5,"fe","ff","Focus"),("fba","fbb","fbc",16,"bd","be","Focus"),("fba","fbb","fbc",54,"bd","be","Focus"),("fca","fcb","fcc",135,"fcd","fef","Focus"),("a","b","c",5,"e","f","Test"),("aa","ba","ca",56,"ea","fa","Test"),("ab","cb","cc",35,"de","df","Test")).toDF("a","b","c","d","e","f","status")
df: org.apache.spark.sql.DataFrame = [a: string, b: string ... 5 more fields]

scala> val newDF = df.groupBy("status").agg(count("status").as("count"))
newDF: org.apache.spark.sql.DataFrame = [status: string, count: bigint]

scala> val focus_cnt = newDF.filter($"status" === "Focus").select("count").map(_.getAs[Long](0)).head
focus_cnt: Long = 4

scala> val test_cnt  = newDF.filter($"status" === "Test").select("count").map(_.getAs[Long](0)).head
test_cnt: Long = 3
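
On the error in the question itself: each element of count is a tuple, and a tuple is not a collection, so it cannot be used as the generator of an inner for loop. A minimal sketch of two ways around that, assuming count has already been collected to the driver as a Seq of (String, Int) pairs (the question reports (Any, Any), so the real element type may differ). The names CountLookup, countsByLoop and countsByMap are hypothetical and only serve the illustration.

```scala
object CountLookup {
  // Assumed shape of `count` on the driver: (label, count) pairs.
  val count: Seq[(String, Int)] = Seq(("Focus", 37), ("Test", 26))

  // Option 1: destructure each tuple in the generator instead of looping over it.
  def countsByLoop(pairs: Seq[(String, Int)]): (Int, Int) = {
    var focusCnt = 0
    var testCnt = 0
    for ((label, n) <- pairs) {
      label match {
        case "Focus" => focusCnt = n
        case "Test"  => testCnt = n
        case _       => () // ignore any other label
      }
    }
    (focusCnt, testCnt)
  }

  // Option 2: build a Map once and look labels up, defaulting to 0 when absent.
  def countsByMap(pairs: Seq[(String, Int)]): (Int, Int) = {
    val byLabel = pairs.toMap
    (byLabel.getOrElse("Focus", 0), byLabel.getOrElse("Test", 0))
  }

  def main(args: Array[String]): Unit = {
    println(countsByLoop(count)) // prints (37,26)
    println(countsByMap(count))  // prints (37,26)
  }
}
```

With the newDF from the transcript above, the same kind of Map could presumably be built in a single job with newDF.collect().map(r => r.getString(0) -> r.getLong(1)).toMap, which avoids running one filter per status. That is only reasonable when the grouped result is small enough to hold on the driver.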