My data in count is as below:
(Focus,37)
(Test,26)
My code is as below.
for (i <- count ) {
for(x <- i) {
if(x == "Focus"){
Focus_cnt=i(x) }
else if(x == "Test"){
Test_cnt=i(x) }
else {
pass
}
}
}
The error I am facing is at the line for(x <- i, and the error is i: (Any, Any).
Is there a better way to get the counts in Spark Scala?
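Answer 0 (score: 0)
Can you check this? Let me know if I have misunderstood. Here I group by status and count, then apply a filter to pull out each count.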
scala> Seq(("fa","fb","fc",5,"fe","ff","Focus"),("fba","fbb","fbc",16,"bd","be","Focus"),("fba","fbb","fbc",54,"bd","be","Focus"),("fca","fcb","fcc",135,"fcd","fef","Focus"),("a","b","c",5,"e","f","Test"),("aa","ba","ca",56,"ea","fa","Test"),("ab","cb","cc",35,"de","df","Test")).toDF("a","b","c","d","e","f","status")
res29: org.apache.spark.sql.DataFrame = [a: string, b: string ... 5 more fields]
scala> val df = Seq(("fa","fb","fc",5,"fe","ff","Focus"),("fba","fbb","fbc",16,"bd","be","Focus"),("fba","fbb","fbc",54,"bd","be","Focus"),("fca","fcb","fcc",135,"fcd","fef","Focus"),("a","b","c",5,"e","f","Test"),("aa","ba","ca",56,"ea","fa","Test"),("ab","cb","cc",35,"de","df","Test")).toDF("a","b","c","d","e","f","status")
df: org.apache.spark.sql.DataFrame = [a: string, b: string ... 5 more fields]
scala> val newDF = df.groupBy("status").agg(count("status").as("count"))
newDF: org.apache.spark.sql.DataFrame = [status: string, count: bigint]
scala> val focus_cnt = newDF.filter($"status" === "Focus").select("count").map(_.getAs[Long](0)).head
focus_cnt: Long = 4
scala> val test_cnt = newDF.filter($"status" === "Test").select("count").map(_.getAs[Long](0)).head
test_cnt: Long = 3
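On the error in the question itself: each i is a tuple such as (Focus,37), and a tuple is not a collection, so for(x <- i) cannot iterate over it and i(x) is not a valid lookup. Below is a minimal sketch of destructuring the pairs instead. It assumes count is a local collection of (String, Int) pairs; if it is actually an RDD, calling count.collect() first would probably be needed. The names TupleCounts, extract and lookup are hypothetical and only for illustration.

```scala
object TupleCounts {
  // Hypothetical stand-in for `count`: local (label, count) pairs.
  val count: Seq[(String, Int)] = Seq(("Focus", 37), ("Test", 26))

  // Destructure each tuple with a pattern instead of iterating over it.
  def extract(pairs: Seq[(String, Int)]): (Int, Int) = {
    var focusCnt = 0
    var testCnt = 0
    for ((label, n) <- pairs) {
      label match {
        case "Focus" => focusCnt = n
        case "Test"  => testCnt = n
        case _       => () // ignore any other label
      }
    }
    (focusCnt, testCnt)
  }

  // Alternative: build a Map once and look counts up by key (0 if absent).
  def lookup(pairs: Seq[(String, Int)], key: String): Int =
    pairs.toMap.getOrElse(key, 0)

  def main(args: Array[String]): Unit = {
    val (focusCnt, testCnt) = extract(count)
    println(s"Focus=$focusCnt, Test=$testCnt")
    println(lookup(count, "Focus"))
  }
}
```

The Map variant avoids the mutable variables and is probably the simpler choice when there are more than a couple of labels.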