I have the following code, which counts how many times each value occurs in the event column.
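import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils
import play.api.libs.json.{Json, JsError, JsSuccess}

// ssc, kafkaParams, topicsSet, AmazonRating and recommender are assumed to be defined elsewhere.
val messages = KafkaUtils
  .createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicsSet)

messages
  .map { case (_, jsonRating) =>
    val format = Json.format[AmazonRating]
    // Parse the Kafka message value as JSON.
    val jsValue = Json.parse(jsonRating)
    format.reads(jsValue) match {
      case JsSuccess(rating, _) => rating
      case JsError(_) => AmazonRating.empty // marker for unparseable records
    }
  }
  .filter(_ != AmazonRating.empty) // drop records that failed to parse
  .foreachRDD(_.foreachPartition(it => recommender.predict(it.toSeq)))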
I would like to be able to calculate the percentage for each row, like this.
SELECT event, count(event) as event_count
FROM event_information
group by event
event  event_count
a      34
b      256
c      45
d      117
e      3
Answer 0 (score: 9)
SELECT event,
count(event) as event_count,
count(event) * 100.0 / (select count(*) from event_information) as event_percent
FROM event_information
group by event
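To try this query against the sample counts from the question, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the way the rows are generated are assumptions made for illustration; only the table name, the column name, and the counts come from the question.

```python
import sqlite3

# Sample counts copied from the question's output table.
counts = {"a": 34, "b": 256, "c": 45, "d": 117, "e": 3}

# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_information (event TEXT)")

# Expand each count into that many rows so GROUP BY reproduces the table.
conn.executemany(
    "INSERT INTO event_information (event) VALUES (?)",
    [(event,) for event, n in counts.items() for _ in range(n)],
)

# Same query as in the answer, with ORDER BY added for stable output.
query = """
SELECT event,
       count(event) AS event_count,
       count(event) * 100.0 / (SELECT count(*) FROM event_information) AS event_percent
FROM event_information
GROUP BY event
ORDER BY event
"""
rows = conn.execute(query).fetchall()

for event, event_count, event_percent in rows:
    print(f"{event} {event_count} {event_percent:.2f}")
```

The sample table has 455 rows in total, so event a should come out at 34 * 100.0 / 455, roughly 7.47 percent. Multiplying by 100.0 rather than 100 matters in dialects that use integer division for integer operands.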
Answer 1 (score: 8)
Most SQL dialects support the ANSI standard window functions, so you can write the query as:
select event, count(*) as event_count,
count(*) * 100.0/ sum(count(*)) over () as event_percent
from event_information
group by event;
Window functions are generally more efficient than subqueries and other approaches.
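The window-function form can be tried the same way. This is a minimal sketch, assuming Python's built-in sqlite3 module and an SQLite build of version 3.25 or later (earlier versions have no window functions and will reject the query). The in-memory database and generated rows are illustrative assumptions; the counts come from the question.

```python
import sqlite3

# Sample counts copied from the question's output table.
counts = {"a": 34, "b": 256, "c": 45, "d": 117, "e": 3}

# Assumption: an in-memory SQLite database (3.25+) stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_information (event TEXT)")
conn.executemany(
    "INSERT INTO event_information (event) VALUES (?)",
    [(event,) for event, n in counts.items() for _ in range(n)],
)

# sum(count(*)) OVER () adds up the per-group counts across all groups,
# which gives the total row count without a second scan in a subquery.
window_query = """
SELECT event,
       count(*) AS event_count,
       count(*) * 100.0 / sum(count(*)) OVER () AS event_percent
FROM event_information
GROUP BY event
ORDER BY event
"""
window_rows = conn.execute(window_query).fetchall()

total = sum(counts.values())
for event, event_count, event_percent in window_rows:
    # Each percentage should equal the group's share of all rows.
    assert abs(event_percent - counts[event] * 100.0 / total) < 1e-9
    print(f"{event} {event_count} {event_percent:.2f}")
```

This sketch only checks that the result matches the expected shares; it does not measure performance, which depends on the database engine and the data.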