Question

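tl;dr: How do I use Kotlin's groupingBy and aggregate on a Sequence of (key, number) pairs to sum the numbers into a map of counts?

I have 30 GB of CSV files, which are easy to read and parse.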

File("data").walk().filter { it.isFile }.flatMap { file ->
    println(file.toString())
    file.inputStream().bufferedReader().lineSequence()
}. // now I have lines

Each line is "key,extraStuff,matchCount"

.map { line ->
    val (key, stuff, matchCount) = line.split(",")
    Triple(key, stuff, matchCount.toInt())
}.

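I can filter for the good stuff, since a lot of it gets thrown away - yay lazy Sequences. (code omitted)

But then I need a lazy way to get to the final Map (key: String to count: Int).

I think I should be using groupingBy and aggregate, because eachCount() would only count rows rather than sum up matchCount, and groupingBy is lazy while groupBy is not, but that is as far as my knowledge goes.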

.groupingBy { (key, _, _) ->
    key
}.aggregate { (key, _, matchCount) ->
    ??? something with matchCount ???
}

Answer 1

You can use the Grouping.fold extension instead of aggregate. It is better suited to summing grouped entries by a particular property.
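
A minimal sketch of what that could look like, assuming the lines have already been mapped to Triple(key, stuff, matchCount) as in the question. The function name sumMatchCounts and the sample rows are made up for illustration:

```kotlin
// Sums matchCount per key using Grouping.fold.
fun sumMatchCounts(rows: Sequence<Triple<String, String, Int>>): Map<String, Int> =
    rows
        .groupingBy { (key, _, _) -> key }
        // fold starts every group at 0 and adds each row's matchCount to it
        .fold(0) { acc, (_, _, matchCount) -> acc + matchCount }

fun main() {
    // Hypothetical sample rows standing in for the parsed CSV lines
    val rows = sequenceOf(
        Triple("a", "x", 2),
        Triple("b", "y", 5),
        Triple("a", "z", 3)
    )
    println(sumMatchCounts(rows)) // {a=5, b=5}
}
```

Because fold takes a non-null initial value, the accumulator is a plain Int and no null handling is needed.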

Answer 2

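You need to pass a function with four parameters to aggregate:

@param operation: a function that is invoked on each element with the following parameters:


key: the key of the group this element belongs to;

accumulator: the current value of the accumulator of the group, can be null if it's the first element encountered in the group;

element: the element from the source being aggregated;

first: indicates whether it's the first element encountered in the group.

Of these, you need accumulator and element (which you can destructure). The code would be: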

.groupingBy { (key, _, _) -> key }
.aggregate { _, acc: Int?, (_, _, matchCount), _ ->
    (acc ?: 0) + matchCount 
}
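
For reference, here is a self-contained sketch that puts the parsing step from the question and the aggregate call together. The function name countsByKey and the sample lines are hypothetical, and the sketch assumes every line has exactly three comma-separated fields with an integer in the last one:

```kotlin
// Parses "key,extraStuff,matchCount" lines and sums matchCount per key.
fun countsByKey(lines: Sequence<String>): Map<String, Int> =
    lines
        .map { line ->
            val (key, stuff, matchCount) = line.split(",")
            Triple(key, stuff, matchCount.toInt())
        }
        .groupingBy { (key, _, _) -> key }
        // acc is null for the first element of each group, so start from 0
        .aggregate { _, acc: Int?, (_, _, matchCount), _ ->
            (acc ?: 0) + matchCount
        }

fun main() {
    // Hypothetical sample lines standing in for the CSV input
    val lines = sequenceOf("a,x,2", "b,y,5", "a,z,3")
    println(countsByKey(lines)) // {a=5, b=5}
}
```

The sequence is only traversed when aggregate runs, so the individual lines never have to be held in memory at once; only the resulting map is.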

Combining groupingBy and aggregate in Kotlin
