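I have several map functions that run over the same data, and I would like them to run in a single pass. I am looking for a generic way to do this.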
val fruits: Seq[String] = Seq("apple", "banana", "cherry")
def mapF(s: String): Char = s.head
def reduceF(c1: Char, c2: Char): Char = if(c1 > c2) c1 else c2
def mapG(s: String): Int = s.length
def reduceG(i1: Int, i2: Int): Int = i1 + i2
val largestStartingChar = fruits.map(mapF).reduce(reduceF)
val totalStringLength = fruits.map(mapG).reduce(reduceG)
I want to reduce the number of passes over fruits. I can make this generic for two maps and two reduces like this:
// Pair two map functions so both are applied to each element.
def productMapFunction[A, B, C](f: A => B, g: A => C): A => (B, C) = {
  x => (f(x), g(x))
}
// Pair two reduce functions so both run component-wise on the tuples.
def productReduceFunction[T, U](f: (T, T) => T, g: (U, U) => U):
    ((T, U), (T, U)) => (T, U) = {
  (tu1, tu2) => (f(tu1._1, tu2._1), g(tu1._2, tu2._2))
}
val xMapFG = productMapFunction(mapF, mapG)
val xReduceFG = productReduceFunction(reduceF, reduceG)
val (largestStartingChar2, totalStringLength2) =
  fruits.map(xMapFG).reduce(xReduceFG)
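I would like to do this more generically, with an arbitrary number of map and reduce functions, but I am not sure how to proceed, or whether it is possible at all.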
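Answer 0: (score: 1)
Interesting question!
I am not aware of any such implementation in the standard library, or even in scalaz / cats. That is not very surprising: if the list is not very large, you can simply run the map-reduces one after another, and I am not even sure that the overhead of constructing many intermediate objects would be smaller than the overhead of traversing the list several times.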
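If the list might not fit in memory, you should use one of the streaming libraries (fs2 / zio-streams / akka-streams).
That said, such functionality would be useful if your input is an Iterator rather than a List.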
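To illustrate why the Iterator case matters, here is a minimal sketch using only the standard library. The object name SinglePassDemo and the value names are made up for illustration, and the fold assumes every string is non-empty (s.head would throw otherwise). The accumulator is a tuple that carries both partial results, so the data is traversed once.

```scala
object SinglePassDemo {
  // One traversal: the tuple accumulator holds (largest first char, total length).
  val (largestStartingChar, totalStringLength) =
    Iterator("apple", "banana", "cherry")
      .foldLeft((Char.MinValue, 0)) { case ((maxChar, total), s) =>
        (if (s.head > maxChar) s.head else maxChar, total + s.length)
      }

  // An Iterator can only be consumed once, so a second map-reduce over the
  // same instance has nothing left to read.
  val it = Iterator("apple", "banana", "cherry")
  val firstPass = it.map(_.length).sum
  val exhausted = !it.hasNext
}
```

With a List the second pass would simply cost another traversal; with an Iterator the data is gone after the first pass, which is why combining the functions up front is useful there.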
There is an interesting article on this problem: https://softwaremill.com/beautiful-folds-in-scala/
tl;dr: the map-reduce workflow can be formalized as follows:
trait Fold[I, O] {
  type M
  def m: Monoid[M]
  def tally: I => M
  def summarize: M => O
}
In your case, I = List[A], tally = list => list.map(mapF), and summarize = list => list.reduce(reduceF).
To run the map-reduce on list using an instance fold, you need to run
fold.summarize(fold.tally(list))
You can define a combine operation on folds:
def combine[I, O1, O2](f1: Fold[I, O1], f2: Fold[I, O2]): Fold[I, (O1, O2)]
Applying combine a few times gives you what you want:
combine(combine(f1, f2), f3): Fold[I, ((O1, O2), O3)]
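A minimal sketch of how combine could be implemented is given here, under some loudly stated assumptions. The Monoid trait is a dependency-free stand-in for cats.Monoid; the helpers make and run and the object names are hypothetical; and tally is applied to each element (so I is the element type), which differs from the I = List[A] reading above but is what lets the combined fold traverse the input only once.

```scala
// Stand-in for cats.Monoid so the sketch has no dependencies (assumption).
trait Monoid[A] {
  def empty: A
  def combine(x: A, y: A): A
}

trait Fold[I, O] {
  type M
  def m: Monoid[M]
  def tally: I => M
  def summarize: M => O
}

object FoldSketch {
  // Hypothetical helper: build a Fold from its three parts.
  def make[I, O, M0](monoid: Monoid[M0])(t: I => M0)(s: M0 => O): Fold[I, O] =
    new Fold[I, O] {
      type M = M0
      val m: Monoid[M0] = monoid
      val tally: I => M0 = t
      val summarize: M0 => O = s
    }

  // Pair the accumulators so both folds advance on every element.
  def combine[I, O1, O2](f1: Fold[I, O1], f2: Fold[I, O2]): Fold[I, (O1, O2)] =
    new Fold[I, (O1, O2)] {
      type M = (f1.M, f2.M)
      val m: Monoid[M] = new Monoid[M] {
        def empty: M = (f1.m.empty, f2.m.empty)
        def combine(x: M, y: M): M =
          (f1.m.combine(x._1, y._1), f2.m.combine(x._2, y._2))
      }
      val tally: I => M = i => (f1.tally(i), f2.tally(i))
      val summarize: M => (O1, O2) = {
        case (a, b) => (f1.summarize(a), f2.summarize(b))
      }
    }

  // Hypothetical runner: a single foldLeft over the input.
  def run[I, O](input: Iterable[I])(f: Fold[I, O]): O =
    f.summarize(input.foldLeft(f.m.empty)((acc, i) => f.m.combine(acc, f.tally(i))))
}

object FoldDemo {
  import FoldSketch._

  val maxChar: Monoid[Char] = new Monoid[Char] {
    def empty: Char = Char.MinValue
    def combine(x: Char, y: Char): Char = if (x > y) x else y
  }
  val sumInt: Monoid[Int] = new Monoid[Int] {
    def empty: Int = 0
    def combine(x: Int, y: Int): Int = x + y
  }

  val largest = make[String, Char, Char](maxChar)(_.head)(identity)
  val total = make[String, Int, Int](sumInt)(_.length)(identity)

  // Both results from one pass over the fruits.
  val result: (Char, Int) =
    run(Seq("apple", "banana", "cherry"))(combine(largest, total))
}
```

Because the combined accumulator is just a pair of the two original accumulators, nesting combine calls scales to any number of folds, at the cost of nested tuples in the result type.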
Answer 1: (score: 1)
I think you are just trying to reinvent transducers. It has been a while since I used Scala, but there is at least one implementation.
Answer 2: (score: 1)
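The following solution uses Cats 2 and a custom type MapReduce.
The reducing operation can be specified either by a function reduce: (O, O) => O or by a cats reducer: Semigroup[O].
Multiple MapReduce objects can be combined into one through the Apply instance provided by implicit def mapReduceApply[I].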
import cats._
import cats.implicits._

trait MapReduce[I, O] {
  type R
  def reducer: Semigroup[R]
  def map: I => R
  def mapResult: R => O
  // Map every element, reduce with the semigroup, then post-process the result.
  def apply(input: Seq[I]): O = mapResult(input.map(map).reduce(reducer.combine))
}

object MapReduce {
  def apply[I, O, _R](_reducer: Semigroup[_R], _map: I => _R, _mapResult: _R => O): MapReduce[I, O] =
    new MapReduce[I, O] {
      override type R = _R
      override def reducer = _reducer
      override def map = _map
      override def mapResult = _mapResult
    }

  // Reduce with an implicit Semigroup.
  def apply[I, O](map: I => O)(implicit r: Semigroup[O]): MapReduce[I, O] =
    MapReduce[I, O, O](r, map, identity)

  // Reduce with an explicit function.
  def apply[I, O](map: I => O, reduce: (O, O) => O): MapReduce[I, O] = {
    val reducer = new Semigroup[O] {
      override def combine(x: O, y: O): O = reduce(x, y)
    }
    MapReduce(map)(reducer)
  }

  // Apply instance: lets several MapReduce values be combined with mapN.
  implicit def mapReduceApply[I] =
    new Apply[({type F[X] = MapReduce[I, X]})#F] {
      override def map[A, B](f: MapReduce[I, A])(fn: A => B): MapReduce[I, B] =
        MapReduce(f.reducer, f.map, f.mapResult.andThen(fn))

      override def ap[A, B](ff: MapReduce[I, (A) => B])(fa: MapReduce[I, A]): MapReduce[I, B] =
        MapReduce(ff.reducer product fa.reducer,
          i => (ff.map(i), fa.map(i)),
          (t: (ff.R, fa.R)) => ff.mapResult(t._1)(fa.mapResult(t._2))
        )
    }
}

object MultiMapReduce extends App {
  val fruits: Seq[String] = Seq("apple", "banana", "cherry")
  def mapF(s: String): Char = s.head
  def reduceF(c1: Char, c2: Char): Char = if (c1 > c2) c1 else c2

  val biggestFirstChar = MapReduce(mapF, reduceF)
  val totalChars = MapReduce[String, Int](_.length) // (Semigroup[Int]) reduce by _ + _
  def count[A] = MapReduce[A, Int](_ => 1)

  val multiMapReduce = (biggestFirstChar, totalChars, count[String]).mapN((_, _, _))
  println(multiMapReduce(fruits))

  val sum = MapReduce[Double, Double](identity)
  val average = (sum, count[Double]).mapN(_ / _)
  println(sum(List(1, 2, 3, 4)))
  println(average(List(1, 2, 3, 4)))
}
A runnable version is also available on GitHub.