Question

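I have multiple map functions running over the same data, and I'd like them to run in a single pass. I'm looking for a generic way to do this.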

val fruits: Seq[String] = Seq("apple", "banana", "cherry")

def mapF(s: String): Char = s.head
def reduceF(c1: Char, c2: Char): Char = if(c1 > c2) c1 else c2

def mapG(s: String): Int = s.length
def reduceG(i1: Int, i2: Int): Int = i1 + i2

val largestStartingChar = fruits.map(mapF).reduce(reduceF)
val totalStringLength = fruits.map(mapG).reduce(reduceG)

I'd like to reduce the number of passes over fruits. For two map/reduce pairs, I can make this generic and simplify it like this:

def productMapFunction[A, B, C](f: A=>B, g: A=>C): A => (B, C) = {
  x => (f(x), g(x))
}

def productReduceFunction[T, U](f: (T, T)=>T, g: (U, U) => U):
    ((T,U), (T,U)) => (T, U) = {
  (tu1, tu2) => (f(tu1._1, tu2._1), g(tu1._2, tu2._2))
}

val xMapFG = productMapFunction(mapF, mapG)
val xReduceFG = productReduceFunction(reduceF, reduceG)

val (largestStartingChar2, totalStringLength2) =
  fruits.map(xMapFG).reduce(xReduceFG)

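I'd like to do this more generically, with an arbitrary number of map and reduce functions, but I'm not sure how to go about it, or whether it's even possible.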

Answer 1

Interesting question!

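I don't know of any such implementation in the standard library, or even in scalaz/cats. That isn't very surprising: if the list isn't very large, you can simply run the map-reduce operations one after another, and I'm not even sure that the overhead of constructing many intermediate objects would be smaller than the overhead of traversing the list several times.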
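To make that trade-off concrete, the two reductions from the question can be fused into one traversal with the standard library's foldLeft. This avoids building the intermediate mapped collections but still allocates one tuple per element. It is a minimal sketch, not part of the answer; the seed values (Char.MinValue, 0) are an assumption, and unlike reduce, the fold returns them for an empty input instead of throwing.

```scala
object SinglePassFold {
  val fruits: Seq[String] = Seq("apple", "banana", "cherry")

  // One traversal: the accumulator holds (largest first char so far, total length so far).
  // Char.MinValue and 0 are the identity elements for "max of Char" and "sum of Int".
  val (largestStartingChar, totalStringLength) =
    fruits.foldLeft((Char.MinValue, 0)) { case ((maxChar, total), s) =>
      (if (s.head > maxChar) s.head else maxChar, total + s.length)
    }

  def main(args: Array[String]): Unit =
    println((largestStartingChar, totalStringLength))
}
```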

If the list might not fit in memory, you should use one of the streaming libraries (fs2 / zio-streams / akka-streams).

That said, this kind of functionality would be useful if your input were an Iterator rather than a List.
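An Iterator can be consumed only once, so two separate map(...).reduce(...) passes over it are not possible. Pairing the functions, as in the question's productMapFunction/productReduceFunction, is what makes a single pass work. A minimal sketch, assuming the input arrives as lines from scala.io.Source; IteratorSinglePass and onePass are hypothetical names.

```scala
import scala.io.Source

object IteratorSinglePass {
  def mapF(s: String): Char = s.head
  def reduceF(c1: Char, c2: Char): Char = if (c1 > c2) c1 else c2
  def mapG(s: String): Int = s.length
  def reduceG(i1: Int, i2: Int): Int = i1 + i2

  // Pairs the map functions and the reduce functions so that a single
  // traversal of the (consume-once) iterator yields both results.
  // Like the question's code, reduce throws on an empty iterator.
  def onePass(it: Iterator[String]): (Char, Int) =
    it.map(s => (mapF(s), mapG(s)))
      .reduce((a, b) => (reduceF(a._1, b._1), reduceG(a._2, b._2)))

  def main(args: Array[String]): Unit = {
    // getLines() returns an Iterator[String] that can only be read once.
    val lines = Source.fromString("apple\nbanana\ncherry").getLines()
    println(onePass(lines))
  }
}
```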

There is an interesting article on this problem: https://softwaremill.com/beautiful-folds-in-scala/

tl;dr: a map-reduce workflow can be formalized as follows:

trait Fold[I, O] {
  type M
  def m: Monoid[M]

  def tally: I => M
  def summarize: M => O
}

In your case, I = List[A], tally = list => list.map(mapF), and summarize = list => list.reduce(reduceF).

To run the map-reduce described by fold on an instance of list, you run

fold.summarize(fold.tally(list))
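A self-contained transcription of the instance described above, as a sketch. The small Monoid trait is a stand-in assumed here (the answer does not say which library's Monoid it means), M = List[Char] is treated as a monoid under concatenation, and ListFoldExample and run are hypothetical names. Note that with I = List[A], tally itself traverses the whole list.

```scala
// Minimal stand-in for a Monoid type class (assumed; e.g. cats.Monoid in practice).
trait Monoid[A] {
  def empty: A
  def combine(x: A, y: A): A
}

// The Fold trait from the answer, unchanged.
trait Fold[I, O] {
  type M
  def m: Monoid[M]
  def tally: I => M
  def summarize: M => O
}

object ListFoldExample {
  def mapF(s: String): Char = s.head
  def reduceF(c1: Char, c2: Char): Char = if (c1 > c2) c1 else c2

  // I = List[String], M = List[Char] (a monoid under ++, with Nil as empty).
  val largestStartingChar: Fold[List[String], Char] = new Fold[List[String], Char] {
    type M = List[Char]
    val m: Monoid[List[Char]] = new Monoid[List[Char]] {
      def empty: List[Char] = Nil
      def combine(x: List[Char], y: List[Char]): List[Char] = x ++ y
    }
    val tally: List[String] => List[Char] = list => list.map(mapF)
    val summarize: List[Char] => Char = list => list.reduce(reduceF)
  }

  // Runs a fold on the whole input: fold.summarize(fold.tally(list)).
  def run[I, O](fold: Fold[I, O])(input: I): O = fold.summarize(fold.tally(input))
}
```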

You can define a combine operation on folds: def combine[I, O1, O2](f1: Fold[I, O1], f2: Fold[I, O2]): Fold[I, (O1, O2)]

Applying combine repeatedly gives you what you want:

combine(combine(f1, f2), f3): Fold[I, ((O1, O2), O3)]
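One possible implementation of combine, as a sketch: the combined fold pairs the two accumulators, so its monoid is the product of the two monoids, and tally and summarize work component by component. The Monoid stand-in and the helpers fold and run are assumptions. Here run folds element by element (treating I as the element type, in the style of the linked article), so all combined folds are computed in one traversal.

```scala
// Minimal stand-in for a Monoid type class (assumed; e.g. cats.Monoid in practice).
trait Monoid[A] {
  def empty: A
  def combine(x: A, y: A): A
}

trait Fold[I, O] {
  type M
  def m: Monoid[M]
  def tally: I => M
  def summarize: M => O
}

object FoldCombine {
  // Hypothetical helper that builds a Fold from its parts.
  def fold[I, M0, O](empty0: M0)(combine0: (M0, M0) => M0)(tally0: I => M0)(summarize0: M0 => O): Fold[I, O] =
    new Fold[I, O] {
      type M = M0
      val m: Monoid[M0] = new Monoid[M0] {
        def empty: M0 = empty0
        def combine(x: M0, y: M0): M0 = combine0(x, y)
      }
      val tally: I => M0 = tally0
      val summarize: M0 => O = summarize0
    }

  // combine: the accumulator is the pair (f1.M, f2.M) under the product monoid;
  // tally and summarize act on each component separately.
  def combine[I, O1, O2](f1: Fold[I, O1], f2: Fold[I, O2]): Fold[I, (O1, O2)] =
    fold[I, (f1.M, f2.M), (O1, O2)]((f1.m.empty, f2.m.empty))(
      (x, y) => (f1.m.combine(x._1, y._1), f2.m.combine(x._2, y._2)))(
      i => (f1.tally(i), f2.tally(i)))(
      acc => (f1.summarize(acc._1), f2.summarize(acc._2)))

  // Hypothetical element-wise runner: one traversal, combining tallies in the monoid.
  def run[I, O](f: Fold[I, O])(input: Iterable[I]): O =
    f.summarize(input.foldLeft(f.m.empty)((acc, i) => f.m.combine(acc, f.tally(i))))

  val maxFirstChar: Fold[String, Char] =
    fold[String, Char, Char](Char.MinValue)((a, b) => if (a > b) a else b)(_.head)(identity)
  val totalLength: Fold[String, Int] = fold[String, Int, Int](0)(_ + _)(_.length)(identity)
  val count: Fold[String, Int] = fold[String, Int, Int](0)(_ + _)(_ => 1)(identity)

  // combine(combine(f1, f2), f3): Fold[I, ((O1, O2), O3)], as in the answer.
  val all: Fold[String, ((Char, Int), Int)] = combine(combine(maxFirstChar, totalLength), count)
}
```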

Answer 2

I think you are simply trying to reinvent transducers. It has been a while since I used Scala, but there is at least one implementation.
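Transducers (an idea from Clojure) are transformations of reducing functions, so a mapping step can be attached to a reduction before any data is traversed, and several reductions can then share one traversal. The sketch below is a hand-rolled illustration of that idea, not the API of any Scala transducer library; Step, Transducer, mapping and both are hypothetical names, and the seeds Char.MinValue and 0 are assumptions.

```scala
object TransducerSketch {
  // A reducing step: folds one input A into an accumulator R.
  type Step[R, A] = (R, A) => R

  // A transducer turns a step over B into a step over A, for any accumulator type R.
  trait Transducer[A, B] {
    def apply[R](step: Step[R, B]): Step[R, A]
  }

  // The "map" transducer: apply f to each input before handing it to the step.
  def mapping[A, B](f: A => B): Transducer[A, B] = new Transducer[A, B] {
    def apply[R](step: Step[R, B]): Step[R, A] = (r, a) => step(r, f(a))
  }

  // Runs two steps side by side over a pair accumulator, so one traversal feeds both.
  def both[R1, R2, A](s1: Step[R1, A], s2: Step[R2, A]): Step[(R1, R2), A] =
    (acc, a) => (s1(acc._1, a), s2(acc._2, a))

  val maxStep: Step[Char, Char] = (c1, c2) => if (c1 > c2) c1 else c2
  val sumStep: Step[Int, Int] = _ + _

  val fruits = List("apple", "banana", "cherry")

  // Char.MinValue and 0 are the seeds for "max" and "sum" respectively.
  val result: (Char, Int) =
    fruits.foldLeft((Char.MinValue, 0))(
      both(mapping[String, Char](_.head)(maxStep), mapping[String, Int](_.length)(sumStep)))
}
```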

Answer 3

The following solution uses Cats 2 and a custom type, MapReduce.

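The reduce operation can be specified either with a function reduce: (O, O) => O or with a Cats reducer: Semigroup[O]. The Apply instance provided by implicit def mapReduceApply[I] lets you combine multiple MapReduce objects into one.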

import cats._
import cats.implicits._

trait MapReduce[I, O] {
  type R

  def reducer: Semigroup[R]

  def map: I => R

  def mapResult: R => O

  def apply(input: Seq[I]): O = mapResult(input.map(map).reduce(reducer.combine))
}

object MapReduce {
  def apply[I, O, _R](_reducer: Semigroup[_R], _map: I => _R, _mapResult: _R => O): MapReduce[I, O] =
    new MapReduce[I, O] {
      override type R = _R

      override def reducer = _reducer

      override def map = _map

      override def mapResult = _mapResult
    }

  def apply[I, O](map: I => O)(implicit r: Semigroup[O]): MapReduce[I, O] =
    MapReduce[I, O, O](r, map, identity)

  def apply[I, O](map: I => O, reduce: (O, O) => O): MapReduce[I, O] = {
    val reducer = new Semigroup[O] {
      override def combine(x: O, y: O): O = reduce(x, y)
    }
    MapReduce(map)(reducer)
  }

  implicit def mapReduceApply[I]: Apply[({type F[X] = MapReduce[I, X]})#F] =
    new Apply[({type F[X] = MapReduce[I, X]})#F] {
      override def map[A, B](f: MapReduce[I, A])(fn: A => B): MapReduce[I, B] =
        MapReduce(f.reducer, f.map, f.mapResult.andThen(fn))

      override def ap[A, B](ff: MapReduce[I, (A) => B])(fa: MapReduce[I, A]): MapReduce[I, B] =
        MapReduce(ff.reducer product fa.reducer,
          i => (ff.map(i), fa.map(i)),
          (t: (ff.R, fa.R)) => ff.mapResult(t._1)(fa.mapResult(t._2))
        )
    }

}

object MultiMapReduce extends App {

  val fruits: Seq[String] = Seq("apple", "banana", "cherry")

  def mapF(s: String): Char = s.head

  def reduceF(c1: Char, c2: Char): Char = if (c1 > c2) c1 else c2

  val biggestFirstChar = MapReduce(mapF, reduceF)
  val totalChars = MapReduce[String, Int](_.length) // (Semigroup[Int]) reduce by _ + _
  def count[A] = MapReduce[A, Int](_ => 1)

  val multiMapReduce = (biggestFirstChar, totalChars, count[String]).mapN((_, _, _))
  println(multiMapReduce(fruits))

  val sum = MapReduce[Double, Double](identity)
  val average = (sum, count[Double]).mapN(_ / _)
  println(sum(List(1, 2, 3, 4)))
  println(average(List(1, 2, 3, 4)))

}

A runnable version is also available on GitHub.
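For readers without Cats, the core of the ap/mapN combination above can be sketched with the standard library alone: a map2 pairs the accumulator types and the reduce functions, so both map/reduce pairs run in one pass, and a final function combines the two results (here, an average). MR, map2 and MRDemo are hypothetical names, not part of the answer; like the Cats version, apply uses reduce and therefore throws on empty input.

```scala
// Hypothetical Cats-free counterpart of the MapReduce type above.
trait MR[I, O] {
  type R
  def map: I => R
  def reduce: (R, R) => R
  def mapResult: R => O

  // Like the Cats version, uses reduce and therefore throws on empty input.
  def apply(input: Seq[I]): O = mapResult(input.map(map).reduce(reduce))
}

object MR {
  def apply[I, R0](m: I => R0)(r: (R0, R0) => R0): MR[I, R0] = new MR[I, R0] {
    type R = R0
    val map: I => R0 = m
    val reduce: (R0, R0) => R0 = r
    val mapResult: R0 => R0 = identity
  }

  // map2: pair the accumulators and reduce functions so both run in one pass,
  // then combine the two final results with f.
  def map2[I, A, B, C](fa: MR[I, A], fb: MR[I, B])(f: (A, B) => C): MR[I, C] =
    new MR[I, C] {
      type R = (fa.R, fb.R)
      val map: I => R = i => (fa.map(i), fb.map(i))
      val reduce: (R, R) => R = (x, y) => (fa.reduce(x._1, y._1), fb.reduce(x._2, y._2))
      val mapResult: R => C = r => f(fa.mapResult(r._1), fb.mapResult(r._2))
    }
}

object MRDemo {
  val fruits: Seq[String] = Seq("apple", "banana", "cherry")
  val totalChars: MR[String, Int] = MR[String, Int](_.length)(_ + _)
  val count: MR[String, Int] = MR[String, Int](_ => 1)(_ + _)
  val averageLength: MR[String, Double] = MR.map2(totalChars, count)(_.toDouble / _)
}
```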

Is it possible to combine multiple map and reduce functions into a single pass in Scala?
