Question

I have started learning Elixir and have run into a challenge that I haven't been able to solve easily.

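I'm trying to create a function that takes an Enumerable.t and returns another Enumerable.t that includes the next n items. Its behaviour differs slightly from Enum.chunk(e, n, 1, []) in that the number of iterations always equals the count of the original enumerable. I also need to support Streams.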

@spec lookahead(Enumerable.t, non_neg_integer) :: Enumerable.t

This is best illustrated with doctest syntax:

iex> lookahead(1..6, 1) |> Enum.to_list
[[1,2],[2,3],[3,4],[4,5],[5,6],[6]]

iex> lookahead(1..4, 2) |> Enum.to_list
[[1,2,3],[2,3,4],[3,4],[4]]

iex> Stream.cycle(1..4) |> lookahead(2) |> Enum.take(5)
[[1,2,3],[2,3,4],[3,4,1],[4,1,2],[1,2,3]]

iex> {:ok,io} = StringIO.open("abcd")
iex> IO.stream(io,1) |> lookahead(2) |> Enum.to_list
[["a","b","c"],["b","c","d"],["c","d"],["d"]]

I have looked into implementing the Enumerable.t protocol, but haven't quite understood the Enumerable.reduce interface.

Is there a succinct/elegant way of doing this?

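My use case is a small fixed n value (1 or 2) on a binary stream, so extra points for an optimized version. However, for the purpose of learning Elixir, I'm interested in a solution that covers a number of use cases. Performance is important. I will run some benchmarks across various values of n for the solutions and publish them.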

Benchmark Update - 8th April 2015

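6 workable solutions have been posted. Details of the benchmarks are available at https://gist.github.com/spitsw/fce5304ec6941578e454. The benchmarks were run over a list of 500 items for various values of n.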

For n = 1, the results were:

PatrickSuspend.lookahead    104.90 µs/op
Warren.lookahead            174.00 µs/op
PatrickChunk.lookahead      310.60 µs/op
PatrickTransform.lookahead  357.00 µs/op
Jose.lookahead              647.60 µs/op
PatrickUnfold.lookahead     1484000.00 µs/op

For n = 50, the results were:

PatrickSuspend.lookahead    220.80 µs/op
Warren.lookahead            320.60 µs/op
PatrickTransform.lookahead  518.60 µs/op
Jose.lookahead              1390.00 µs/op
PatrickChunk.lookahead      3058.00 µs/op
PatrickUnfold.lookahead     1345000.00 µs/op (faster than n=1)

Answer 1

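As discussed in the comments, my first attempt had some performance problems and didn't work with streams that have side effects, such as IO streams. I took the time to dig deeper into the stream library and finally came up with this solution: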

defmodule MyStream do
  def lookahead(enum, n) do
    step = fn val, _acc -> {:suspend, val} end
    next = &Enumerable.reduce(enum, &1, step)
    &do_lookahead(n, :buffer, [], next, &1, &2)
  end

  # stream suspended
  defp do_lookahead(n, state, buf, next, {:suspend, acc}, fun) do
    {:suspended, acc, &do_lookahead(n, state, buf, next, &1, fun)}
  end

  # stream halted
  defp do_lookahead(_n, _state, _buf, _next, {:halt, acc}, _fun) do
    {:halted, acc}
  end

  # initial buffering
  defp do_lookahead(n, :buffer, buf, next, {:cont, acc}, fun) do
    case next.({:cont, []}) do
      {:suspended, val, next} ->
        new_state = if length(buf) < n, do: :buffer, else: :emit
        do_lookahead(n, new_state, buf ++ [val], next, {:cont, acc}, fun)
      {_, _} ->
        do_lookahead(n, :emit, buf, next, {:cont, acc}, fun)
    end
  end

  # emitting
  defp do_lookahead(n, :emit, [_|rest] = buf, next, {:cont, acc}, fun) do
    case next.({:cont, []}) do
      {:suspended, val, next} ->
        do_lookahead(n, :emit, rest ++ [val], next, fun.(buf, acc), fun)
      {_, _} ->
        do_lookahead(n, :emit, rest, next, fun.(buf, acc), fun)
    end
  end

  # buffer empty, halting
  defp do_lookahead(_n, :emit, [], _next, {:cont, acc}, _fun) do
    {:halted, acc}
  end
end

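It may look daunting at first, but it's actually not that hard. I will try to break it down for you, though that's difficult with a full-fledged example like this.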

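Let's start with a simpler example instead: a stream that endlessly repeats a given value. To emit a stream, we can return a function that takes an accumulator and a function as arguments. To emit a value, we call that function with two arguments: the value to emit and the accumulator. The acc accumulator is a tuple that consists of a command (:cont, :suspend or :halt) and tells us what the consumer wants us to do; the result we need to return depends on the command. If the stream should be suspended, we return a three-element tuple of the atom :suspended, the accumulator and a function that will be called when the enumeration continues (sometimes called a "continuation"). For the :halt command, we simply return {:halted, acc}, and for :cont we emit a value by performing the recursive step described above. The whole thing then looks like this: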

defmodule MyStream do
  def repeat(val) do
    &do_repeat(val, &1, &2)
  end

  defp do_repeat(val, {:suspend, acc}, fun) do
    {:suspended, acc, &do_repeat(val, &1, fun)}
  end

  defp do_repeat(_val, {:halt, acc}, _fun) do
    {:halted, acc}
  end

  defp do_repeat(val, {:cont, acc}, fun) do
    do_repeat(val, fun.(val, acc), fun)
  end
end
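As a quick check of the protocol described above, here is a minimal sketch that consumes such a function-based stream with ordinary Enum and Stream calls. The module name RepeatDemo is hypothetical; its clauses mirror the repeat example so that the snippet runs on its own.

```elixir
defmodule RepeatDemo do
  # Hypothetical stand-alone copy of the repeat stream, one clause per command.
  def repeat(val), do: &do_repeat(val, &1, &2)

  # Consumer asked to pause: hand back a continuation.
  defp do_repeat(val, {:suspend, acc}, fun), do: {:suspended, acc, &do_repeat(val, &1, fun)}
  # Consumer asked to stop.
  defp do_repeat(_val, {:halt, acc}, _fun), do: {:halted, acc}
  # Consumer wants more: emit the value and recurse on whatever it answers.
  defp do_repeat(val, {:cont, acc}, fun), do: do_repeat(val, fun.(val, acc), fun)
end

# A two-arity function is itself enumerable, so Enum and Stream can consume it.
three = RepeatDemo.repeat(:a) |> Enum.take(3)
doubled = RepeatDemo.repeat(1) |> Stream.map(&(&1 * 2)) |> Enum.take(2)
```

Enum.take/2 sends :halt once it has enough items, which is what ends the otherwise infinite recursion in the :cont clause.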

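Now this is only one part of the puzzle. We can emit a stream, but we don't process an incoming stream yet. Again, to explain how that works, it makes sense to construct a simpler example. Here, I'll build a function that takes an enumerable and just suspends and re-emits every value.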

defmodule MyStream do
  def passthrough(enum) do
    step = fn val, _acc -> {:suspend, val} end
    next = &Enumerable.reduce(enum, &1, step)
    &do_passthrough(next, &1, &2)
  end

  defp do_passthrough(next, {:suspend, acc}, fun) do
    {:suspended, acc, &do_passthrough(next, &1, fun)}
  end

  defp do_passthrough(_next, {:halt, acc}, _fun) do
    {:halted, acc}
  end

  defp do_passthrough(next, {:cont, acc}, fun) do
    case next.({:cont, []}) do
      {:suspended, val, next} ->
        do_passthrough(next, fun.(val, acc), fun)
      {_, _} ->
        {:halted, acc}
    end
  end
end

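The first clause sets up the next function that gets passed down to the do_passthrough function. Its purpose is to fetch the next value from the incoming stream. The step function used internally specifies that we suspend for every item in the stream. The rest is pretty similar except for the last clause. Here, we call the next function with {:cont, []} to get a new value and process the result with a case statement. If there is a value, we get back {:suspended, val, next}; if not, the stream has halted and we pass that through to the consumer.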
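To make the suspend-and-resume cycle concrete, the following sketch drives Enumerable.reduce/3 by hand over a two-element list, using the same step function as above. The list input is chosen purely for illustration; each line pattern-matches on the tuple that the text describes.

```elixir
# Suspend after every element and carry the element out as the accumulator.
step = fn val, _acc -> {:suspend, val} end

# First element: the reduction pauses and hands back a continuation.
{:suspended, 1, next} = Enumerable.reduce([1, 2], {:cont, []}, step)

# Resuming with {:cont, []} yields the second element.
{:suspended, 2, next} = next.({:cont, []})

# Resuming once more finds the list exhausted.
{:done, []} = next.({:cont, []})
```

The final two-element tuple is what the `{_, _}` branch of the case statement catches.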

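I hope that clarifies a few things about how to build streams manually in Elixir. Unfortunately, working with streams requires an awful lot of boilerplate. If you go back to the lookahead implementation now, you will see that there are only tiny differences, and those are the actually interesting parts. There are two additional parameters: state, which distinguishes between the :buffer and :emit steps, and buffer, which is pre-filled with n+1 items in the initial buffering step. In the emit phase, the current buffer is emitted and then shifted to the left on each iteration. We're done when the input stream halts or when our stream is halted directly.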
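Stripped of the stream boilerplate, the buffer-then-emit idea can be sketched eagerly on plain lists. This is only an illustration of the two phases, not a replacement for the streaming version; the module name EagerLookahead is hypothetical.

```elixir
defmodule EagerLookahead do
  # Buffering phase: pre-fill the buffer with the first n + 1 items.
  def lookahead(list, n) when is_list(list) and n >= 0 do
    {buf, rest} = Enum.split(list, n + 1)
    emit(buf, rest)
  end

  # Buffer empty: nothing left to emit.
  defp emit([], _rest), do: []

  # Input still has items: emit the buffer, then shift it left and append.
  defp emit([_ | tail] = buf, [item | rest]), do: [buf | emit(tail ++ [item], rest)]

  # Input exhausted: emit the buffer and keep shrinking it.
  defp emit([_ | tail] = buf, []), do: [buf | emit(tail, [])]
end

windows = EagerLookahead.lookahead([1, 2, 3, 4], 2)
```

The three emit clauses correspond to the "buffer empty", "emitting with input" and "emitting after the input ended" cases of the streaming implementation.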

I'm leaving my original answer here for reference:

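Here's a solution that uses Stream.unfold/2 to emit a true stream of values according to your specification. This means you need to add Enum.to_list to the end of your first two examples to obtain the actual values.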

defmodule MyStream do
  def lookahead(stream, n) do
    Stream.unfold split(stream, n+1), fn
      {[], stream} ->
        nil
      {[_ | buf] = current, stream} ->
        {value, stream} = split(stream, 1)
        {current, {buf ++ value, stream}}
    end
  end

  defp split(stream, n) do
    {Enum.take(stream, n), Stream.drop(stream, n)}
  end
end

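The general idea is that we keep a buf of the previous iterations around. On each iteration, we emit the current buf, take one value from the stream and append it to the end of the buf. This repeats until the buf is empty.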

Example:

iex> MyStream.lookahead(1..6, 1) |> Enum.to_list
[[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6]]

iex> MyStream.lookahead(1..4, 2) |> Enum.to_list
[[1, 2, 3], [2, 3, 4], [3, 4], [4]]

iex> Stream.cycle(1..3) |> MyStream.lookahead(2) |> Enum.take(5)
[[1, 2, 3], [2, 3, 1], [3, 1, 2], [1, 2, 3], [2, 3, 1]]
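One likely reason this original version is slow and misbehaves with side-effecting streams is that pairing Enum.take/2 with Stream.drop/2 enumerates the source again from the start. The sketch below counts evaluations to show this; the Agent-based counter and the variable names are illustrative assumptions, not part of the answer.

```elixir
# Count how often the source computes an element (the Agent is only a counter).
{:ok, counter} = Agent.start_link(fn -> 0 end)

source =
  Stream.map(1..5, fn x ->
    Agent.update(counter, &(&1 + 1))
    x
  end)

# Same shape as split/2 above: take eagerly, drop lazily.
taken = Enum.take(source, 2)
rest = Stream.drop(source, 2)

# Consuming `rest` starts the source over, so 1 and 2 are computed a second time.
next_item = Enum.take(rest, 1)
evaluations = Agent.get(counter, & &1)
```

With a source such as IO.stream/2, the first read has already consumed its input, so starting over cannot give the same items again.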

Answer 2

Here is an inefficient implementation of such a function:

defmodule Lookahead do
  def lookahead(enumerable, n) when n > 0 do
    enumerable
    |> Stream.chunk(n + 1, 1, [])
    |> Stream.flat_map(fn list ->
        length = length(list)
        if length < n + 1 do
          [list|Enum.scan(1..n-1, list, fn _, acc -> Enum.drop(acc, 1) end)]
        else
          [list]
        end
      end)
  end
end

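It builds on top of @hahuang65's implementation, except that we use Stream.flat_map/2 to check the length of each emitted item, adding the missing ones as soon as we detect that the emitted item got shorter.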

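A hand-written implementation from scratch would be faster because we would not need to call length(list) on every iteration. The implementation above may be fine if n is small, though. If n is fixed, you could even pattern match on the generated list explicitly.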
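For a fixed n, the pattern-matching idea might look like the sketch below, specialised to n = 2. It assumes a newer Elixir release where Stream.chunk_every/3 is available in place of Stream.chunk/4; the module and function names are hypothetical.

```elixir
defmodule FixedLookahead do
  # Hypothetical specialisation of lookahead for n = 2.
  def lookahead2(enumerable) do
    enumerable
    # Windows of 3 with step 1; a shorter trailing window is emitted once.
    |> Stream.chunk_every(3, 1)
    |> Stream.flat_map(fn
      # Full window: pass it through unchanged, no length/1 call needed.
      [_, _, _] = full -> [full]
      # Trailing window of 2: also emit the final single-item window.
      [_, b] = short -> [short, [b]]
      # Input had only one item in total.
      [_] = last -> [last]
    end)
  end
end

finite = FixedLookahead.lookahead2(1..4) |> Enum.to_list()
cyclic = Stream.cycle(1..4) |> FixedLookahead.lookahead2() |> Enum.take(5)
```

Matching on the list shape replaces the per-item length check, at the cost of writing one clause per possible window size.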

Answer 3

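I had started a discussion about my proposed Stream.mutate method on the elixir core mailing list, where Peter Hamilton suggested another way of solving this problem. By using make_ref to create a globally unique reference, we can create a padding stream and concatenate it with the original enumerable to continue emitting after the original stream has halted. This can then be used in conjunction with Stream.chunk, which means we need to remove the unwanted references in a last step: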

def lookahead(enum, n) do
  stop = make_ref()
  enum
  |> Stream.concat(List.duplicate(stop, n))
  |> Stream.chunk(n+1, 1)
  |> Stream.map(&Enum.reject(&1, fn x -> x == stop end))
end

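From a syntax perspective, I think this is the prettiest solution. Alternatively, we can use Stream.transform to build the buffer manually, which is quite similar to the manual solution I proposed earlier: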

def lookahead(enum, n) do
  stop = make_ref()
  enum
  |> Stream.concat(List.duplicate(stop, n+1))
  |> Stream.transform([], fn val, acc ->
    case {val, acc} do
      {^stop, []}                         -> {[]   , []           }
      {^stop, [_|rest] = buf}             -> {[buf], rest         }
      {val  , buf} when length(buf) < n+1 -> {[]   , buf ++ [val] }
      {val  , [_|rest] = buf}             -> {[buf], rest ++ [val]}
    end
  end)
end

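I haven't benchmarked these solutions, but I suppose the second one, although slightly clunkier, should perform a little better because it does not have to iterate over each chunk.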
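For readers on a newer Elixir release, the chunk-based padding variant can be sketched with Stream.chunk_every/4 and the :discard option standing in for Stream.chunk/3. That substitution and the module name SentinelLookahead are assumptions made for this illustration.

```elixir
defmodule SentinelLookahead do
  # Hypothetical module name; pads the input with n unique sentinels.
  def lookahead(enum, n) do
    # A fresh reference cannot collide with any value in the input.
    stop = make_ref()

    enum
    |> Stream.concat(List.duplicate(stop, n))
    # Only full windows are kept; the padding supplies the trailing ones.
    |> Stream.chunk_every(n + 1, 1, :discard)
    # Strip the padding again from the last n windows.
    |> Stream.map(&Enum.reject(&1, fn x -> x == stop end))
  end
end

finite = SentinelLookahead.lookahead(1..4, 2) |> Enum.to_list()
cyclic = Stream.cycle(1..4) |> SentinelLookahead.lookahead(2) |> Enum.take(5)
```

Because the padding is only reached once the original enumerable ends, infinite streams such as Stream.cycle/1 never see it.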

By the way, the second solution can be written without the case statement once Elixir allows the pin operator in function heads (probably in v1.1.0):

def lookahead(enum, n) do
  stop = make_ref()
  enum
  |> Stream.concat(List.duplicate(stop, n+1))
  |> Stream.transform([], fn
    ^stop, []                         -> {[]   , []           }
    ^stop, [_|rest] = buf             -> {[buf], rest         }
    val  , buf when length(buf) < n+1 -> {[]   , buf ++ [val] }
    val  , [_|rest] = buf             -> {[buf], rest ++ [val]}
  end)
end

Answer 4

You should be able to use Stream.chunk/4.

It would look something like this:

defmodule MyMod do
  def lookahead(enum, amount) do
    Stream.chunk(enum, amount + 1, 1, [])
  end
end

Inputs:

iex(2)> MyMod.lookahead(1..6, 1) |> Enum.to_list
[[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6]]

iex(3)> MyMod.lookahead(1..4, 2) |> Enum.to_list
[[1, 2, 3], [2, 3, 4], [3, 4]]

iex(4)> Stream.cycle(1..3) |> MyMod.lookahead(1) |> Enum.take(5)
[[1, 2], [2, 3], [3, 1], [1, 2], [2, 3]]

Answer 5

The solution below uses Stream.resource and the suspend capability of Enumerable.reduce. All the examples pass.

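In brief, it uses Enumerable.reduce to build up a list. It then suspends the reducer on every iteration, removes the head of the list, and adds the latest item to the tail of the list. Finally, it produces the trailing items of the stream when the reducer is :done or :halted. All of this is coordinated using Stream.resource.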

This would be more efficient if a FIFO queue were used instead of a list for each iteration.

Please provide feedback on any simplifications, efficiency improvements or bugs.

defmodule MyModule do
  def lookahead(enum, n) when n >= 0 do
    reducer = fn -> Enumerable.reduce(enum, {:cont, {0, []}}, fn
      item, {c, list} when c < n  -> {:cont, {c+1, list ++ [item]}} # Build up the first list
      item, {c, list} when c == n -> {:suspend, {c+1, list ++ [item]}} # Suspend on first full list
      item, {c, [_|list]} -> {:suspend, {c, list ++ [item]}} # Remove the first item and emit
      end)
    end

    Stream.resource(reducer,
      fn
        {:suspended, {_, list} = acc , fun} -> {[list], fun.({:cont, acc})}
        {:halted, _} = result -> lookahead_trail(n, result) # Emit the trailing items
        {:done, _} = result -> lookahead_trail(n, result) # Emit the trailing items
      end,
      fn
        {:suspended, acc, fun} -> fun.({:halt, acc}) # Ensure the reducer is halted after suspend
        _ -> nil # Nothing to clean up once the reducer has finished
      end)
  end

  defp lookahead_trail(n, acc) do
    case acc do
      {action, {c, [_|rest]}} when c > n -> {[], {action, {c-1, rest}}} # List already emitted here
      {action, {c, [_|rest] = list}} -> {[list], {action, {c-1, rest}}} # Emit the next tail item
      acc -> {:halt, acc } # Finish of the stream
    end
  end
end
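The FIFO queue mentioned above could be Erlang's :queue module, which avoids copying the whole window the way list ++ [item] does. The helper below is a hypothetical sketch of a single window shift and is not wired into the solution.

```elixir
defmodule QueueWindow do
  # Hypothetical helper: drop the oldest item, then append the newest one.
  def shift(queue, item) do
    {{:value, _oldest}, rest} = :queue.out(queue)
    :queue.in(item, rest)
  end
end

window = :queue.from_list([1, 2, 3])
shifted = QueueWindow.shift(window, 4)

# Converting back to a list is only needed when the window is emitted.
as_list = :queue.to_list(shifted)
```

Each emitted window still has to be converted with :queue.to_list/1, so any gain would mainly show for larger n.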

Answer 6

After taking inspiration from Warren, I made this. Basic usage:

iex> {peek, enum} = StreamSplit.peek 1..10, 3
{[1, 2, 3], #Function<57.77324385/2 in Stream.transform/3>}
iex> Enum.take(enum, 5)
[1, 2, 3, 4, 5]

https://hex.pm/packages/stream_split

Answer 7

I may be late to the party, but it can be done with Stream.chunk_while/4:

defmodule Denis do
  def lookahead(enumerable) do
    chunk_fun = fn
      element, nil -> {:cont, element}
      element, acc -> {:cont, [acc, element], element}
    end

    after_fun = fn
      nil -> {:cont, []}
      [] -> {:cont, []}
      acc -> {:cont, [acc], []}
    end

    enumerable
    |> Stream.chunk_while(nil, chunk_fun, after_fun)
  end
end
