Question

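I'm working on a natural language processing project (and learning Elixir), and I can't figure out the idiomatic way to transform my data.

To spare you irrelevant domain details, let's recast the problem as parsing addresses.

Given a list of string tokens, consume the relevant tokens to compose a data structure in place, while leaving the other tokens where they are: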

# input
["in", "France:",  "22", "Rue", "du", "Débarcadère", ",", "75017", "Paris", ",", "France", "where", "they", "are"]
MyModule.process(tokens)

# output
["in", "France:",  %Address{
  street: "Rue du Débarcadère",
  street_number: 22,
  zip: 75017,
  city: "Paris",
  country: "France"
}, "where", "they", "are"]

# input
["in", "the", "USA:", "125", "Maiden", "Lane", ",", "11th", "Floor",
"New", "York", ",", "NY", "10038", "USA", "where", "they", "are"]

# output
["in", "the", "USA:",  %Address{
  street: "Maiden Lane",
  street_number: 125,
  floor: 11,
  zip: 10038,
  city: "New York",
  state: "NY",
  country: "USA"
}, "where", "they", "are"]

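Turning a series of tokens into an Address struct requires some country-specific logic (different ways of formatting addresses, etc.), which we will assume is available. In addition, let's assume I can switch to the appropriate parsing logic (that is, which country the address is in) by looking at the tokens, for example a token ending in ":".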

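Once again, what I'm trying to achieve:

iterate over the tokens until one of them triggers the special case (a country name followed by ":")
consume all the relevant tokens (in the first example, the tokens from "22" to "France")
replace them with a struct (%Address{})
continue iterating from the first unconsumed token ("where")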

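Some form of reduce seems appropriate, but reduce by itself won't resume iterating from where I want it to, and reduce_while doesn't seem to be the ticket either.

It shouldn't make any difference, but I'd like to be able to apply the same logic/process at a higher level and compose higher-level data structures, for example: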

# input
["the", "Mirabeau", "restaurant", "at", %Address{...}, "where", "he", "cooked"]

# output
["the", %Place{
  name: "Mirabeau",
  type: :restaurant,
  location: %Address{...}
}, "where", "he", "cooked"]

Answer 1

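You can use Stream.unfold/2. Pass all the tokens as the initial accumulator, then return from the function a tuple of the term to emit and the new accumulator. For a country name followed by ":", you can consume as many of the following tokens as you need and return the remaining ones as the accumulator. For any other token, you can simply return the head and continue with the tail.

Here is a small example: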
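To make the mechanics concrete before applying them to tokens, here is a minimal sketch of Stream.unfold/2 on a toy list. The pairing rule is only an illustration and is not part of the address problem: each step may consume one or two elements of the accumulator and emits a single term.

```elixir
# Stream.unfold/2 calls the function with the current accumulator.
# Returning {emitted, next_acc} emits one element; returning nil ends the stream.
pairs =
  [1, 2, 3, 4, 5]
  |> Stream.unfold(fn
    # Nothing left: stop.
    [] -> nil
    # Consume two elements at once and emit them as a single tuple.
    [a, b | rest] -> {{a, b}, rest}
    # A lone trailing element is emitted unchanged.
    [a | rest] -> {a, rest}
  end)
  |> Enum.to_list()

IO.inspect(pairs)
# prints [{1, 2}, {3, 4}, 5]
```

Because the function decides how much of the accumulator to drop on each step, the iteration resumes exactly after whatever was consumed, which is the behaviour plain reduce does not give you.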

["in", "France:",  "22", "Rue", "du", "Débarcadère", ",", "75017",
 "Paris", ",", "France", "where", "they", "are", "in", "the", "USA:", "125",
 "Maiden", "Lane", ",", "11th", "Floor", "New", "York", ",", "NY", "10038",
 "USA", "where", "they", "are"]
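|> Stream.unfold(fn
  # No tokens left: end the stream.
  [] -> nil
  [h | t] ->
    if String.ends_with?(h, ":") do
      # Everything up to the first "," is the street part.
      {street, t} = Enum.split_while(t, &(&1 != ","))
      # Drop the "," separator.
      ["," | t] = t
      # Collect tokens until the one that repeats the country name.
      {rest, t} = Enum.split_while(t, &(&1 <> ":" != h))
      [country | t] = t
      # Emit one map in place of the consumed tokens ("Country:" marker included).
      {%{street: street, rest: rest, country: country}, t}
    else
      # Ordinary token: emit it unchanged.
      {h, t}
    end
end)
|> Enum.to_list()
|> IO.inspect()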

Output:

["in",
 %{country: "France", rest: ["75017", "Paris", ","],
   street: ["22", "Rue", "du", "Débarcadère"]}, "where", "they", "are", "in",
 "the",
 %{country: "USA", rest: ["11th", "Floor", "New", "York", ",", "NY", "10038"],
   street: ["125", "Maiden", "Lane"]}, "where", "they", "are"]
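The example above emits plain maps and drops the "France:" / "USA:" marker token. As a rough sketch of how the same idea could produce the %Address{} struct from the question while keeping the marker in place, something like the following might work. The module name AddressTokens and the helper names are hypothetical, and only the French layout (number, street words, ",", zip, city, ",", country) is handled; other countries would need their own parse/2 clauses.

```elixir
defmodule Address do
  # Fields taken from the expected output shown in the question.
  defstruct [:street, :street_number, :floor, :zip, :city, :state, :country]
end

defmodule AddressTokens do
  # Hypothetical module; a sketch, not a complete address parser.
  def process(tokens) do
    tokens |> Stream.unfold(&next/1) |> Enum.to_list()
  end

  defp next([]), do: nil

  defp next([marker | rest]) when is_binary(marker) do
    case parse(marker, rest) do
      # Emit the marker now and put the struct at the front of the
      # accumulator, so the next step emits it unchanged.
      {:ok, address, remaining} -> {marker, [address | remaining]}
      :error -> {marker, rest}
    end
  end

  # Non-string elements (structs built earlier) pass through untouched.
  defp next([other | rest]), do: {other, rest}

  # Assumed French layout: number, street words, ",", zip, city, ",", "France".
  defp parse("France:", tokens) do
    with {[number_token | street], ["," | t]} <- Enum.split_while(tokens, &(&1 != ",")),
         {number, ""} <- Integer.parse(number_token),
         {[zip_token | city], ["," | t]} <- Enum.split_while(t, &(&1 != ",")),
         {zip, ""} <- Integer.parse(zip_token),
         ["France" | remaining] <- t do
      address = %Address{
        street_number: number,
        street: Enum.join(street, " "),
        zip: zip,
        city: Enum.join(city, " "),
        country: "France"
      }

      {:ok, address, remaining}
    else
      # Tokens do not fit the assumed layout: leave them as they are.
      _ -> :error
    end
  end

  # Any other token is not a known country marker.
  defp parse(_marker, _tokens), do: :error
end

tokens = ["in", "France:", "22", "Rue", "du", "Débarcadère", ",", "75017",
          "Paris", ",", "France", "where", "they", "are"]

result = AddressTokens.process(tokens)
IO.inspect(result)
```

If the tokens after a marker do not match the assumed layout, parse/2 returns :error and the tokens are emitted one by one, so malformed input is left unchanged rather than raising.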
