Composing data structures during a reduce/fold

Time: 2017-06-14 18:10:55

Tags: functional-programming elixir

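I'm working on a natural language processing project (and learning Elixir along the way), and I can't figure out the idiomatic way to transform my data.

To spare you the irrelevant domain details, let's transpose the problem to parsing addresses.

Given a list of string tokens, compose data structures in place from the relevant tokens, while leaving the other tokens where they are: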

# input
["in", "France:",  "22", "Rue", "du", "Débarcadère", ",", "75017", "Paris", ",", "France", "where", "they", "are"]
MyModule.process(tokens)

# output
["in", "France:",  %Address{
  street: "Rue du Débarcadère",
  street_number: 22,
  zip: 75017,
  city: "Paris",
  country: "France"
}, "where", "they", "are"]

# input
["in", "the", "USA:", "125", "Maiden", "Lane", ",", "11th", "Floor",
"New", "York", ",", "NY", "10038", "USA", "where", "they", "are"]

# output
["in", "the", "USA:",  %Address{
  street: "Maiden Lane",
  street_number: 125,
  floor: 11,
  zip: 10038,
  city: "New York",
  state: "NY",
  country: "USA"
}, "where", "they", "are"]

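Converting a sequence of tokens into an Address struct requires some country-specific logic (different ways of formatting addresses, etc.), which we'll assume is available. Let's also assume I can switch to the appropriate parsing logic (i.e. the country the address is in) by looking at the tokens (e.g. a token ending in ":").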

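Once again, what I'm trying to achieve:

  1. Iterate over the tokens until one triggers a special case (a country name followed by ":")
  2. Consume all the related tokens (in the first example, the tokens from "22" to "France")
  3. Replace them with a struct (%Address{})
  4. Continue iterating from the first unconsumed token ("where")

Some form of reduce seems appropriate, but reduce on its own won't resume iterating where I want it to, and reduce_while doesn't seem to be the ticket either.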

It shouldn't make any difference, but I'd like to be able to apply the same logic/process at a higher level and compose higher-level data structures, e.g.:

# input
["the", "Mirabeau", "restaurant", "at", %Address{...}, "where", "he", "cooked"]

# output
["the", %Place{
  name: "Mirabeau",
  type: :restaurant,
  location: %Address{...}
}, "where", "he", "cooked"]

1 Answer:

Answer 0 (score: 2)

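You can use Stream.unfold/2. Pass all the tokens as the initial accumulator, and from the function return a tuple of the term to emit and the new accumulator. When you hit a country name followed by ":", you can consume as many of the following tokens as you need and return the remaining ones as the accumulator. For every other token, simply return the head and continue with the tail.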
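To make that contract concrete before applying it to tokens, here is a minimal sketch that has nothing to do with addresses. The list of integers and the `pairs` variable are made up for illustration; the only point is that a single step may consume more than one element of the accumulator while emitting exactly one term.

```elixir
# Each step receives the remaining list (the accumulator) and returns
# either nil (stop) or {term_to_emit, next_accumulator}.
pairs =
  [1, 2, 3, 4, 5]
  |> Stream.unfold(fn
    # Nothing left: returning nil ends the stream.
    [] -> nil
    # Consume two elements and emit them as a single tuple.
    [a, b | rest] -> {{a, b}, rest}
    # Only one element left: emit it unchanged.
    [a | rest] -> {a, rest}
  end)
  |> Enum.to_list()

IO.inspect(pairs)
# prints [{1, 2}, {3, 4}, 5]
```

The same shape carries over to the token list: the clause that sees a trigger token consumes several tokens at once, and every other clause consumes one.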

Here's a small example:

["in", "France:",  "22", "Rue", "du", "Débarcadère", ",", "75017",
 "Paris", ",", "France", "where", "they", "are", "in", "the", "USA:", "125",
 "Maiden", "Lane", ",", "11th", "Floor", "New", "York", ",", "NY", "10038",
 "USA", "where", "they", "are"]
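|> Stream.unfold(fn
  # No tokens left: returning nil ends the stream.
  [] -> nil
  [h | t] ->
    if String.ends_with?(h, ":") do
      # Trigger token such as "France:"; it is consumed and not emitted itself.
      # Everything up to the first "," is treated as the street part.
      {street, t} = Enum.split_while(t, &(&1 != ","))
      ["," | t] = t
      # Everything up to the repeated country name is kept as-is in :rest.
      {rest, t} = Enum.split_while(t, &(&1 <> ":" != h))
      [country | t] = t
      # Emit one map and continue from the first unconsumed token.
      {%{street: street, rest: rest, country: country}, t}
    else
      # Any other token passes through unchanged.
      {h, t}
    end
end)
|> Enum.to_list()
|> IO.inspect()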

Output:

["in",
 %{country: "France", rest: ["75017", "Paris", ","],
   street: ["22", "Rue", "du", "Débarcadère"]}, "where", "they", "are", "in",
 "the",
 %{country: "USA", rest: ["11th", "Floor", "New", "York", ",", "NY", "10038"],
   street: ["125", "Maiden", "Lane"]}, "where", "they", "are"]
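Note that the example above drops the trigger token ("France:", "USA:") and emits a plain map, whereas the question keeps the trigger token and asks for an %Address{} struct. One possible way to get closer to that, sketched here under several assumptions, is to let each step emit a list of terms and flatten the result afterwards. The module name `AddressTokens`, the helper `parse_french/2`, and the field list of `Address` are hypothetical; only the French layout from the first example (number, street words, ",", zip, city words, ",", country) is handled, and any other layout will raise a MatchError.

```elixir
defmodule Address do
  # Field names taken from the question's first example.
  defstruct [:street, :street_number, :zip, :city, :country]
end

defmodule AddressTokens do
  # Hypothetical module: each unfold step emits a LIST of terms so that a
  # trigger token can be kept next to the struct; Enum.concat/1 flattens them.
  def process(tokens) do
    tokens
    |> Stream.unfold(&step/1)
    |> Enum.concat()
  end

  defp step([]), do: nil

  defp step([head | tail]) do
    if String.ends_with?(head, ":") do
      country = String.trim_trailing(head, ":")
      {address, rest} = parse_french(tail, country)
      # Keep the trigger token, then the struct; resume at the unconsumed tokens.
      {[head, address], rest}
    else
      {[head], tail}
    end
  end

  # Assumed layout: number, street words, ",", zip, city words, ",", country.
  defp parse_french([number | tokens], country) do
    {street, ["," | tokens]} = Enum.split_while(tokens, &(&1 != ","))
    [zip | tokens] = tokens
    {city, [",", ^country | rest]} = Enum.split_while(tokens, &(&1 != ","))

    address = %Address{
      street: Enum.join(street, " "),
      street_number: String.to_integer(number),
      zip: String.to_integer(zip),
      city: Enum.join(city, " "),
      country: country
    }

    {address, rest}
  end
end

tokens = ["in", "France:", "22", "Rue", "du", "Débarcadère", ",", "75017",
          "Paris", ",", "France", "where", "they", "are"]

IO.inspect(AddressTokens.process(tokens))
```

For the first input from the question this should yield "in", "France:", an Address struct, and then "where", "they", "are". Since the step function only pattern-matches on the head of the list, the same approach should also work on a list that mixes strings and structs, which is what the higher-level %Place{} composition in the question would need.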