Question

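Suppose I have an original string and an encoded string, like this:

"abcd" -> "0010111111001010". One possible solution is that "a" matches "0010", "b" matches "1111", "c" matches "1100", and "d" matches "1010".

How can I write a program that, given these two strings, finds a possible encoding rule?

My first attempt looks like this: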

fun partition(orgl, encode) =
let
    val part = size(orgl)
    fun porpt(str, i, len) =
        if i = len - 1 then
            [substring(str, len * (len - 1), size(str) - (len - 1) * len)]
        else
            substring(str, len * i, len)::porpt(str, i + 1, len)
in
    porpt(encode, 0, part)
end;

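But obviously this cannot check whether two substrings match the same character, and there are many other possibilities besides partitioning the string into equal parts.

What would be an appropriate algorithm for this problem?

P.S. Only prefix codes are allowed.

What I have learned so far has not really covered serious algorithms, but I did some searching on backtracking and wrote a second version of the code: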

fun partition(orgl, encode) =
let
    val part = size(orgl)
    fun backtrack(str, s, len, count, code) =
        let
           val current =
               if count = 1 then
                  code@[substring(str, s, size(str) - s)]
               else
                  code@[substring(str, s, len)]
        in
           if len > size(str) - s then []
           else
              if proper_prefix(0, orgl, code) then
                  if count = 1 then current
                  else
                     backtrack(str, s + len, len, count - 1, current)
              else
                 backtrack(str, s, len + 1, count, code)
        end
 in
    backtrack(encode, 0, 1, part, [])
 end;

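The function proper_prefix is supposed to check both the prefix-code property and that the mapping is unique. However, the function does not work correctly.

For example, when I enter: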

partition("abcd", "001111110101101");

the result is:

uncaught exception Subscript

FYI, the body of proper_prefix is as follows:

fun proper_prefix(i, orgl, nil) = true
  | proper_prefix(i, orgl, x::xs) =
    let
      fun check(j, str, nil) = true
        | check(j, str, x::xs) =
          if String.isPrefix str x then
             if str = x andalso substring(orgl, i, 1) = substring(orgl, i + j + 1, 1) then
                check(j + 1, str, xs)
             else
                false
          else
             check(j + 1, str, xs)
    in
      if check(0, x, xs) then proper_prefix(i + 1, orgl, xs)
      else false
    end;

Answer 1

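I would try a backtracking approach:

Start with an empty hypothesis (i.e. set all encodings to unknown). Then process the encoded string character by character.

At every new code character, you have two options: append the code character to the encoding of the current source character, or move on to the next source character. If you encounter a source character that already has an encoding, check whether it matches and continue; if it does not match, go back and try another option. You can also check the prefix property during this traversal.

Your example input could be processed as follows: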

Assume 'a' == '0'
Go to next source character
Assume 'b' == '0'
Violation of prefix property, go back
Assume 'a' == '00'
Go to next source character
Assume 'b' == '1'
...

This explores the whole range of possible encodings. You can return the first encoding found or all possible encodings.
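The traversal described above could be sketched in Standard ML roughly as follows. This is only a sketch under a few assumptions: the names lookup, prefixFree and solve are hypothetical, codes are tried shortest first, and only the first prefix code found is returned. Instead of appending one code character at a time, it tries each candidate length for an unassigned source character, which explores the same search space.

```sml
(* Hypothetical helper: the code currently assigned to c, if any. *)
fun lookup (c : char, env : (char * string) list) =
    case List.find (fn (c', _) => c' = c) env of
        SOME (_, code) => SOME code
      | NONE => NONE

(* True if cand is neither a prefix of, nor prefixed by, any assigned code.
   An identical code counts as a prefix, so this also keeps codes unique. *)
fun prefixFree (cand, env : (char * string) list) =
    List.all (fn (_, code) =>
                 not (String.isPrefix cand code) andalso
                 not (String.isPrefix code cand)) env

(* solve (orig, enc): SOME mapping for the first prefix code found, else NONE.
   i indexes orig, pos indexes enc, env is the hypothesis so far. *)
fun solve (orig, enc) =
    let
        val n = size orig
        val m = size enc
        fun go (i, pos, env) =
            if i = n then
                if pos = m then SOME (rev env) else NONE
            else
                let val c = String.sub (orig, i)
                in
                    case lookup (c, env) of
                        SOME code =>
                        (* already assigned: the code must reappear here *)
                        if pos + size code <= m andalso
                           String.substring (enc, pos, size code) = code
                        then go (i + 1, pos + size code, env)
                        else NONE
                      | NONE => extend (i, pos, 1, c, env)
                end
        (* try codes of length len, len + 1, ... for the new character c *)
        and extend (i, pos, len, c, env) =
            if pos + len > m then NONE
            else
                let val cand = String.substring (enc, pos, len)
                    val result =
                        if prefixFree (cand, env)
                        then go (i + 1, pos + len, (c, cand) :: env)
                        else NONE
                in
                    case result of
                        SOME r => SOME r
                      | NONE => extend (i, pos, len + 1, c, env)  (* backtrack *)
                end
    in
        go (0, 0, [])
    end

val example = solve ("abcd", "0010111111001010")
```

With this sketch, solve ("aba", "0100") yields SOME [(#"a", "0"), (#"b", "10")], while solve ("abca", "111011") yields NONE, because no assignment for that input satisfies the prefix property.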

Answer 2

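If one naively iterates over all possible translations of abcd → 0010111111001010, this may lead to a combinatorial blow-up. Simple iteration also seems to produce many invalid translations that need to be skipped: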

(a, b, c, d) → (0, 0, 1, 0111111001010) is invalid because a = b
(a, b, c, d) → (0, 0, 10, 111111001010) is invalid because a = b
(a, b, c, d) → (0, 01, 0, 111111001010) is invalid because a = c
(a, b, c, d) → (00, 1, 0, 111111001010) is one possibility
(a, b, c, d) → (0, 0, 101, 11111001010) is invalid because a = b
(a, b, c, d) → (0, 010, 1, 11111001010) is another possibility
(a, b, c, d) → (001, 0, 1, 11111001010) is another possibility
(a, b, c, d) → (0, 01, 01, 11111001010) is invalid because b = c
(a, b, c, d) → (00, 1, 01, 11111001010) is another possibility
(a, b, c, d) → (00, 10, 1, 11111001010) is another possibility
...

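If every string contains each character exactly once, then this blow-up of results is the answer. If the same character occurs more than once, that constrains the solutions further. E.g. matching abca → 111011 could generate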

(a, b, c, a) → (1, 1, 1, 011) is invalid because a = b = c, a ≠ a
(a, b, c, a) → (1, 1, 10, 11) is invalid because a = b, a ≠ a
(a, b, c, a) → (1, 11, 0, 11) is invalid because a ≠ a
(a, b, c, a) → (11, 1, 0, 11) is one possibility
... (all remaining combinations would eventually prove invalid)
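To make the size of the naive search space concrete, here is a small sketch that enumerates every way of cutting the bit string into one non-empty piece per source character. The function name splits is hypothetical, and the order in which it produces candidates is not the order of the listings above.

```sml
(* All ways to cut s into exactly n non-empty consecutive pieces. *)
fun splits (n, s) =
    if n = 0 then (if s = "" then [[]] else [])
    else
        let
            val m = size s
            (* len ranges over 1 .. m - (n - 1), leaving at least one
               character for each of the remaining pieces *)
            fun from len =
                if len > m - (n - 1) then []
                else
                    let val head = String.substring (s, 0, len)
                        val tail = String.extract (s, len, NONE)
                    in
                        map (fn rest => head :: rest) (splits (n - 1, tail))
                        @ from (len + 1)
                    end
        in
            from 1
        end

(* 455 candidate translations for the four-character example *)
val candidates = length (splits (4, "0010111111001010"))
```

Each candidate would still have to be checked against the constraints, which is why pruning early matters.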

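For a given hypothesis, you can choose the order in which you verify the constraints. Either

See whether any mappings overlap. (I think this is what Nico calls the prefix property.)
See whether a character that occurs more than once actually occurs at both places in the bit string.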

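An algorithm using this search strategy must find an order of checking constraints that lets it test a hypothesis, and reject it if invalid, as quickly as possible. My intuition tells me that a constraint a → β is worth investigating sooner if the bit string β is long and a occurs many times.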

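Another strategy is to rule out that a particular character can map to any bit string of, above, or below a certain length. For example, aaab → 1111110 rules out a mapping to any bit string of length greater than 2, and abcab → 1011101 rules out a mapping to any bit string of length different from 2.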
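One part of this length argument is easy to compute: an upper bound on the code length of a character, assuming every other character occurrence needs at least one bit. The names count and maxCodeLen below are hypothetical, and the sketch covers only the upper bound, not the lower bound or the "exactly 2" reasoning.

```sml
(* Number of occurrences of c in s. *)
fun count (c, s) =
    List.length (List.filter (fn x => x = c) (String.explode s))

(* Largest length the code for c could have, assuming each occurrence
   of every other character consumes at least one bit of enc. *)
fun maxCodeLen (c, orig, enc) =
    let val k = count (c, orig)
        val others = size orig - k
    in
        if k = 0 then 0 else (size enc - others) div k
    end

val boundA1 = maxCodeLen (#"a", "aaab", "1111110")   (* 2 *)
val boundA2 = maxCodeLen (#"a", "abcab", "1011101")  (* 2 *)
```

A search could use such a bound to stop extending a candidate code once it exceeds the limit.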

For the programming part, try to think of ways to represent a hypothesis. E.g.

(* For the hypothesis (a, b, c, a) → (11, 1, 0, 11) *)

(* Order signifies first occurrence *)
val someHyp1 = ([(#"a", 2), (#"b", 1), (#"c", 1)], "abca", "111011")

(* Somehow recurse over hypothesis and accumulate offsets for each character, e.g. *)
val someHyp2 = ([(#"a", 2), (#"b", 1), (#"c", 1)],
                [(#"a", 0), (#"b", 2), (#"c", 3), (#"a", 4)])

Create a function that generates new hypotheses in some order, and a function that determines whether a hypothesis is valid.

fun nextHypothesis (hyp, origStr, encStr) = ... (* should probably return SOME/NONE *)
fun validHypothesis (hyp, origStr, encStr) =
    allStr (fn (i, c) => (* is bit string for c at its
                            accumulated offset in encStr? *)) origStr

(* Helper function that checks whether a predicate is true for each
   character in a string. The predicate function takes both the index
   and the character as argument. *)
and allStr p s =
    let val len = size s
        fun loop i = i >= len orelse p (i, String.sub (s, i)) andalso loop (i+1)
    in loop 0 end
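As one possible way to fill in the validity check, the following sketch assumes a hypothesis is just the list of (character, code length) pairs from someHyp1, with positive lengths. The names lengthOf, pieces and consistent are hypothetical. It checks that repeated characters always receive the same bit string and that distinct characters receive different ones; it does not check the prefix property.

```sml
(* Hypothesised code length for c, if the hypothesis mentions it. *)
fun lengthOf (c : char, hyp : (char * int) list) =
    case List.find (fn (c', _) => c' = c) hyp of
        SOME (_, len) => SOME len
      | NONE => NONE

(* Walk origStr, cutting encStr into pieces of the hypothesised lengths.
   Returns SOME [(char, piece), ...] in order of occurrence, or NONE if a
   character has no length or the lengths do not add up to size encStr. *)
fun pieces (hyp, origStr, encStr) =
    let
        fun loop (i, pos, acc) =
            if i = size origStr then
                if pos = size encStr then SOME (rev acc) else NONE
            else
                let val c = String.sub (origStr, i)
                in
                    case lengthOf (c, hyp) of
                        NONE => NONE
                      | SOME len =>
                        if pos + len > size encStr then NONE
                        else loop (i + 1, pos + len,
                                   (c, String.substring (encStr, pos, len)) :: acc)
                end
    in
        loop (0, 0, [])
    end

(* Same character <=> same piece, for every pair of occurrences. *)
fun consistent (hyp, origStr, encStr) =
    case pieces (hyp, origStr, encStr) of
        NONE => false
      | SOME ps =>
        List.all (fn (c1, p1) =>
                     List.all (fn (c2, p2) => (c1 = c2) = (p1 = p2)) ps) ps

(* The hypothesis (a, b, c, a) -> (11, 1, 0, 11) from above *)
val ok = consistent ([(#"a", 2), (#"b", 1), (#"c", 1)], "abca", "111011")
```

Here ok is true, whereas the lengths [(#"a", 1), (#"b", 1), (#"c", 3)] are rejected because a and b would both map to "1".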

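An improvement to this framework would be to change the order in which hypotheses are explored, since some search paths can rule out larger numbers of invalid mappings than others.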
