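Suppose I have an original string and an encoded string, as follows:
"abcd" → "0010111111001010". One possible solution is that "a" matches "0010", "b" matches "1111", "c" matches "1100", and "d" matches "1010".
How can I write a program that, given these two strings, finds the possible encoding rules?
My first attempt looks like this: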
fun partition(orgl, encode) =
  let
    val part = size(orgl)
    fun porpt(str, i, len) =
      if i = len - 1 then
        [substring(str, len * (len - 1), size(str) - (len - 1) * len)]
      else
        substring(str, len * i, len)::porpt(str, i + 1, len)
  in
    porpt(encode, 0, part)
  end;
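But obviously this has no way of checking whether two substrings map to the same character, and there are many other possibilities besides splitting the string into equal parts.
What would be an appropriate algorithm for this problem?
P.S. Only prefix codes are allowed.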
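To pin the problem down, a candidate answer can be checked mechanically: it must reproduce the encoded string, and no code word may be a prefix of another. The following Standard ML sketch does exactly that. The names codeOf, encodeWith, isPrefixFree and isSolution are hypothetical, and a mapping is assumed to be a list of (character, code word) pairs with each character listed once.

```sml
(* Hypothetical helpers: a mapping is a list of (source char, code word),
   with each source character listed once. *)

(* Code word assigned to c, if any. *)
fun codeOf (_, []) = NONE
  | codeOf (c, (d, w) :: rest) = if c = d then SOME w else codeOf (c, rest)

(* Encode s with the mapping; NONE if some character has no code word. *)
fun encodeWith (mapping, s) =
  let
    fun go ([], acc) = SOME (String.concat (rev acc))
      | go (c :: cs, acc) =
          (case codeOf (c, mapping) of
               SOME w => go (cs, w :: acc)
             | NONE => NONE)
  in
    go (explode s, [])
  end

(* Prefix property: no code word is a prefix of another.
   Two equal words also fail, since each is a prefix of the other. *)
fun isPrefixFree [] = true
  | isPrefixFree (w :: ws) =
      List.all (fn v => not (String.isPrefix w v orelse String.isPrefix v w)) ws
      andalso isPrefixFree ws

(* A mapping solves the problem if it reproduces enc and is a prefix code. *)
fun isSolution (mapping, orig, enc) =
  encodeWith (mapping, orig) = SOME enc
  andalso isPrefixFree (map (fn (_, w) => w) mapping)
```

For the example above, isSolution ([(#"a", "0010"), (#"b", "1111"), (#"c", "1100"), (#"d", "1010")], "abcd", "0010111111001010") evaluates to true, while a mapping such as a = "0", b = "01" is rejected even when it reproduces the encoded string.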
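What I have learned so far doesn't really cover serious algorithms, but I did some reading on backtracking and wrote a second version of my code: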
fun partition(orgl, encode) =
  let
    val part = size(orgl)
    fun backtrack(str, s, len, count, code) =
      let
        val current =
          if count = 1 then
            code@[substring(str, s, size(str) - s)]
          else
            code@[substring(str, s, len)]
      in
        if len > size(str) - s then []
        else
          if proper_prefix(0, orgl, code) then
            if count = 1 then current
            else
              backtrack(str, s + len, len, count - 1, current)
          else
            backtrack(str, s, len + 1, count, code)
      end
  in
    backtrack(encode, 0, 1, part, [])
  end;
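The function proper_prefix checks the prefix-code property and that the mapping is unique. However, this does not work correctly.
For example, when I enter: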
partition("abcd", "001111110101101");
the result is:
uncaught exception Subscript
For reference, the body of proper_prefix is as follows:
fun proper_prefix(i, orgl, nil) = true
  | proper_prefix(i, orgl, x::xs) =
      let
        fun check(j, str, nil) = true
          | check(j, str, x::xs) =
              if String.isPrefix str x then
                if str = x andalso substring(orgl, i, 1) = substring(orgl, i + j + 1, 1) then
                  check(j + 1, str, xs)
                else
                  false
              else
                check(j + 1, str, xs)
      in
        if check(0, x, xs) then proper_prefix(i + 1, orgl, xs)
        else false
      end;
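Answer 0 (score: 3)
I would try a backtracking approach:
Start with an empty hypothesis (i.e. all encodings set to unknown). Then process the encoded string character by character.
At each new code character you have two options: append the code character to the encoding of the current source character, or move on to the next source character. If you reach a source character that already has an encoding, check whether it matches and continue; if it does not match, go back and try the other option. You can also check the prefix property during this traversal.
Your example input could be processed as follows: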
Assume 'a' == '0'
Go to next source character
Assume 'b' == '0'
Violation of prefix property, go back
Assume 'a' == '00'
Go to next source character
Assume 'b' == '1'
...
This explores the whole space of possible encodings. You can return the first encoding found, or all possible encodings.
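A minimal Standard ML sketch of this backtracking search, assuming the encoded string is made of "0"/"1" characters and a mapping is stored as a list of (character, code word) pairs. Instead of deciding bit by bit, it assumes a length for each new source character and grows it by one on failure, which corresponds to the "append the code character" option above. The names solve, lookup and extend are hypothetical, and this version stops at the first solution.

```sml
(* Code word already assigned to c, if any. *)
fun lookup (_, []) = NONE
  | lookup (c, (d, w) :: rest) = if c = d then SOME w else lookup (c, rest)

(* Backtracking search: SOME mapping (in order of first occurrence) or NONE. *)
fun solve (orig, enc) =
  let
    val n = size orig
    val m = size enc
    (* i: next source position, j: next encoded position *)
    fun go (i, j, mapping) =
      if i = n then
        (if j = m then SOME (rev mapping) else NONE)
      else
        let val c = String.sub (orig, i)
        in
          case lookup (c, mapping) of
              SOME w =>
                (* c already has a code word: it must appear here, otherwise backtrack *)
                if j + size w <= m andalso String.substring (enc, j, size w) = w
                then go (i + 1, j + size w, mapping)
                else NONE
            | NONE => extend (c, i, j, 1, mapping)
        end
    (* Assume c's code word is enc[j .. j+len-1]; on failure, grow it by one bit *)
    and extend (c, i, j, len, mapping) =
      if j + len > m then NONE
      else
        let val w = String.substring (enc, j, len)
        in
          if List.exists (fn (_, v) => String.isPrefix v w) mapping then
            (* an existing word is a prefix of w and of every longer candidate *)
            NONE
          else if List.exists (fn (_, v) => String.isPrefix w v) mapping then
            (* w is a proper prefix of an existing word: keep growing *)
            extend (c, i, j, len + 1, mapping)
          else
            case go (i + 1, j + len, (c, w) :: mapping) of
                SOME result => SOME result
              | NONE => extend (c, i, j, len + 1, mapping)
        end
  in
    go (0, 0, [])
  end
```

The first List.exists test is the "violation of prefix property, go back" step from the trace. Because shorter code words are tried first, solve ("abcd", "0010111111001010") returns SOME [(#"a", "00"), (#"b", "1"), (#"c", "01111110"), (#"d", "01010")] rather than the mapping from the question. To list every encoding, go and extend could return lists of solutions and combine the results of all branches instead of stopping at the first SOME.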
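Answer 1 (score: 1)
If one naively iterates over all possible translations of abcd → 0010111111001010, the number of candidates may explode. Naive iteration also seems to generate many invalid translations that have to be skipped: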
(a, b, c, d) → (0, 0, 1, 0111111001010) is invalid because a = b
(a, b, c, d) → (0, 0, 10, 111111001010) is invalid because a = b
(a, b, c, d) → (0, 01, 0, 111111001010) is invalid because a = c
(a, b, c, d) → (00, 1, 0, 111111001010) is one possibility
(a, b, c, d) → (0, 0, 101, 11111001010) is invalid because a = b
(a, b, c, d) → (0, 010, 1, 11111001010) is another possibility
(a, b, c, d) → (001, 0, 1, 11111001010) is another possibility
(a, b, c, d) → (0, 01, 01, 11111001010) is invalid because b = c
(a, b, c, d) → (00, 1, 01, 11111001010) is another possibility
(a, b, c, d) → (00, 10, 1, 11111001010) is another possibility
...
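To put a number on the explosion: an n-bit string can be cut into k non-empty pieces in C(n-1, k-1) ways, so abcd → 0010111111001010 has C(15, 3) = 455 candidate translations before any constraint is checked. A small enumerator makes this concrete; splits is a hypothetical name, and piece i is the candidate bit string for source position i.

```sml
(* All ways to cut s into exactly k non-empty consecutive pieces. *)
fun splits (s, k) =
  let
    val n = size s
    (* cut s[i ..] into k pieces *)
    fun go (i, 1) = if i < n then [[String.extract (s, i, NONE)]] else []
      | go (i, k) =
          let
            (* the first piece can take at most n - i - (k - 1) characters *)
            val choices = Int.max (0, n - i - (k - 1))
          in
            List.concat
              (List.tabulate (choices, fn j =>
                 let val len = j + 1
                 in map (fn rest => String.substring (s, i, len) :: rest)
                        (go (i + len, k - 1))
                 end))
          end
  in
    if k < 1 then [] else go (0, k)
  end
```

For example, splits ("abc", 2) gives [["a", "bc"], ["ab", "c"]], and length (splits ("0010111111001010", 4)) is 455.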
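If every character occurs exactly once, this explosion of results is simply the answer. If the same character occurs more than once, that further constrains the solutions. For example, matching abca → 111011 could produce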
(a, b, c, a) → (1, 1, 1, 011) is invalid because a = b = c, a ≠ a
(a, b, c, a) → (1, 1, 10, 11) is invalid because a = b, a ≠ a
(a, b, c, a) → (1, 11, 0, 11) is invalid because a = b, a ≠ a
(a, b, c, a) → (11, 1, 0, 11) is one possibility
... (all remaining combinations would eventually prove invalid)
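The "same character, same bit string" constraint can be checked on a candidate cut with a pairwise test. In this sketch (consistent is a hypothetical name), pieces holds the bit string assumed for each position of the original string. Like the list above, it only checks equality and distinctness, not the prefix property.

```sml
(* true when equal characters get equal bit strings and
   distinct characters get distinct bit strings *)
fun consistent (orig, pieces) =
  let
    fun ok [] = true
      | ok ((c, w) :: rest) =
          List.all (fn (d, v) => (c = d) = (w = v)) rest andalso ok rest
  in
    length pieces = size orig
    andalso ok (ListPair.zip (explode orig, pieces))
  end
```

consistent ("abca", ["11", "1", "0", "11"]) is true, while consistent ("abca", ["1", "1", "10", "11"]) is false.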
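For a given hypothesis, you can choose the order in which to verify the constraints. Any algorithm using this search strategy has to find an order in which to check the constraints so that hypotheses can be rejected as quickly as possible. My intuition tells me that the constraint a → β is worth examining earlier if the bit string β is long and occurs many times.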
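Another strategy is to rule out that a particular character maps to any bit string of, above, or below a certain length. For example, aaab → 1111110 rules out that a maps to any bit string longer than 2, and abcab → 1011101 rules out that a maps to any bit string whose length is not 2.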
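A sketch of the upper-bound part of that idea (occurrences and maxLen are hypothetical names): if c occurs k times in an original string of length n and the encoded string has length m, every other position needs at least one bit, so c's bit string has at most (m - (n - k)) div k bits. For aaab → 1111110 that is (7 - 1) div 3 = 2; for abcab → 1011101 it is (7 - 3) div 2 = 2. Ruling out shorter lengths, as in the abcab example, does not follow from this count alone.

```sml
(* Occurrences of c in s. *)
fun occurrences (c, s) = length (List.filter (fn d => d = c) (explode s))

(* Upper bound on the length of c's bit string: every other position of orig
   needs at least one bit, and c's bit string is repeated once per occurrence.
   NONE when c does not occur in orig. *)
fun maxLen (c, orig, enc) =
  let val k = occurrences (c, orig)
  in
    if k = 0 then NONE
    else SOME ((size enc - (size orig - k)) div k)
  end
```

A bound below 1 means no mapping can exist for that character.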
For the programming part, try to think about ways to represent a hypothesis. For example:
(* For the hypothesis (a, b, c, a) → (11, 1, 0, 11) *)
(* Order signifies first occurrence *)
val someHyp1 = ([(#"a", 2), (#"b", 1), (#"c", 1)], "abca", "111011")
(* Somehow recurse over hypothesis and accumulate offsets for each character, e.g. *)
val someHyp2 = ([(#"a", 2), (#"b", 1), (#"c", 1)],
                [(#"a", 0), (#"b", 2), (#"c", 3), (#"a", 4)])
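One way to do that accumulation, assuming the length list of someHyp1 as input: walk the original string and add each character's length to a running offset. offsets is a hypothetical name.

```sml
(* lens: assumed bit-string length for each character, as in someHyp1.
   Returns each character of orig paired with its offset in the encoded
   string, as in someHyp2. Raises Fail if a character has no length. *)
fun offsets (lens, orig) =
  let
    fun lenOf c =
      case List.find (fn (d, _) => d = c) lens of
          SOME (_, n) => n
        | NONE => raise Fail ("no length for " ^ str c)
    fun go ([], _) = []
      | go (c :: cs, off) = (c, off) :: go (cs, off + lenOf c)
  in
    go (explode orig, 0)
  end
```

offsets ([(#"a", 2), (#"b", 1), (#"c", 1)], "abca") gives [(#"a", 0), (#"b", 2), (#"c", 3), (#"a", 4)], the second component of someHyp2.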
Create a function that generates new hypotheses in some order, and a function that determines whether a hypothesis is valid.
fun nextHypothesis (hyp, origStr, encStr) = ... (* should probably return SOME/NONE *)
fun validHypothesis (hyp, origStr, encStr) =
    allStr (fn (i, c) => (* is bit string for c at its
                            accumulated offset in encStr? *)) origStr
(* Helper function that checks whether a predicate is true for each
   character in a string. The predicate function takes both the index
   and the character as argument. *)
and allStr p s =
    let val len = size s
        fun loop i = i >= len orelse p (i, String.sub (s, i)) andalso loop (i+1)
    in loop 0 end
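One possible way to fill in the validity check, as a sketch: it assumes hyp is the length list of someHyp1, cuts encStr accordingly, and checks the pieces pairwise instead of going through allStr. Because it also applies the question's prefix-code rule, it rejects someHyp1 itself (b's "1" is a prefix of a's "11"); replacing the else branch with w <> v gives the distinctness-only test used in the examples above.

```sml
(* hyp is assumed to be the length list of someHyp1, e.g.
   [(#"a", 2), (#"b", 1), (#"c", 1)]. *)
fun validHypothesis (hyp, origStr, encStr) =
  let
    val m = size encStr
    (* Cut encStr into one piece per character of origStr, using the
       hypothesised lengths; NONE if the lengths do not add up to m. *)
    fun cut ([], off, acc) = if off = m then SOME (rev acc) else NONE
      | cut (c :: cs, off, acc) =
          (case List.find (fn (d, _) => d = c) hyp of
               SOME (_, n) =>
                 if n >= 1 andalso off + n <= m
                 then cut (cs, off + n, (c, String.substring (encStr, off, n)) :: acc)
                 else NONE
             | NONE => NONE)
    (* Equal characters need equal pieces; pieces of different characters
       must not be prefixes of each other (the question's prefix-code rule). *)
    fun ok [] = true
      | ok ((c, w) :: rest) =
          List.all (fn (d, v) =>
                      if c = d then w = v
                      else not (String.isPrefix w v orelse String.isPrefix v w))
                   rest
          andalso ok rest
  in
    case cut (explode origStr, 0, []) of
        SOME pieces => ok pieces
      | NONE => false
  end
```

With the lengths [(#"a", 4), (#"b", 4), (#"c", 4), (#"d", 4)], the question's example "abcd" → "0010111111001010" is accepted.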
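An improvement to this framework would be to change the order in which hypotheses are explored, since some search paths can rule out many more invalid mappings than others.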