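Given a string and the constraint that only runs of >= 3 identical characters may be removed, how can you ensure that the resulting string is as small as possible?
Edit, with gassa's explicitness:
I.e.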
'AAAABBBAC'
If I remove the B's first,
AAAA[BBB]AC --> AAAAAC
then I can remove all the A's from the resulting string and be left with:
[AAAAA]C --> C
'C'
If I simply remove whatever is available first (the run of A's), I get:
[AAAA]BBBAC --> [BBB]AC --> AC
'AC'
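To make the effect of removal order concrete, here is a minimal Python sketch that replays the two orders from the example. The helper name `remove_run` is an assumption made for illustration and does not come from the question.

```python
def remove_run(s, start):
    """Remove the whole run of identical characters that begins at index start."""
    end = start
    while end < len(s) and s[end] == s[start]:
        end += 1
    return s[:start] + s[end:]


s = "AAAABBBAC"

# Order 1: remove the B run first, then the merged A run.
step = remove_run(s, 4)        # "AAAAAC"
order1 = remove_run(step, 0)   # "C"

# Order 2: remove the A run first, then the B run.
step = remove_run(s, 0)        # "BBBAC"
order2 = remove_run(step, 0)   # "AC"

print(order1, order2)
```

The sketch does not check the >= 3 constraint itself; it only shows that the same input can end at 'C' or 'AC' depending on which run is removed first.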
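Answer 0 (score: 4)
A tree will definitely get you the shortest string(s).
The tree solution:
Define a State (node) for each current string Input and all of its removable substrings' int[] Indexes.
Create the tree: for each int index, create another State and add it to the parent state's State[] Children.
A State with no removable substrings has no children (Children = null).
Get all descendants State[] of your root State. Order them by their shortest string Input. That is your answer.
Test cases: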
string result = FindShortest("AAAABBBAC"); // C
string result2 = FindShortest("AABBAAAC"); // AABBC
string result3 = FindShortest("BAABCCCBBA"); // B
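The tree search described above can also be sketched compactly in Python. This is only an illustration under assumptions: the names `removable_runs` and `find_shortest` are hypothetical, and the sketch caches results with `functools.lru_cache` so that a string reached through several removal orders is explored once, which the tree described in this answer does not do.

```python
from functools import lru_cache


def removable_runs(s, min_len=3):
    """Return (start, end) index pairs of runs with at least min_len equal characters."""
    runs, start = [], 0
    for i in range(1, len(s) + 1):
        # A run ends at the end of the string or when the character changes.
        if i == len(s) or s[i] != s[start]:
            if i - start >= min_len:
                runs.append((start, i))
            start = i
    return runs


@lru_cache(maxsize=None)
def find_shortest(s):
    """Explore every removal order and return one shortest reachable string."""
    best = s  # a string with no removable run is a leaf of the tree
    for start, end in removable_runs(s):
        candidate = find_shortest(s[:start] + s[end:])
        if len(candidate) < len(best):
            best = candidate
    return best


print(find_shortest("AAAABBBAC"))
print(find_shortest("AABBAAAC"))
print(find_shortest("BAABCCCBBA"))
```

Like the tree, this is still exhaustive, so the worst case remains exponential; the cache only removes repeated work on identical intermediate strings.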
The code:
Note: of course, everyone is welcome to improve the following code in terms of performance and/or to fix any bugs.
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main(string[] args)
{
string result = FindShortest("AAAABBBAC"); // C
string result2 = FindShortest("AABBAAAC"); // AABBC
string result3 = FindShortest("BAABCCCBBA"); // B
}
// finds the FIRST shortest string for a given input
private static string FindShortest(string input)
{
// all possible removable strings' indexes
// for this given input
int[] indexes = RemovableIndexes(input);
// each input string and its possible removables are a state
var state = new State { Input = input, Indexes = indexes };
// create the tree
GetChildren(state);
// get the FIRST shortest
// i.e. there would be more than one answer sometimes
// this could be easily changed to get all possible results
var result =
Descendants(state)
.Where(d => d.Children == null || d.Children.Length == 0)
.OrderBy(d => d.Input.Length)
.FirstOrDefault().Input;
return result;
}
// simple get all descendants of a node/state in a tree
private static IEnumerable<State> Descendants(State root)
{
var states = new Stack<State>(new[] { root });
while (states.Any())
{
State node = states.Pop();
yield return node;
if (node.Children != null)
foreach (var n in node.Children) states.Push(n);
}
}
// creates the tree
private static void GetChildren(State state)
{
// each removable index yields one child
state.Children = state.Indexes.Select(
i =>
{
var input = RemoveAllAt(state.Input, i);
return new State
{
Input = input,
// an empty string has no removable runs left
Indexes = input.Length > 0 ? RemovableIndexes(input) : new int[0]
};
}).ToArray();
foreach (var c in state.Children)
GetChildren(c);
}
// find all possible removable strings' indexes
private static int[] RemovableIndexes(string input)
{
var indexes = new List<int>();
char d = input[0];
int count = 1;
for (int i = 1; i < input.Length; i++)
{
if (d == input[i])
count++;
else
{
if (count >= 3)
indexes.Add(i - count);
// reset
d = input[i];
count = 1;
}
}
if (count >= 3)
indexes.Add(input.Length - count);
return indexes.ToArray();
}
// remove all duplicate chars starting from an index
private static string RemoveAllAt(string input, int startIndex)
{
string part1, part2;
int endIndex = startIndex + 1;
int i = endIndex;
for (; i < input.Length; i++)
if (input[i] != input[startIndex])
{
endIndex = i;
break;
}
if (i == input.Length && input[i - 1] == input[startIndex])
endIndex = input.Length;
part1 = startIndex > 0 ? input.Substring(0, startIndex) : string.Empty;
part2 = endIndex <= (input.Length - 1) ? input.Substring(endIndex) : string.Empty;
return part1 + part2;
}
// our node, which is
// an input string &
// all possible removable strings' indexes
// & its children
public class State
{
public string Input;
public int[] Indexes;
public State[] Children;
}
}
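Answer 1 (score: 3)
Clearly, we are not concerned with any block of repeated characters longer than 2 characters. And there is only one way in which two blocks of the same character, at least one of which is shorter than 3, can be combined: the sequence between them must be removable.
So (1) look at pairs of blocks of the same character where at least one is shorter than 3, and (2) determine whether the sequence between them can be removed.
We want to decide which pairs to join so as to minimize the total length of the blocks shorter than 3 characters. (Note that the number of pairs is bounded by the size (and distribution) of the alphabet.)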
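Step (1) can be illustrated with a short Python sketch. The function names `to_blocks` and `candidate_pairs` are assumptions made for this illustration, and the sketch simply enumerates every same-character pair rather than following the previous-block links used in the answer's code.

```python
from itertools import groupby


def to_blocks(s):
    """Split s into (character, run_length) blocks."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]


def candidate_pairs(blocks):
    """Index pairs (i, j), i < j, of same-character blocks where at least one is shorter than 3."""
    pairs = []
    for j, (ch_j, len_j) in enumerate(blocks):
        for i in range(j):
            ch_i, len_i = blocks[i]
            if ch_i == ch_j and (len_i < 3 or len_j < 3):
                pairs.append((i, j))
    return pairs


blocks = to_blocks("AAAABBBAC")
print(blocks)
print(candidate_pairs(blocks))
```

For 'AAAABBBAC' the only candidate pair is the two A blocks, and whether they can be joined depends on whether the 'BBB' between them can be removed, which is step (2).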
Let f(b) denote the minimal total length of the same-character blocks shorter than 3 characters that remain up to block b. Then:
f(b):
  p1 <- previous block of the same character

  if b and p1 can combine:
    if b.length + p1.length > 2:
      f(b) = min(
        // don't combine
        (0 if b.length > 2 else b.length) +
          f(block before b),
        // combine
        f(block before p1)
      )
    // b.length + p1.length < 3
    else:
      p2 <- block previous to p1 of the same character
      if p1 and p2 can combine:
        f(b) = min(
          // don't combine
          b.length + f(block before b),
          // combine
          f(block before p2)
        )
      else:
        f(b) = b.length + f(block before b)
  // b and p1 cannot combine
  else:
    f(b) = b.length + f(block before b)

  for all p1 before b
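The question is how we can efficiently determine whether a block can be combined with the previous block of the same character (aside from the obvious recursion into the sub-block list between the two blocks).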
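For reference, the "obvious recursion" can be sketched as a brute-force check of whether the text between two blocks can be removed completely. This is an assumption-laden illustration, not the efficient test the answer is asking for: `fully_removable` is a hypothetical name, it works on plain strings rather than block lists, and its worst case is exponential even with caching.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fully_removable(s, min_len=3):
    """True if some order of run removals reduces s to the empty string."""
    if not s:
        return True
    start = 0
    for i in range(1, len(s) + 1):
        # A run ends at the end of the string or when the character changes.
        if i == len(s) or s[i] != s[start]:
            if i - start >= min_len and fully_removable(s[:start] + s[i:], min_len):
                return True
            start = i
    return False


# In "AAAABBBAC" the two A blocks can combine because "BBB" is removable.
print(fully_removable("BBB"))
print(fully_removable("BBBA"))
```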
Python code:
import random
import time

def parse(length):
    return length if length < 3 else 0

def f(string):
    chars = {}
    blocks = [[string[0], 1, 0]]
    chars[string[0]] = {'indexes': [0]}
    chars[string[0]][0] = {'prev': -1}
    p = 0 # pointer to current block
    for i in xrange(1, len(string)):
        if blocks[len(blocks) - 1][0] == string[i]:
            blocks[len(blocks) - 1][1] += 1
        else:
            p += 1
            # [char, length, index, f(i), temp]
            blocks.append([string[i], 1, p])
            if string[i] in chars:
                chars[string[i]][p] = {'prev': chars[string[i]]['indexes'][ len(chars[string[i]]['indexes']) - 1 ]}
                chars[string[i]]['indexes'].append(p)
            else:
                chars[string[i]] = {'indexes': [p]}
                chars[string[i]][p] = {'prev': -1}
    #print blocks
    #print
    #print chars
    #print
    memo = [[None for j in xrange(len(blocks))] for i in xrange(len(blocks))]

    def g(l, r, top_level=False):
        ####
        ####
        #print "(l, r): (%s, %s)" % (l,r)
        if l == r:
            return parse(blocks[l][1])
        if memo[l][r]:
            return memo[l][r]
        result = [parse(blocks[l][1])] + [None for k in xrange(r - l)]
        if l < r:
            for i in xrange(l + 1, r + 1):
                result[i - l] = parse(blocks[i][1]) + result[i - l - 1]
        for i in xrange(l, r + 1):
            ####
            ####
            #print "\ni: %s" % i
            [char, length, index] = blocks[i]
            #p1 <- previous block of the same character
            p1_idx = chars[char][index]['prev']
            ####
            ####
            #print "(p1_idx, l, p1_idx >= l): (%s, %s, %s)" % (p1_idx, l, p1_idx >= l)
            if p1_idx < l and index > l:
                result[index - l] = parse(length) + result[index - l - 1]
            while p1_idx >= l:
                p1 = blocks[p1_idx]
                ####
                ####
                #print "(b, p1, p1_idx, l): (%s, %s, %s, %s)\n" % (blocks[i], p1, p1_idx, l)
                between = g(p1[2] + 1, index - 1)
                ####
                ####
                #print "between: %s" % between
                #if b and p1 can combine:
                if between == 0:
                    if length + p1[1] > 2:
                        result[index - l] = min(
                            result[index - l],
                            # don't combine
                            parse(length) + (result[index - l - 1] if index - l > 0 else 0),
                            # combine: f(block before p1)
                            result[p1[2] - l - 1] if p1[2] > l else 0
                        )
                    # b.length + p1.length < 3
                    else:
                        #p2 <- block previous to p1 of the same character
                        p2_idx = chars[char][p1[2]]['prev']
                        if p2_idx < l:
                            p1_idx = chars[char][p1_idx]['prev']
                            continue
                        between2 = g(p2_idx + 1, p1[2] - 1)
                        #if p1 and p2 can combine:
                        if between2 == 0:
                            result[index - l] = min(
                                result[index - l],
                                # don't combine
                                parse(length) + (result[index - l - 1] if index - l > 0 else 0),
                                # combine the block, p1 and p2
                                result[p2_idx - l - 1] if p2_idx - l > 0 else 0
                            )
                        else:
                            #f(b) = b.length + f(block before b)
                            result[index - l] = min(
                                result[index - l],
                                parse(length) + (result[index - l - 1] if index - l > 0 else 0)
                            )
                # b and p1 cannot combine
                else:
                    #f(b) = b.length + f(block before b)
                    result[index - l] = min(
                        result[index - l],
                        parse(length) + (result[index - l - 1] if index - l > 0 else 0)
                    )
                p1_idx = chars[char][p1_idx]['prev']
        #print l,r,result
        memo[l][r] = result[r - l]
        """if top_level:
            return (result, blocks)
        else:"""
        return result[r - l]

    if len(blocks) == 1:
        return ([parse(blocks[0][1])], blocks)
    else:
        return g(0, len(blocks) - 1, True)

"""s = ""
for i in xrange(300):
    s = s + ['A','B','C'][random.randint(0,2)]"""
print f("abcccbcccbacccab") # b
print
print f("AAAABBBAC"); # C
print
print f("CAAAABBBA"); # C
print
print f("AABBAAAC"); # AABBC
print
print f("BAABCCCBBA"); # B
print
print f("aaaa")
print
The string answers for these longer examples were computed using jdehesa's answer:
t0 = time.time()
print f("BCBCCBCCBCABBACCBABAABBBABBBACCBBBAABBACBCCCACABBCAABACBBBBCCCBBAACBAABACCBBCBBAABCCCCCAABBBBACBBAAACACCBCCBBBCCCCCCCACBABACCABBCBBBBBCBABABBACCAACBCBBAACBBBBBCCBABACBBABABAAABCCBBBAACBCACBAABAAAABABB")
# BCBCCBCCBCABBACCBABCCAABBACBACABBCAABACAACBAABACCBBCBBCACCBACBABACCABBCCBABABBACCAACBCBBAABABACBBABABBCCAACBCACBAABBABB
t1 = time.time()
total = t1-t0
print total
t0 = time.time()
print f("CBBACAAAAABBBBCAABBCBAABBBCBCBCACACBAABCBACBBABCABACCCCBACBCBBCBACBBACCCBAAAACACCABAACCACCBCBCABAACAABACBABACBCBAACACCBCBCCCABACABBCABBAAAAABBBBAABAABBCACACABBCBCBCACCCBABCAACBCAAAABCBCABACBABCABCBBBBABCBACABABABCCCBBCCBBCCBAAABCABBAAABBCAAABCCBAABAABCAACCCABBCAABCBCBCBBAACCBBBACBBBCABAABCABABABABCA")
# CBBACCAABBCBAACBCBCACACBAABCBACBBABCABABACBCBBCBACBBABCACCABAACCACCBCBCABAACAABACBABACBCBAACACCBCBABACABBCBBCACACABBCBCBCABABCAACBCBCBCABACBABCABCABCBACABABACCBBCCBBCACBCCBAABAABCBBCAABCBCBCBBAACCACCABAABCABABABABCA
t1 = time.time()
total = t1-t0
print total
t0 = time.time()
print f("AADBDBEBBBBCABCEBCDBBBBABABDCCBCEBABADDCABEEECCECCCADDACCEEAAACCABBECBAEDCEEBDDDBAAAECCBBCEECBAEBEEEECBEEBDACDDABEEABEEEECBABEDDABCDECDAABDAEADEECECEBCBDDAEEECCEEACCBBEACDDDDBDBCCAAECBEDAAAADBEADBAAECBDEACDEABABEBCABDCEEAABABABECDECADCEDAEEEBBBCEDECBCABDEDEBBBABABEEBDAEADBEDABCAEABCCBCCEDCBBEBCECCCA")
# AADBDBECABCEBCDABABDCCBCEBABADDCABCCEADDACCEECCABBECBAEDCEEBBECCBBCEECBAEBCBEEBDACDDABEEABCBABEDDABCDECDAABDAEADEECECEBCBDDACCEEACCBBEACBDBCCAAECBEDDBEADBAAECBDEACDEABABEBCABDCEEAABABABECDECADCEDACEDECBCABDEDEABABEEBDAEADBEDABCAEABCCBCCEDCBBEBCEA
t1 = time.time()
total = t1-t0
print total
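Answer 2 (score: 2)
I propose an O(n^2) solution using dynamic programming.
Let us introduce some notation. For a string A, let P[l] and S[l] denote its prefix and suffix of length l. We call our procedure Rcd.
Note that the outer Rcd on the RHS is trivial. So that is our optimal substructure. Based on this, I came up with the following implementation: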
#include <iostream>
#include <string>
#include <vector>
#include <cassert>
using namespace std;
string remdupright(string s, bool allowEmpty) {
if (s.size() >= 3) {
auto pos = s.find_last_not_of(s.back());
if (pos == string::npos && allowEmpty) s = "";
else if (pos != string::npos && s.size() - pos > 3) s = s.substr(0, pos + 1);
}
return s;
}
string remdupleft(string s, bool allowEmpty) {
if (s.size() >= 3) {
auto pos = s.find_first_not_of(s.front());
if (pos == string::npos && allowEmpty) s = "";
else if (pos != string::npos && pos >= 3) s = s.substr(pos);
}
return s;
}
string remdup(string s, bool allowEmpty) {
return remdupleft(remdupright(s, allowEmpty), allowEmpty);
}
string run(const string in) {
vector<vector<string>> table(in.size());
for (int i = 0; i < (int)table.size(); ++i) {
table[i].resize(in.size() - i);
}
for (int i = 0; i < (int)table[0].size(); ++i) {
table[0][i] = in.substr(i,1);
}
for (int len = 2; len <= (int)table.size(); ++len) {
for (int pos = 0; pos < (int)in.size() - len + 1; ++pos) {
string base(table[len - 2][pos]);
const char suffix = in[pos + len - 1];
if (base.size() && suffix != base.back()) {
base = remdupright(base, false);
}
const string opt1 = base + suffix;
base = table[len - 2][pos+1];
const char prefix = in[pos];
if (base.size() && prefix != base.front()) {
base = remdupleft(base, false);
}
const string opt2 = prefix + base;
const string nodupopt1 = remdup(opt1, true);
const string nodupopt2 = remdup(opt2, true);
table[len - 1][pos] = nodupopt1.size() > nodupopt2.size() ? opt2 : opt1;
assert(nodupopt1.size() != nodupopt2.size() || nodupopt1 == nodupopt2);
}
}
string& res = table[in.size() - 1][0];
return remdup(res, true);
}
void testRcd(string s, string expected) {
cout << s << " : " << run(s) << ", expected: " << expected << endl;
}
int main()
{
testRcd("BAABCCCBBA", "B");
testRcd("AABBAAAC", "AABBC");
testRcd("AAAA", "");
testRcd("AAAABBBAC", "C");
}
You can check the code and run the tests here.
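Answer 3 (score: 1)
Here is a Python solution (function reduce_min), not particularly smart, but I think fairly easy to understand (an excessive amount of comments added for the sake of clarity):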
def reductions(s, min_len):
    """
    Yields every possible reduction of s by eliminating contiguous blocks
    of min_len or more repeated characters.
    For example, reductions('AAABBCCCCBAAC', 3) yields
    'BBCCCCBAAC' and 'AAABBBAAC'.
    """
    # Current character
    curr = ''
    # Length of current block
    n = 0
    # Start position of current block
    idx = 0
    # For each character
    for i, c in enumerate(s):
        if c != curr:
            # New block begins
            if n >= min_len:
                # If previous block was long enough
                # yield reduced string without it
                yield s[:idx] + s[i:]
            # Start new block
            curr = c
            n = 1
            idx = i
        else:
            # Still in the same block
            n += 1
    # Yield reduction without last block if it was long enough
    if n >= min_len:
        yield s[:idx]

def reduce_min(s, min_len):
    """
    Finds the smallest possible reduction of s by successive
    elimination of contiguous blocks of min_len or more repeated
    characters.
    """
    # Current set of possible reductions
    rs = set([s])
    # Current best solution
    result = s
    # While there are strings to reduce
    while rs:
        # Get one element
        r = rs.pop()
        # Find reductions
        r_red = list(reductions(r, min_len))
        # If no reductions are found it is irreducible
        if len(r_red) == 0 and len(r) < len(result):
            # Replace if shorter than current best
            result = r
        else:
            # Save reductions for next iterations
            rs.update(r_red)
    return result

assert reduce_min("BAABCCCBBA", 3) == "B"
assert reduce_min("AABBAAAC", 3) == "AABBC"
assert reduce_min("AAAA", 3) == ""
assert reduce_min("AAAABBBAC", 3) == "C"
Edit: Since people seem to be posting C++ solutions, here is mine in C++ (again, function reduce_min):
#include <string>
#include <vector>
#include <unordered_set>
#include <iterator>
#include <utility>
#include <cassert>
using namespace std;
void reductions(const string &s, unsigned int min_len, vector<string> &rs)
{
char curr = '\0';
unsigned int n = 0;
unsigned int idx = 0;
for (auto it = s.begin(); it != s.end(); ++it)
{
if (curr != *it)
{
auto i = distance(s.begin(), it);
if (n >= min_len)
{
rs.push_back(s.substr(0, idx) + s.substr(i));
}
curr = *it;
n = 1;
idx = i;
}
else
{
n += 1;
}
}
if (n >= min_len)
{
rs.push_back(s.substr(0, idx));
}
}
string reduce_min(const string &s, unsigned int min_len)
{
unordered_set<string> rs { s };
string result = s;
vector<string> rs_new;
while (!rs.empty())
{
auto it = rs.begin();
auto r = *it;
rs.erase(it);
rs_new.clear();
reductions(r, min_len, rs_new);
if (rs_new.empty() && r.size() < result.size())
{
result = move(r);
}
else
{
rs.insert(rs_new.begin(), rs_new.end());
}
}
return result;
}
int main(int argc, char **argv)
{
assert(reduce_min("BAABCCCBBA", 3) == "B");
assert(reduce_min("AABBAAAC", 3) == "AABBC");
assert(reduce_min("AAAA", 3) == "");
assert(reduce_min("AAAABBBAC", 3) == "C");
return 0;
}
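If you can use C++17, you can save memory by using string views.
Edit 2: About the complexity of the algorithm. It is not straightforward to figure out, and as I said the algorithm is meant to be simple more than anything, but let's see. In the end, it is more or less the same as a breadth-first search. Say the string length is n and, for generality, say the minimum block length (the value 3 in the question) is m. At the first level we can generate up to n / m reductions in the worst case. For each of these we can generate up to (n - m) / m reductions, and so on. So basically, at "level" i (loop iteration i), we create up to (n - i * m) / m reductions per string we had, and each of these takes O(n - i * m) time to process. The maximum number of levels we can have is, again, n / m. So the complexity of the algorithm (if I am not making mistakes) should have the form: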
O( sum {i = 0 .. n / m} ( O(n - i * m) * prod {j = 0 .. i} ((n - j * m) / m) ))
   |-Outer iters--|       |---Cost---|   |-Prev lvl-|      |---Branching---|
Phew. So this should be something like:
O( sum {i = 0 .. n / m} (n - i * m) * O(n^i / m^i) )
Which in turn would collapse to:
O((n / m)^(n / m))
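So yes, the algorithm is more or less simple, but it can run into cases with exponential cost (the bad cases would be strings made entirely of m-long blocks, like AAABBBCCCAAACCC... for m = 3).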
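The growth described above can be observed with a small Python sketch that counts how many distinct strings are reachable from an input made entirely of 3-long blocks. The function name `count_explored` is an assumption for illustration; it counts distinct reachable strings, which is only a lower bound on the work done by reduce_min, since reduce_min may process the same string more than once.

```python
def count_explored(s, min_len=3):
    """Count distinct strings reachable from s (including s) by run removals."""
    seen = {s}
    frontier = [s]
    while frontier:
        cur = frontier.pop()
        start = 0
        for i in range(1, len(cur) + 1):
            # A run ends at the end of the string or when the character changes.
            if i == len(cur) or cur[i] != cur[start]:
                if i - start >= min_len:
                    nxt = cur[:start] + cur[i:]
                    if nxt not in seen:
                        seen.add(nxt)
                        frontier.append(nxt)
                start = i
    return len(seen)


# Inputs of the form AAA, AAABBB, AAABBBCCC, AAABBBCCCAAA, ...
for k in range(1, 7):
    s = "".join("ABC"[i % 3] * 3 for i in range(k))
    print(len(s), count_explored(s))
```

Running the loop for a few more values of k gives a feel for how quickly the number of intermediate strings grows with the number of blocks.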
Answer 4 (score: 1)
Another Scala answer, using memoization and (partial) tail-call optimization (updated).
import scala.collection.mutable.HashSet
import scala.annotation._
object StringCondense extends App {
@tailrec
def groupConsecutive (s: String, sofar: List[String]): List[String] = s.toList match {
// def groupConsecutive (s: String): List[String] = s.toList match {
case Nil => sofar
// case Nil => Nil
case c :: str => {
val (prefix, rest) = (c :: str).span (_ == c)
// Strings of equal characters, longer than 3, don't make a difference to just 3
groupConsecutive (rest.mkString(""), (prefix.take (3)).mkString ("") :: sofar)
// (prefix.take (3)).mkString ("") :: groupConsecutive (rest.mkString(""))
}
}
// to count the effect of memoization
var count = 0
// recursively try to eliminate every group of 3 or more, brute forcing
// but for "aabbaabbaaabbbaabb", many reductions will lead sooner or
// later to the same result, so we try to detect these and avoid duplicate
// work
def moreThan2consecutive (s: String, seenbefore: HashSet [String]): String = {
if (seenbefore.contains (s)) s
else
{
count += 1
seenbefore += s
val sublists = groupConsecutive (s, Nil)
// val sublists = groupConsecutive (s)
val atLeast3 = sublists.filter (_.size > 2)
atLeast3.length match {
case 0 => s
case 1 => {
val res = sublists.filter (_.size < 3)
moreThan2consecutive (res.mkString (""), seenbefore)
}
case _ => {
val shrinked = (
for {idx <- (0 until sublists.size)
if (sublists (idx).length >= 3)
pre = (sublists.take (idx)).mkString ("")
post= (sublists.drop (idx+1)).mkString ("")
} yield {
moreThan2consecutive (pre + post, seenbefore)
}
)
(shrinked.head /: shrinked.tail) ((a, b) => if (a.length <= b.length) a else b)
}
}
}
}
// don't know what Rcd means, adopted from other solution but modified
// kind of a unit test **update**: forgot to reset count
def testRcd (s: String, expected: String) : Boolean = {
count = 0
val seenbefore = HashSet [String] ()
val result = moreThan2consecutive (s, seenbefore)
val hit = result.equals (expected)
println (s"Input: $s\t result: ${result}\t expected ${expected}\t $hit\t count: $count");
hit
}
// some test values from other users with expected result
// **upd:** more testcases
def testgroup () : Unit = {
testRcd ("baabcccbba", "b")
testRcd ("aabbaaac", "aabbc")
testRcd ("aaaa", "")
testRcd ("aaaabbbac", "c")
testRcd ("abcccbcccbacccab", "b")
testRcd ("AAAABBBAC", "C")
testRcd ("CAAAABBBA", "C")
testRcd ("AABBAAAC", "AABBC")
testRcd ("BAABCCCBBA", "B")
testRcd ("AAABBBAAABBBAAABBBC", "C") // 377 subcalls reported by Yola,
testRcd ("AAABBBAAABBBAAABBBAAABBBC", "C") // 4913 when preceded with AAABBB
}
testgroup
def testBigs () : Unit = {
/*
testRcd ("BCBCCBCCBCABBACCBABAABBBABBBACCBBBAABBACBCCCACABBCAABACBBBBCCCBBAACBAABACCBBCBBAABCCCCCAABBBBACBBAAACACCBCCBBBCCCCCCCACBABACCABBCBBBBBCBABABBACCAACBCBBAACBBBBBCCBABACBBABABAAABCCBBBAACBCACBAABAAAABABB",
"BCBCCBCCBCABBACCBABCCAABBACBACABBCAABACAACBAABACCBBCBBCACCBACBABACCABBCCBABABBACCAACBCBBAABABACBBABABBCCAACBCACBAABBABB")
*/
testRcd ("CBBACAAAAABBBBCAABBCBAABBBCBCBCACACBAABCBACBBABCABACCCCBACBCBBCBACBBACCCBAAAACACCABAACCACCBCBCABAACAABACBABACBCBAACACCBCBCCCABACABBCABBAAAAABBBBAABAABBCACACABBCBCBCACCCBABCAACBCAAAABCBCABACBABCABCBBBBABCBACABABABCCCBBCCBBCCBAAABCABBAAABBCAAABCCBAABAABCAACCCABBCAABCBCBCBBAACCBBBACBBBCABAABCABABABABCA",
"CBBACCAABBCBAACBCBCACACBAABCBACBBABCABABACBCBBCBACBBABCACCABAACCACCBCBCABAACAABACBABACBCBAACACCBCBABACABBCBBCACACABBCBCBCABABCAACBCBCBCABACBABCABCABCBACABABACCBBCCBBCACBCCBAABAABCBBCAABCBCBCBBAACCACCABAABCABABABABCA")
/*testRcd ("AADBDBEBBBBCABCEBCDBBBBABABDCCBCEBABADDCABEEECCECCCADDACCEEAAACCABBECBAEDCEEBDDDBAAAECCBBCEECBAEBEEEECBEEBDACDDABEEABEEEECBABEDDABCDECDAABDAEADEECECEBCBDDAEEECCEEACCBBEACDDDDBDBCCAAECBEDAAAADBEADBAAECBDEACDEABABEBCABDCEEAABABABECDECADCEDAEEEBBBCEDECBCABDEDEBBBABABEEBDAEADBEDABCAEABCCBCCEDCBBEBCECCCA",
"AADBDBECABCEBCDABABDCCBCEBABADDCABCCEADDACCEECCABBECBAEDCEEBBECCBBCEECBAEBCBEEBDACDDABEEABCBABEDDABCDECDAABDAEADEECECEBCBDDACCEEACCBBEACBDBCCAAECBEDDBEADBAAECBDEACDEABABEBCABDCEEAABABABECDECADCEDACEDECBCABDEDEABABEEBDAEADBEDABCAEABCCBCCEDCBBEBCEA")
*/
}
// for generated input, but with fixed seed, to compare the count with
// and without memoization
import util.Random
val r = new Random (31415)
// generate Strings but with high chances to produce some triples and
// longer sequences of char clones
def genRandomString () : String = {
(1 to 20).map (_ => r.nextInt (6) match {
case 0 => "t"
case 1 => "r"
case 2 => "-"
case 3 => "tt"
case 4 => "rr"
case 5 => "--"
}).mkString ("")
}
def testRandom () : Unit = {
(1 to 10).map (i=> testRcd (genRandomString, "random mode - false might be true"))
}
testRandom
testgroup
testRandom
// testBigs
}
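Comparing the effect of memoization leads to interesting results:
Updated measurements. In the old values I forgot to reset the counter, which led to much higher results. Now the spread of the results is much more impressive, and the values are smaller overall.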
No seenbefore:
Input: baabcccbba result: b expected b true count: 4
Input: aabbaaac result: aabbc expected aabbc true count: 2
Input: aaaa result: expected true count: 2
Input: aaaabbbac result: c expected c true count: 5
Input: abcccbcccbacccab result: b expected b true count: 34
Input: AAAABBBAC result: C expected C true count: 5
Input: CAAAABBBA result: C expected C true count: 5
Input: AABBAAAC result: AABBC expected AABBC true count: 2
Input: BAABCCCBBA result: B expected B true count: 4
Input: AAABBBAAABBBAAABBBC res: C expected C true count: 377
Input: AAABBBAAABBBAAABBBAAABBBC r: C expected C true count: 4913
Input: r--t----ttrrrrrr--tttrtttt--rr----result: rr--rr expected ? unknown ? false count: 1959
Input: ttrtt----tr---rrrtttttttrtr--rr result: r--rr expected ? unknown ? false count: 213
Input: tt----r-----ttrr----ttrr-rr--rr-- result: ttrttrrttrr-rr--rr-- ex ? unknown ? false count: 16
Input: --rr---rrrrrrr-r--rr-r--tt--rrrrr result: rr-r--tt-- expected ? unknown ? false count: 32
Input: tt-rrrrr--r--tt--rrtrrr------- result: ttr--tt--rrt expected ? unknown ? false count: 35
Input: --t-ttt-ttt--rrrrrt-rrtrttrr result: --tt-rrtrttrr expected ? unknown ? false count: 35
Input: rrt--rrrr----trrr-rttttrrtttrr result: rrtt- expected ? unknown ? false count: 1310
Input: ---tttrrrrrttrrttrr---tt-----tt result: rrttrr expected ? unknown ? false count: 1011
Input: -rrtt--rrtt---t-r--r---rttr-- result: -rrtt--rr-r--rrttr-- ex ? unknown ? false count: 9
Input: rtttt--rrrrrrrt-rrttt--tt--t result: r--t-rr--tt--t expectd ? unknown ? false count: 16
real 0m0.607s (without testBigs)
user 0m1.276s
sys 0m0.056s
With seenbefore:
Input: baabcccbba result: b expected b true count: 4
Input: aabbaaac result: aabbc expected aabbc true count: 2
Input: aaaa result: expected true count: 2
Input: aaaabbbac result: c expected c true count: 5
Input: abcccbcccbacccab result: b expected b true count: 11
Input: AAAABBBAC result: C expected C true count: 5
Input: CAAAABBBA result: C expected C true count: 5
Input: AABBAAAC result: AABBC expected AABBC true count: 2
Input: BAABCCCBBA result: B expected B true count: 4
Input: AAABBBAAABBBAAABBBC rest: C expected C true count: 28
Input: AAABBBAAABBBAAABBBAAABBBC C expected C true count: 52
Input: r--t----ttrrrrrr--tttrtttt--rr----result: rr--rr expected ? unknown ? false count: 63
Input: ttrtt----tr---rrrtttttttrtr--rr result: r--rr expected ? unknown ? false count: 48
Input: tt----r-----ttrr----ttrr-rr--rr-- result: ttrttrrttrr-rr--rr-- xpe? unknown ? false count: 8
Input: --rr---rrrrrrr-r--rr-r--tt--rrrrr result: rr-r--tt-- expected ? unknown ? false count: 19
Input: tt-rrrrr--r--tt--rrtrrr------- result: ttr--tt--rrt expected ? unknown ? false count: 12
Input: --t-ttt-ttt--rrrrrt-rrtrttrr result: --tt-rrtrttrr expected ? unknown ? false count: 16
Input: rrt--rrrr----trrr-rttttrrtttrr result: rrtt- expected ? unknown ? false count: 133
Input: ---tttrrrrrttrrttrr---tt-----tt result: rrttrr expected ? unknown ? false count: 89
Input: -rrtt--rrtt---t-r--r---rttr-- result: -rrtt--rr-r--rrttr-- ex ? unknown ? false count: 6
Input: rtttt--rrrrrrrt-rrttt--tt--t result: r--t-rr--tt--t expected ? unknown ? false count: 8
real 0m0.474s (without testBigs)
user 0m0.852s
sys 0m0.060s
With tailcall:
real 0m0.478s (without testBigs)
user 0m0.860s
sys 0m0.060s
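For some random strings, the difference is larger than a factor of 10.
For long strings with many groups, one possible improvement is to first eliminate every group that is the only group of its character, for example: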
aa bbb aa ccc xx ddd aa eee aa fff xx
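The groups bbb, ccc, ddd, eee and fff are each unique in the string, so they cannot combine with anything else and can all be removed, and the order of removal does not matter. This leads to the intermediate result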
aaaa xx aaaa xx
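and a fast solution. Maybe I will try to implement that too. However, I guess it is possible to generate random strings for which this has a big impact and, with a different way of generating them, distributions for which the impact is small.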
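A minimal Python sketch of this pre-pass, under stated assumptions: `drop_unique_groups` is a hypothetical name, the sketch does a single pass only, and it relies on the claim above that such groups can be removed in any order rather than proving it.

```python
from collections import Counter
from itertools import groupby


def drop_unique_groups(s, min_len=3):
    """Remove every run of >= min_len chars whose character forms no other run in s."""
    runs = [(ch, len(list(g))) for ch, g in groupby(s)]
    # How many separate runs each character forms in s.
    runs_per_char = Counter(ch for ch, _ in runs)
    kept = [ch * n for ch, n in runs
            if not (n >= min_len and runs_per_char[ch] == 1)]
    return "".join(kept)


print(drop_unique_groups("aabbbaacccxxdddaaeeeaafffxx"))
```

After one pass, neighbouring runs may have merged into new removable groups (such as aaaa here), so the result would still be handed to the full search, or the pass could be repeated until nothing changes.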