My main problem is finding a suitable way to automatically convert something like this:
d+c+d+f+d+c+d+f+d+c+d+f+d+c+d+f+
into this:
[d+c+d+f+]4
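That is, find duplicates that sit next to each other and shorten them into a "loop". So far I have not been able to find a suitable solution, and I look forward to your responses.

P.S. To avoid confusion, the sample above is not the only thing that needs "looping"; it varies from file to file. Oh, and this is for a C++ or C# program; either is fine, though I am open to any other suggestions as well. Also, the main idea is that all the work is done by the program itself, with no user input other than the file itself.

Here is the full file for reference; my apologies for the stretched page:

#0 @ 16 v225 y10 w250 t76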
L16 $ ED $ EF $ A9 p20,20 > ecegb> d< bgbgecgec<克 > d +&LT b取代; d + F + A +&以及c +< A + F + A + F + d + LT b取代; F + d + LT; BF + &以及c&LT a取代; cegbgegec&LT a取代; EC< AE > d + C + d + F + d + C + d + F + d + C + d + F + d + C + d + F + R1 ^ 1
/ L8 r1r1r1r1 F +< A +> F + G + CG + R4 A + C + A + G + CG + R4F + LT; A +> F + G + CG + R4 A + C + A + G + CG + R4F + LT; A +> F + G + CG + R4 A + C + A + G + CG + R4 F +< A +> F + G + CG + R4 A + C + A + G + r4g + 16f16c + 一个+ 2 ^ G + F + G + 4 F + FF + 4FD + F4 d + C + d + 4C +℃下A + 2 ^ 4 > C4D + < G + 2 ^ 4R4 ^ 一个+以及c + d + 4G + 4A + 4 R1 ^ 2 ^ 4 ^一个+ 2 ^ G + F + G + 4 F + FF + 4FD + F4 d + C + d + 4C +℃下A + 2 ^ 4 > C4D + < G + 2 ^ 4R4 ^ 一个+以及c + d + 4G + 4A + 4 R1 ^ 2 ^ 4 ^ r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1
#4 @ 22 v250 y10
L8 O3 RG + RG + RG + RG + RG + RG + RG + RG + RG + RG + RG + RG + RG + RG + RG + RG + RG + RG + RG + RG + RG + RG + RG + RG + / r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1
#2 @ 4 v155 y10
L8 $ ED $ F8 $ 8F O4 r1r1r1 d + 4f4f + 4G + 4 一个+ 4R1 ^ 4 ^ 2 / d + 4 ^ FR2 F + 4 ^ fr2d + 4 ^ FR2 F + 4 ^ fr2d + 4 ^ FR2 F + 4 ^ fr2d + 4 ^ FR2 F + 4 ^ FR2 > d + 4 ^ FR2 F + 4 ^ fr2d + 4 ^ FR2 F + 4 ^ FR2 < F + 4 ^ G + R2 F + 4 ^ fr2f + 4 ^ G + R2 F + 4 ^ fr2f + 4 ^ G + R2 F + 4 ^ fr2f + 4 ^ G + R2 F + 4 ^ fr2f + 4 ^ G + R2 F + 4 ^ fr2f + 4 ^ G + R2 F + 4 ^ fr2f + 4 ^ G + R2 F + 4 ^ fr2f + 4 ^ G + R2 F + 4 ^ FR2 > 一个+ 4 ^ G + R2 F + 1A + 4 ^ G + R2 F + 1 F + 4 ^ FR2 d + 1 F + 4 ^ FR2 d + 2 ^ d + 4 ^ r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1
#3 @ 10 v210 y10
R1 ^ 1 O3 c8r8d8r8 c8r8c8r8c8r8c8r8c8r8c8r8c8r8c8r8c8r8c8r8c8r8 C8 @ 10d16d16 @ 21 C8 @ 10d16d16 @ 21 C8 @ 10d16d16 @ 21 / C4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8 C4 @ 10d8 @ 21c8< B8> @ 10d16d16d16d16d16r16 C4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8c4 @ 10d8 @ 21c8< B8> C8 @ 10d8 @ 21c8 C4 @ 10d8 @ 21c8 @ 10b16b16> c16c16< b16b16a16a16
#7 @ 16 v230 y10
L16 $ ED $ EF $ A9 cceeggbbggeeccee < BB> d + d + F + F + A + A + F + F + d + d + LT; BB> d + d + < AA> cceeggeecc< AA> CC < G + G + BB> d + d + + FFD d + LT; BBG + G + BB / r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1
#5 @ 4 v155 y10
L8 $ ED $ F8 $ 8F O4 r1r1r1r1 d + 4R1 ^ 2 ^ 4 / <一个+ 4 ^> CR2 C + 4 ^ CR2<一个+ 4 ^> CR2 C + 4 ^ CR2<一个+ 4 ^> CR2 C + 4 ^ CR2<一个+ 4 ^> CR2 C + 4 ^ CR2 一个+ 4 ^> CR2 C + 4 ^ CR2 <一个+ 4 ^> CR2 C + 4 ^ C r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1 R2 F + 4 ^ FR2 d + 1F + 4 ^ FR2 d + 1 C + 4 ^ CR2 < A + 1 &以及c + 4 ^ CR2 < A + 2 ^一个+ 4 ^ r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1
Answer 0 (score: 2)
You can use the Smith-Waterman algorithm for local alignment, comparing the string against itself.
http://en.wikipedia.org/wiki/Smith-Waterman_algorithm
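Edit: To adapt the algorithm to self-alignment, you need to force the values on the diagonal to zero; that is, penalize the trivial solution of aligning the whole string with itself. The "second best" alignment will then pop out. This will be the two longest matching substrings. Repeat the same thing to find progressively shorter matching substrings.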
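As a rough illustration of the self-alignment idea, here is a minimal C++ sketch. It is not code from this answer: the names `SelfAlignment` and `bestSelfAlignment` and the scoring values (match 2, mismatch -1, gap -1) are assumptions picked for the example. It fills only the cells above the diagonal, so the diagonal stays at zero, and it reports where the best-scoring pair of matching regions ends. It skips the traceback step and uses O(n^2) memory, so it is meant for short inputs only.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Result of aligning a string against itself (hypothetical helper type).
struct SelfAlignment {
    int score;         // best local alignment score found
    std::size_t endA;  // one past the last index of the earlier region
    std::size_t endB;  // one past the last index of the later region
};

// Smith-Waterman local alignment of s with itself.
// Only cells with j > i are filled, so the diagonal (the trivial
// "whole string matches itself" solution) is forced to stay at zero.
// The scoring parameters are illustrative assumptions.
SelfAlignment bestSelfAlignment(const std::string& s,
                                int match = 2, int mismatch = -1, int gap = -1) {
    const std::size_t n = s.size();
    std::vector<std::vector<int>> h(n + 1, std::vector<int>(n + 1, 0));
    SelfAlignment best{0, 0, 0};

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = i + 1; j <= n; ++j) {
            const int diag = h[i - 1][j - 1] + (s[i - 1] == s[j - 1] ? match : mismatch);
            const int up = h[i - 1][j] + gap;
            const int left = h[i][j - 1] + gap;
            const int value = std::max({0, diag, up, left});
            h[i][j] = value;
            if (value > best.score) {
                best = {value, i, j};
            }
        }
    }
    return best;
}
```

For the sample from the question, the two matching regions are allowed to overlap, and the distance `endB - endA` gives the period of the repeat (8 characters for `d+c+d+f+`).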
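Answer 1 (score: 2)
Not sure if this is what you are looking for.
I took the string "testtesttesttest4notaduped+c+d+f+d+c+d+f+d+c+d+f+d+c+d+f+testtesttest" and converted it to "[test]4 4notadupe[d+c+d+f+]4 [test]3 ".
I am sure someone will come up with a more efficient solution, since this one gets a bit slow when processing the full file. I look forward to the other answers.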
using System;

class Program
{
    static void Main()
    {
        string stringValue = "testtesttesttest4notaduped+c+d+f+d+c+d+f+d+c+d+f+d+c+d+f+testtesttest";

        for (int i = 0; i < stringValue.Length; i++)
        {
            // Try each pattern length k that still leaves room for one repeat.
            for (int k = 1; (k * 2) + i <= stringValue.Length; k++)
            {
                int count = 1;
                string compare1 = stringValue.Substring(i, k);
                string compare2 = stringValue.Substring(i + k, k);

                // Count how many times the pattern repeats back to back.
                while (compare1 == compare2)
                {
                    count++;
                    k += compare1.Length;
                    if (i + k + compare1.Length > stringValue.Length)
                        break;
                    compare2 = stringValue.Substring(i + k, compare1.Length);
                }

                if (count > 1)
                {
                    // The trailing space keeps the repeat count from merging
                    // with a following digit, e.g. "[test]4" + "4" -> "[test]44".
                    string addString = "[" + compare1 + "]" + count + " ";

                    // Only replace when the bracketed form actually saves space.
                    if (addString.Length < compare1.Length * count)
                    {
                        stringValue = stringValue.Remove(i, count * compare1.Length);
                        stringValue = stringValue.Insert(i, addString);
                        i = i + addString.Length - 1;
                    }
                    break;
                }
            }
        }

        Console.WriteLine(stringValue);
    }
}
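Since the asker also mentioned C++, here is a hedged sketch of the same greedy idea in that language. It is a variation, not a port: the function name `collapseRepeats` is made up for the example, it compares in place with `std::string::compare` instead of building substrings, it appends to a separate output buffer rather than editing the string while scanning it, and at each position it picks the pattern length that saves the most characters instead of stopping at the first repeat it sees.

```cpp
#include <cstddef>
#include <string>

// Collapse adjacent repeats into "[pattern]count " (hypothetical helper).
// At each position, every pattern length k is tried and the one that
// saves the most characters is kept; if nothing saves space, the
// current character is copied through unchanged.
std::string collapseRepeats(const std::string& in) {
    std::string out;
    const std::size_t n = in.size();
    std::size_t i = 0;

    while (i < n) {
        std::size_t bestLen = 0, bestCount = 0, bestSaved = 0;

        for (std::size_t k = 1; i + 2 * k <= n; ++k) {
            // Count back-to-back copies of in[i, i + k).
            std::size_t count = 1;
            while (i + (count + 1) * k <= n &&
                   in.compare(i, k, in, i + count * k, k) == 0) {
                ++count;
            }
            if (count < 2) continue;

            const std::size_t plain = k * count;
            // '[' + pattern + ']' + digits of count + trailing space
            const std::size_t packed = k + 3 + std::to_string(count).size();
            if (packed < plain && plain - packed > bestSaved) {
                bestSaved = plain - packed;
                bestLen = k;
                bestCount = count;
            }
        }

        if (bestCount > 0) {
            out += '[';
            out.append(in, i, bestLen);
            out += ']';
            out += std::to_string(bestCount);
            out += ' ';  // keeps the count from merging with a following digit
            i += bestLen * bestCount;
        } else {
            out += in[i];
            ++i;
        }
    }
    return out;
}
```

Like the C# version, this treats the input as plain characters, so on real music data it could split a note from its length digits (for example `r1r1r16` would lose the meaning of the final `r16`). Splitting the file into tokens first and comparing token sequences would avoid that, but that step is not shown here. The worst-case running time is still far from linear.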
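Answer 2 (score: 1)
LZW could help: it uses a prefix dictionary to search for repeated patterns and replaces such data with references to previous entries. I think it should not be hard to adapt it to your needs.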
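To make the prefix-dictionary idea concrete, here is a minimal sketch of a textbook LZW encoder in C++. The function name `lzwEncode` is an assumption for the example, and the output is a list of integer codes rather than the `[pattern]count` notation from the question, so it would still need adapting.

```cpp
#include <map>
#include <string>
#include <vector>

// Textbook LZW encoder: the dictionary starts with all single bytes
// (codes 0..255) and grows by one entry each time an unseen
// "known prefix + next character" string is met.
std::vector<int> lzwEncode(const std::string& input) {
    std::map<std::string, int> dict;
    for (int c = 0; c < 256; ++c) {
        dict[std::string(1, static_cast<char>(c))] = c;
    }
    int nextCode = 256;

    std::vector<int> codes;
    std::string w;  // longest prefix matched so far
    for (char ch : input) {
        const std::string wc = w + ch;
        if (dict.count(wc) != 0) {
            w = wc;  // keep extending the match
        } else {
            codes.push_back(dict[w]);  // emit a reference to the known prefix
            dict[wc] = nextCode++;     // remember the new, longer pattern
            w = std::string(1, ch);
        }
    }
    if (!w.empty()) {
        codes.push_back(dict[w]);
    }
    return codes;
}
```

On repetitive input such as the sample from the question, the encoder emits fewer codes than there are characters, because later repeats are replaced by references to dictionary entries created earlier.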
Answer 3 (score: 0)
Why not use System.IO.Compression?