Does anyone know how I would go about finding and replacing text in a string? Basically, I have two strings:
string firstS = "/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDABQODxIPDRQSERIXFhQYHzMhHxwcHz8tLyUzSkFOTUlBSEZSXHZkUldvWEZIZoxob3p9hIWET2ORm4+AmnaBhH//2wBDARYXFx8bHzwhITx/VEhUf39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f3//";
string secondS = "abcdefg2wBDABQODxIPDRQSERIXFh/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/abcdefg";
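I want to search firstS to see whether it contains any sequence of characters that also occurs in secondS, and then replace it. Each match should be replaced with the number of replaced characters in square brackets:
[NUMBER-OF-CHARACTERS-REPLACED]
For example, because firstS and secondS both contain "2wBDABQODxIPDRQSERIXFh" and "/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/", those sequences need to be replaced. firstS then becomes: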
string firstS = "/9j/4AAQSkZJRgABAQEAYABgAAD/[22]QYHzMhHxwcHz8tLyUzSkFOTUlBSEZSXHZkUldvWEZIZoxob3p9hIWET2ORm4+AmnaBhH//2wBDARYXFx8bHzwhITx/VEhUf39[61]f3//";
I hope this makes sense. I think I could do this with Regex, but I don't like how inefficient it would be. Does anyone know another, faster way?
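Answer 0 (score: 3)
> Does anyone know another, faster way?
Yes, this problem actually has a proper name. It is called the Longest Common Substring problem, and it has a reasonably fast solution.
Here is an implementation on ideone. It finds and replaces all common substrings that are ten characters or longer.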
// Adapted from the Wikipedia article linked above:
private static string FindLcs(string s, string t) {
    var L = new int[s.Length, t.Length]; // L[i,j]: length of the common suffix of s[0..i] and t[0..j]
    var z = 0;    // length of the longest common substring found so far
    var end = -1; // index in s of the last character of that substring
    for (var i = 0 ; i != s.Length ; i++) {
        for (var j = 0 ; j != t.Length ; j++) {
            if (s[i] == t[j]) {
                if (i == 0 || j == 0) {
                    L[i,j] = 1;
                } else {
                    L[i,j] = L[i-1,j-1] + 1;
                }
                if (L[i,j] > z) {
                    z = L[i,j];
                    end = i;
                }
            } else {
                L[i,j] = 0;
            }
        }
    }
    // Return a single substring. Appending every match of length z would
    // concatenate ties into a string that may not occur in s at all.
    return z == 0 ? string.Empty : s.Substring(end - z + 1, z);
}
// With the LCS in hand, building the answer is easy
public static string CutLcs(string s, string t) {
    for (;;) {
        var lcs = FindLcs(s, t);
        // Stop once the longest remaining match is shorter than ten characters
        if (lcs.Length < 10) break;
        s = s.Replace(lcs, string.Format("[{0}]", lcs.Length));
    }
    return s;
}
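The table in FindLcs only ever reads the previous row, so the same dynamic program can run with two rows instead of a full s.Length by t.Length array. The following is a minimal sketch of that variant, not the code from the ideone link; the names LcsSketch and FindLcsTwoRows are made up for illustration.

```csharp
using System;

public static class LcsSketch
{
    // Longest common substring using two rolling rows of the DP table.
    // Hypothetical helper for illustration; returns the first longest match found.
    public static string FindLcsTwoRows(string s, string t)
    {
        var prev = new int[t.Length + 1]; // row for s[0..i-2]
        var curr = new int[t.Length + 1]; // row for s[0..i-1]
        int best = 0;    // length of the longest match so far
        int bestEnd = 0; // index in s just past the end of that match

        for (int i = 1; i <= s.Length; i++)
        {
            for (int j = 1; j <= t.Length; j++)
            {
                // Extend the diagonal run on a match, reset it on a mismatch.
                curr[j] = s[i - 1] == t[j - 1] ? prev[j - 1] + 1 : 0;
                if (curr[j] > best)
                {
                    best = curr[j];
                    bestEnd = i;
                }
            }
            // Swap rows; index 0 is never written, so it stays 0 in both.
            var tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return s.Substring(bestEnd - best, best);
    }
}
```

Running time is still proportional to s.Length times t.Length; only the memory use drops, which matters when both strings are long.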
Answer 1 (score: 1)
Be very careful to distinguish between "Longest Common Substring" and "Longest Common Subsequence".
For substring: http://en.wikipedia.org/wiki/Longest_common_substring_problem
For subsequence: http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
I suggest watching a few YouTube videos on both topics: http://www.youtube.com/results?search_query=longest+common+substring&oq=longest+common+substring&gs_l=youtube.3..0.3834.10362.0.10546.28.17.2.9.9.2.225.1425.11j3j3.17.0...0.0...1ac.lSrzx8rr1kQ
You can find C# implementations of the longest common subsequence here:
http://www.alexandre-gomes.com/?p=177
http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_subsequence
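To make the difference concrete, here is a small sketch that computes both lengths for the same pair of strings. A substring must be contiguous, while a subsequence only has to keep the characters in order. The class and method names are hypothetical and chosen for this example only.

```csharp
using System;

public static class CommonLengthSketch
{
    // Longest common substring: matched characters must be contiguous.
    public static int SubstringLength(string a, string b)
    {
        var dp = new int[a.Length + 1, b.Length + 1];
        int best = 0;
        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                if (a[i - 1] == b[j - 1])
                {
                    dp[i, j] = dp[i - 1, j - 1] + 1;
                    best = Math.Max(best, dp[i, j]);
                }
                // On a mismatch the run is broken, so dp[i, j] stays 0.
            }
        }
        return best;
    }

    // Longest common subsequence: order is kept, but gaps are allowed.
    public static int SubsequenceLength(string a, string b)
    {
        var dp = new int[a.Length + 1, b.Length + 1];
        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                dp[i, j] = a[i - 1] == b[j - 1]
                    ? dp[i - 1, j - 1] + 1
                    : Math.Max(dp[i - 1, j], dp[i, j - 1]); // carry the best result forward
            }
        }
        return dp[a.Length, b.Length];
    }
}
```

For "abcXdef" and "abcYdef", SubstringLength gives 3 ("abc" or "def") while SubsequenceLength gives 6 ("abcdef"). The question asks about contiguous runs, so the substring version is the relevant one.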
Answer 2 (score: 0)
I had a similar problem, but for word occurrences! So I hope this can help. I used a SortedDictionary and a binary search tree.
/* Application counts the number of occurrences of each word in a string
   and stores them in a generic sorted dictionary. */
using System;
using System.Text.RegularExpressions;
using System.Collections.Generic;

public class SortedDictionaryTest
{
    public static void Main( string[] args )
    {
        // create sorted dictionary
        SortedDictionary< string, int > dictionary = CollectWords();
        // display sorted dictionary content
        DisplayDictionary( dictionary );
    }

    // create sorted dictionary
    private static SortedDictionary< string, int > CollectWords()
    {
        // create a new sorted dictionary
        SortedDictionary< string, int > dictionary =
            new SortedDictionary< string, int >();
        Console.WriteLine( "Enter a string: " ); // prompt for user input
        string input = Console.ReadLine() ?? string.Empty; // ReadLine returns null at end of input
        // split input text into tokens
        string[] words = Regex.Split( input, @"\s+" );
        // processing input words
        foreach ( var word in words )
        {
            // leading or trailing whitespace produces empty tokens; skip them
            if ( word.Length == 0 )
                continue;
            string wordKey = word.ToLower(); // get word in lowercase
            // if the dictionary contains the word
            if ( dictionary.ContainsKey( wordKey ) )
            {
                ++dictionary[ wordKey ];
            }
            else
            {
                // add new word with a count of 1 to the dictionary
                dictionary.Add( wordKey, 1 );
            }
        }
        return dictionary;
    }

    // display dictionary content
    private static void DisplayDictionary< K, V >(
        SortedDictionary< K, V > dictionary )
    {
        Console.WriteLine( "\nSorted dictionary contains:\n{0,-12}{1,-12}",
            "Key:", "Value:" );
        /* generate output for each key in the sorted dictionary
           by iterating through the Keys property with a foreach statement */
        foreach ( K key in dictionary.Keys )
            Console.WriteLine( "{0,-12}{1,-12}", key, dictionary[ key ] );
        Console.WriteLine( "\nsize: {0}", dictionary.Count );
    }
}
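For experimenting without console input, the counting step can be pulled into a function that takes the text as a parameter. This is only a sketch along the lines of the program above; WordCountSketch and CountWords are hypothetical names, and the use of TryGetValue and an ordinal comparer are choices made for this example rather than part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordCountSketch
{
    // Counts case-insensitive word occurrences in text, sorted by key.
    public static SortedDictionary<string, int> CountWords(string text)
    {
        var counts = new SortedDictionary<string, int>(StringComparer.Ordinal);
        foreach (var token in Regex.Split(text, @"\s+"))
        {
            if (token.Length == 0) continue; // skip empty tokens from edge whitespace
            var key = token.ToLowerInvariant();
            int current;
            counts.TryGetValue(key, out current); // current is 0 when the key is absent
            counts[key] = current + 1;
        }
        return counts;
    }
}
```

Calling WordCountSketch.CountWords("the cat The dog") yields three keys, with "the" counted twice.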
Answer 3 (score: 0)
This is probably dog slow, but if you are willing to take on some technical debt and need something right now for prototyping, you can use LINQ.
// requires: using System.Linq;
string firstS = "123abc";
string secondS = "456cdeabc123";
int minLength = 3;
// Each result is secondS with one matching window of firstS replaced
var result =
    from subStrCount in Enumerable.Range(0, firstS.Length)
    where firstS.Length - subStrCount >= minLength
    let subStr = firstS.Substring(subStrCount, minLength)
    where secondS.Contains(subStr)
    select secondS.Replace(subStr, "[" + subStr.Length + "]");
Result:
456cdeabc[3]
456cde[3]123
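The query above produces one copy of secondS per matching window and only looks at windows of a fixed length. A sketch closer to what the question asks for walks firstS from left to right and, at each position, greedily takes the longest run that also occurs in the other string. This is a different technique from the LINQ query (a plain greedy scan), and GreedyReplaceSketch and ReplaceCommon are hypothetical names introduced here.

```csharp
using System;
using System.Text;

public static class GreedyReplaceSketch
{
    // Replaces every run of s that occurs in t and is at least minLength
    // characters long with "[run length]", scanning s from left to right.
    public static string ReplaceCommon(string s, string t, int minLength)
    {
        if (minLength < 1) throw new ArgumentOutOfRangeException("minLength");
        var result = new StringBuilder();
        int i = 0;
        while (i < s.Length)
        {
            int len = 0;
            // Does a window of minLength characters starting at i occur in t?
            if (i + minLength <= s.Length &&
                t.IndexOf(s.Substring(i, minLength), StringComparison.Ordinal) >= 0)
            {
                len = minLength;
                // Grow the window one character at a time while it still occurs in t.
                while (i + len < s.Length &&
                       t.IndexOf(s.Substring(i, len + 1), StringComparison.Ordinal) >= 0)
                {
                    len++;
                }
            }
            if (len > 0)
            {
                result.Append('[').Append(len).Append(']');
                i += len; // skip the replaced run
            }
            else
            {
                result.Append(s[i]); // no match here, copy the character
                i++;
            }
        }
        return result.ToString();
    }
}
```

With the strings from this answer, ReplaceCommon("123abc", "456cdeabc123", 3) returns "[3][3]". Because the scan is greedy from the left, it may pick different match boundaries than repeatedly cutting out the longest common substring, and the repeated IndexOf calls make it slow on long inputs.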