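Compare two multi-line strings

Time: 2015-04-08 23:28:33

Tags: java c# vb.net algorithm function

I am working on an application in which I would like (if possible) to be able to compare two multi-line strings.

For example: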

string1=
001000
000100
000100

string4=
10001000
00100100
00000100

string2=
10
01
01

string3=
010
010
010

Expected output:

string1.contains(string2)   > true
string1.contains(string3)   > false
string4.contains(string2)   > true
string4.contains(string3)   > false

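I need a function that can tell that string1 contains string2. I have searched, but I have not found an answer that fits my needs. The function has to check that the lines of the smaller string appear at the same position relative to one another (for example, string1 must have 10 on one line, 01 directly below it, and so on).

If such a function is possible, it should return true for "string1 contains string2" and false for "string1 contains string3".

Any help you can offer is much appreciated. (Code in VB, C#, or something similar, if possible.)

Thanks in advance.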

6 answers:

Answer 0: (score: 1)

Based on your comment on @RufusL's answer:

  

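"It looks promising, but I'm not sure it will work, because string1 won't always be 3 lines long. But your code looks easy to adapt so that it loops over the lines if they have different lengths :)"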

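I assume that the source being searched may contain more "lines" than the string to compare. The steps are therefore basically:

  1. Split both the source and the string to compare into lines.
  2. While the source still has enough candidate start lines, try to find a matching start line.
  3. When a matching substring is found in a candidate start line, check whether all subsequent lines match at that same substring position. If they do, it is a match.

The snippet below implements this algorithm as a C# extension method.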

    using System;

    public static class LinesMatcher
    {
        // Counts how often the block of lines in toCompare occurs in source,
        // with every line of the block starting at the same column.
        public static int CountMatches(this string source, string toCompare, params string[] lineSeparators)
        {
            // split into parts.
            var srcParts = source.Split(lineSeparators, StringSplitOptions.RemoveEmptyEntries);
            var cmpParts = toCompare.Split(lineSeparators, StringSplitOptions.RemoveEmptyEntries);

            // nothing to look for.
            if (cmpParts.Length == 0)
                return 0;

            // check until candidate first matching source lines have been exhausted.
            var matchCount = 0;
            var startLineNdx = 0;
            while (cmpParts.Length <= (srcParts.Length - startLineNdx))
            {
                // search for a match from the start of the current line.
                var matchNdx = srcParts[startLineNdx].IndexOf(cmpParts[0], StringComparison.Ordinal);
                while (matchNdx >= 0)
                {
                    // Line has a match with the first line in cmpParts
                    // Check if all subsequent lines match from the same position.
                    var match = true;
                    for (var i = 1; i < cmpParts.Length; i++)
                    {
                        var line = srcParts[startLineNdx + i];
                        var part = cmpParts[i];

                        // The length check also covers source lines too short to reach matchNdx.
                        if (line.Length - matchNdx < part.Length ||
                            string.CompareOrdinal(line, matchNdx, part, 0, part.Length) != 0)
                        {
                            match = false;
                            break;
                        }
                    }
                    if (match) // all lines matched
                        matchCount++;

                    // try to find a next match in this line.
                    matchNdx = srcParts[startLineNdx].IndexOf(cmpParts[0], matchNdx + 1, StringComparison.Ordinal);
                }

                // Try next line in source as matching start.
                startLineNdx++;
            }
            return matchCount;
        }
    }
    

Usage:

    class Program
    {
        public static void Main(params string[] args)
        {
            var seps = new[] { "\n" };
            var string0 = "00000010\n001000\n000100\n000100";
            var string1 = "001000\n000100\n000100";
            var string4 = "10001000\n00100100\n00000100";
            var string2 = "10\n01\n01";
            var string3 = "010\n010\n010";
    
            Console.WriteLine(string1.CountMatches(string2, seps)); // 1
            Console.WriteLine(string1.CountMatches(string3, seps)); // 0
            Console.WriteLine(string4.CountMatches(string2, seps)); // 1
            Console.WriteLine(string4.CountMatches(string3, seps)); // 0
            Console.WriteLine(string0.CountMatches(string2, seps)); // 1 (the match starts on the second line)
        }
    }
    

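Answer 1: (score: 0)

I just implemented this in Groovy (trying to avoid Groovy-specific syntax).
Here is the gist: https://gist.github.com/s0nerik/9c52175dcd68ad8807b8

My solution assumes that both strings have the same height (number of lines) and that every line within a string has the same length. It also assumes that each line of the first string is at least as long as the corresponding line of the second string.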
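The gist is not reproduced here, so the following Java sketch is not the author's code. It only illustrates one way to search under the assumptions stated above (same number of lines, and source lines at least as wide as the pattern lines). The class name `SameHeightMatcher` and the method name `containsBlock` are made up for illustration.

```java
final class SameHeightMatcher {

    // Returns true if every line of pattern occurs in the corresponding line
    // of source at one common column offset.
    static boolean containsBlock(String source, String pattern) {
        String[] src = source.split("\n");
        String[] pat = pattern.split("\n");

        // Assumption from the answer: both blocks have the same height.
        if (src.length != pat.length) {
            return false;
        }

        // Slide the pattern from left to right over the source.
        int maxOffset = src[0].length() - pat[0].length();
        for (int col = 0; col <= maxOffset; col++) {
            boolean allRowsMatch = true;
            for (int row = 0; row < src.length; row++) {
                // startsWith(prefix, offset) compares in place and simply
                // returns false when the offset is out of range.
                if (!src[row].startsWith(pat[row], col)) {
                    allRowsMatch = false;
                    break;
                }
            }
            if (allRowsMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String string1 = "001000\n000100\n000100";
        String string4 = "10001000\n00100100\n00000100";
        String string2 = "10\n01\n01";
        String string3 = "010\n010\n010";

        System.out.println(containsBlock(string1, string2)); // true
        System.out.println(containsBlock(string1, string3)); // false
        System.out.println(containsBlock(string4, string2)); // true
        System.out.println(containsBlock(string4, string3)); // false
    }
}
```

Because the heights must be equal, this sketch cannot find a pattern that starts on a later line of a taller source.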

Answer 2: (score: 0)

You can write an extension method for the string class to do this:

public static class StringExtensions
{
    public static bool MultiContains(this string source, string compare)
    {
        if (source == null) return compare == null;
        if (compare == null) return false;

        var sourceParts = source.Split('\n');
        var compareParts = compare.Split('\n');

        if (sourceParts.Length != compareParts.Length) return false;

        // Try to get a match with the first pair of strings
        int firstMatchIndex = sourceParts[0].IndexOf(compareParts[0]);

        // If we didn't find any matches in the first pair, return false
        if (firstMatchIndex == -1) return false;

        // If there are no other matches to be compared, return true
        if (sourceParts.Length == 1) return true;

        var matched = false;

        // Otherwise, see if all compare matches are at same position
        while (!matched && firstMatchIndex > -1)
        {
            for (int i = 1; i < sourceParts.Length; i++)
            {
                // See if we have a match in the same position as the first match
                matched = sourceParts[i].IndexOf(compareParts[i], 
                    firstMatchIndex) == firstMatchIndex;

                if (!matched)
                {
                    // If one of the strings didn't match in the same position, 
                    // try to find another match in the first lines
                    firstMatchIndex = sourceParts[0].IndexOf(compareParts[0], 
                        firstMatchIndex + 1);

                    break;
                }
            }
        }

        return matched;
    }
}

Usage then looks like this:

public static void Main()
{
    var string1 = "001000\n000100\n000100";
    var string4 = "10001000\n00100100\n00000100";
    var string2 = "10\n01\n01";
    var string3 = "010\n010\n010";

    Console.WriteLine(string1.MultiContains(string2));
    Console.WriteLine(string1.MultiContains(string3));
    Console.WriteLine(string4.MultiContains(string2));
    Console.WriteLine(string4.MultiContains(string3));
}

Output:

  True
  False
  True
  False

Answer 3: (score: 0)

Try this:

Imports System
Imports System.Linq
Imports System.Runtime.CompilerServices

' Extension methods must be declared inside a Module.
Module PatternMatching

    Sub Main()
        Dim string1 = String.Join(vbCrLf, "001000", "000100", "000100")
        Dim string4 = String.Join(vbCrLf, "10001000", "00100100", "00000100")
        Dim string2 = String.Join(vbCrLf, "10", "01", "01")
        Dim string3 = String.Join(vbCrLf, "010", "010", "010")

        Console.WriteLine("String1 contains String2 {0}", string1.ContainsPattern(string2))
        Console.WriteLine("String1 contains String3 {0}", string1.ContainsPattern(string3))
        Console.WriteLine("String4 contains String2 {0}", string4.ContainsPattern(string2))
        Console.WriteLine("String4 contains String3 {0}", string4.ContainsPattern(string3))
        Console.ReadKey()
    End Sub

    ' Returns True when every pattern line occurs in the matching target line
    ' at one common column. Both strings must have the same number of lines.
    <Extension> Function ContainsPattern(target As String, pattern As String) As Boolean
        Dim lines = Split(target, vbCrLf)
        Dim patternLines = Split(pattern, vbCrLf)

        If lines.Count <> patternLines.Count Then
            Throw New ArgumentException("Line counts differ")
        End If

        If patternLines.First.Length > lines.First.Length Then
            Throw New ArgumentException("Pattern exceeds target")
        End If

        Dim match = False
        ' Try every column at which the first pattern line could start.
        For m = 0 To (lines.First.Length - patternLines.First.Length)
            match = True
            For l = 0 To lines.Count - 1
                If Not lines(l).Substring(m).StartsWith(patternLines(l)) Then
                    match = False
                    Exit For
                End If
            Next
            If match = True Then
                Return True
            End If
        Next

        Return False
    End Function

End Module

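Answer 4: (score: 0)

The best approach is to treat each string as one single string. Then write a function that takes that string of digits. The second argument you pass to the function is the number of digits after which a line break occurs.

A simple division then works out automatically how many lines there should be. Like this: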

Module Module1

    Sub Main()

        Dim string1 As String = "001000000100000100"

        Dim string2 As String = "100101"

        Dim string3 As String = "010010010"
        Dim string4 As String = "100010000010010000000100"


        Dim string1Split As List(Of String) = SplitUp(string1, 6)
        Dim string2Split As List(Of String) = SplitUp(string2, 2)
        Dim string3Split As List(Of String) = SplitUp(string3, 3)
        Dim string4Split As List(Of String) = SplitUp(string4, 8)

        For Each x As String In string1Split
            Console.WriteLine("string1 " + x)
        Next
        For Each x As String In string2Split
            Console.WriteLine("string2 " + x)
        Next
        For Each x As String In string3Split
            Console.WriteLine("string3 " + x)
        Next
        For Each x As String In string4Split
            Console.WriteLine("string4 " + x)
        Next

        Console.ReadLine()

    End Sub

    Public Function SplitUp(number As String, linebreak As Integer) As List(Of String)

        Dim numberLength As Integer = number.Length
        Dim timesToGo As Integer = numberLength \ linebreak


        Dim listwithStrings As New List(Of String)


        For a As Integer = 1 To timesToGo

            Dim value1 As String = number.Substring(0, linebreak)
            Dim value2 As String = number.Substring(linebreak)
            listwithStrings.Add(value1)
            number = value2
        Next



        Return listwithStrings


    End Function


End Module

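The last thing to do is take one row from a list (one line of digits) and compare it with a row from the other list. If two or more digits match, keep the position at which the match starts. Then use that position to check the following rows in the two lists.

Can you follow that? xD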
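The answer leaves the final comparison step as prose. A minimal Java sketch of how it might look is given below, assuming the flat strings and row widths from the example above. The names `FlatGridMatcher`, `splitUp` and `containsRows` are hypothetical, and treating an empty pattern as "not found" is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;

final class FlatGridMatcher {

    // Cuts a flat digit string into rows of `width` characters, dropping any
    // incomplete trailing row (the same behaviour as SplitUp above).
    static List<String> splitUp(String digits, int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        List<String> rows = new ArrayList<>();
        for (int i = 0; i + width <= digits.length(); i += width) {
            rows.add(digits.substring(i, i + width));
        }
        return rows;
    }

    // Tries every start row and every start column; a candidate counts as a
    // match only if all pattern rows line up at that same column.
    static boolean containsRows(List<String> grid, List<String> pattern) {
        if (pattern.isEmpty()) {
            return false; // arbitrary choice: nothing to search for
        }
        int patternWidth = pattern.get(0).length();
        for (int top = 0; top + pattern.size() <= grid.size(); top++) {
            int maxCol = grid.get(top).length() - patternWidth;
            for (int col = 0; col <= maxCol; col++) {
                boolean allRowsMatch = true;
                for (int r = 0; r < pattern.size(); r++) {
                    if (!grid.get(top + r).startsWith(pattern.get(r), col)) {
                        allRowsMatch = false;
                        break;
                    }
                }
                if (allRowsMatch) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> string1 = splitUp("001000000100000100", 6);
        List<String> string2 = splitUp("100101", 2);
        List<String> string3 = splitUp("010010010", 3);
        List<String> string4 = splitUp("100010000010010000000100", 8);

        System.out.println(containsRows(string1, string2)); // true
        System.out.println(containsRows(string1, string3)); // false
        System.out.println(containsRows(string4, string2)); // true
        System.out.println(containsRows(string4, string3)); // false
    }
}
```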

Answer 5: (score: 0)

Try this. I think it covers most cases, if not all:

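    using System.Text.RegularExpressions;

    public static class MultiLineRegexExtensions
    {
        const char stringSplit = '\n';

        public static bool contains(this string s, string cmp)
        {
            // Width of the first line of each string (-1 if there is no line break).
            var t0l = s.IndexOf(stringSplit);
            var c0l = cmp.IndexOf(stringSplit);

            // Number of characters to skip between consecutive pattern lines.
            var d = t0l - c0l;
            if (t0l == -1 || c0l == -1 || d < 0)
                return false;

            // Flatten the source; note that a match may then straddle a row boundary.
            var tc = s.Replace(stringSplit.ToString(), "");
            var cs = cmp.Split(stringSplit);
            string regS = "";
            foreach (var c in cs)
            {
                // Escape each line so that it is matched literally.
                if (regS == "")
                    regS = Regex.Escape(c);
                else
                    regS += ".{" + d + "}" + Regex.Escape(c);
            }
            return Regex.IsMatch(tc, regS);
        }
    }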
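Because the snippet above removes the line breaks before matching, a match can run across the end of one row into the start of the next. For instance, the source rows 0001, 0000, 0000 flatten to 000100000000, which matches `10.{2}00` even though no row contains 10. The Java sketch below is a variation rather than the answer's own method: it keeps the line breaks in the source and widens each gap by one character so that the gap also skips the line break. It assumes every source row has the same width and rows are separated by `\n`. The names `RegexBlockMatcher` and `containsBlock` are hypothetical.

```java
import java.util.regex.Pattern;

final class RegexBlockMatcher {

    static boolean containsBlock(String source, String pattern) {
        // Width of one source row (assumption: all source rows are equally wide).
        int sourceWidth = source.indexOf('\n');
        if (sourceWidth < 0) {
            sourceWidth = source.length(); // single-row source
        }

        String[] patternRows = pattern.split("\n");
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < patternRows.length; i++) {
            if (patternRows[i].length() > sourceWidth) {
                return false; // a pattern row wider than the source cannot fit
            }
            // Quote the row so it is matched literally; a quoted row contains
            // no line break, so it can never straddle two source rows.
            regex.append(Pattern.quote(patternRows[i]));
            if (i < patternRows.length - 1) {
                // Rest of the current row + the line break + the columns
                // before the match in the next row.
                int gap = sourceWidth - patternRows[i].length() + 1;
                regex.append(".{").append(gap).append('}');
            }
        }

        // DOTALL lets '.' match the line break inside each gap.
        return Pattern.compile(regex.toString(), Pattern.DOTALL)
                      .matcher(source)
                      .find();
    }

    public static void main(String[] args) {
        String string1 = "001000\n000100\n000100";
        String string4 = "10001000\n00100100\n00000100";
        String string2 = "10\n01\n01";
        String string3 = "010\n010\n010";

        System.out.println(containsBlock(string1, string2)); // true
        System.out.println(containsBlock(string1, string3)); // false
        System.out.println(containsBlock(string4, string2)); // true
        System.out.println(containsBlock(string4, string3)); // false
        // The straddling case from the lead-in is rejected:
        System.out.println(containsBlock("0001\n0000\n0000", "10\n00")); // false
    }
}
```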