Check whether one string is a substring of another string

Time: 2018-12-10 12:14:45

Tags: java substring

I read a nice article about checking whether a string is a substring of another.

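The exercise reads:

Write a program that takes 2 string arguments from the command line. The program must verify whether the second string is a substring of the first string (you cannot use substr, substring or any other standard function, including regular expression libraries).

Each occurrence of * in the second string means that it can match zero or more characters of the first string.

Consider the example: input string 1: abcd, input string 2: a*c. The program should evaluate that string 2 is a substring of string 1.

An asterisk (*) may be treated as a regular character if it is preceded by a backslash (\). A backslash (\) is treated as a regular character in all cases except when it comes before an asterisk (*).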

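I wrote a simple application that first checks that the second string is not longer than the first one (but there is a problem: the test ("ab", "a*b") is a valid case that should pass, yet the method rejects it):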

public static boolean checkCharactersCount(String firstString, String secondString) {
    return (firstString.length() > 0 && secondString.length() > 0) &&
            (firstString.length() > secondString.length());
}
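The pre-check above rejects ("ab", "a*b") because it compares raw lengths, while an unescaped * may match zero characters. One possible alternative, sketched here as an assumption rather than as part of the original code, is to count only the characters the pattern actually has to consume. The names `PatternLength`, `minRequiredLength` and `canPossiblyMatch` are hypothetical.

```java
public class PatternLength {

    // Counts the characters the pattern must consume in the text:
    // an unescaped '*' needs none, and "\*" stands for one literal '*'.
    static int minRequiredLength(String pattern) {
        int required = 0;
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length() && pattern.charAt(i + 1) == '*') {
                required++; // escaped asterisk: one literal character
                i++;        // skip the '*' that was just accounted for
            } else if (c != '*') {
                required++; // ordinary character, including a lone backslash
            }
            // an unescaped '*' may match zero characters, so it adds nothing
        }
        return required;
    }

    // Cheap rejection test only: a true result does not mean the pattern matches.
    static boolean canPossiblyMatch(String text, String pattern) {
        return text.length() >= minRequiredLength(pattern);
    }

    public static void main(String[] args) {
        System.out.println(canPossiblyMatch("ab", "a*b")); // true
        System.out.println(canPossiblyMatch("ab", "abc")); // false
        System.out.println(minRequiredLength("a\\*b"));    // 3
    }
}
```

This only filters out patterns that cannot fit; the actual matching still has to be done separately.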

...and the next check is the substring test:

public static boolean checkSubstring(String firstString, String secondString) {
    int correctCharCounter = 0;
    int lastCorrectCharAtIndex = -1;

    for (int i = 0; i < secondString.length(); i++) {
        for (int j = 0; j < firstString.length(); j++) {
            if (j > lastCorrectCharAtIndex) {

                if ((secondString.charAt(i) == firstString.charAt(j)) || secondString.charAt(i) == '*') {
                    correctCharCounter++;
                    lastCorrectCharAtIndex = j;
                }

                if (correctCharCounter >= secondString.length())
                    return true;
            }
        }
    }

    return false;
}

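But there are two problems:

  1. My code above does not enforce character contiguity (for example, checkSubstring("abacd", "bcd") returns true, which is wrong; it should return false).
  2. Any ideas on how to handle the special sequence "\*"? A sample to test: checkSubstring("a*bc", "\*b")

What do you think about a solution? :)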

4 answers:

Answer 0: (score: 3)

Try the following approach (comments added for explanation):

// only for non empty Strings
public boolean isSubString(String string1,String string2)
{
    // step 1: split by *, but not by \*
    List<String>list1 = new ArrayList<String>();
    char[]cs = string2.toCharArray();
    int lastIndex = 0 ;
    char lastChar = 0 ;
    int i = 0 ;
    for(; i < cs.length ; ++i)
    {
        if(cs[i]=='*' && lastChar!='\\')
        {
            list1.add(new String(cs,lastIndex,i-lastIndex).replace("\\*", "*"));
            //earlier buggy line:
            //list1.add(new String(cs,lastIndex,i-lastIndex));
            lastIndex = i + 1 ;
        }
        lastChar = cs[i];
    }
    if(lastIndex < i )
    {
        list1.add(new String(cs,lastIndex,i-lastIndex).replace("\\*", "*"));
    }
    // step 2: check indices of each string in the list
    // Note: all indices should be in proper order.
    lastIndex = 0;
    for(String str : list1)
    {
        int newIndex = string1.indexOf(str,lastIndex);
        if(newIndex < 0)
        {
            return false;
        }
        lastIndex = newIndex+str.length();
    }
    return true;
}

If you are not allowed to use String.indexOf(), write a function public int indexOf(String string1, String string2, int index2) and replace this statement

int newIndex = string1.indexOf(str,lastIndex);

with the following statement:

int newIndex = indexOf(string1, str,lastIndex);
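The answer leaves that replacement function unwritten. A minimal sketch of what it could look like, using only length() and charAt(), is shown below. The class name `ManualIndexOf` and the parameter names are assumptions made for illustration; the method is meant to stand in for String.indexOf(String, int) only for the calls made in the code above, where the start index never exceeds the length of the text.

```java
public class ManualIndexOf {

    // Returns the first index >= fromIndex at which str occurs in string1, or -1.
    static int indexOf(String string1, String str, int fromIndex) {
        int start = Math.max(fromIndex, 0);
        // Last offset at which str still fits inside string1.
        int lastStart = string1.length() - str.length();
        for (int i = start; i <= lastStart; i++) {
            int j = 0;
            // Compare character by character; stop at the first mismatch.
            while (j < str.length() && string1.charAt(i + j) == str.charAt(j)) {
                j++;
            }
            if (j == str.length()) {
                return i; // every character of str matched contiguously
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("abcd", "cd", 0)); // 2
        System.out.println(indexOf("abcd", "bd", 0)); // -1
        System.out.println(indexOf("abab", "ab", 1)); // 2
    }
}
```

Because the inner loop restarts at every offset, the characters are required to be contiguous, which is the property the question's checkSubstring was missing.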

=========================================================

Appendix A: the code I tested:

package jdk.conf;

import java.util.ArrayList;
import java.util.List;

public class Test01 {
    public static void main(String[] args)
    {
        Test01 test01 = new Test01();
        System.out.println(test01.isSubString("abcd", "a*c"));
        System.out.println(test01.isSubString("abcd", "bcd"));
        System.out.println(test01.isSubString("abcd", "*b"));
        System.out.println(test01.isSubString("abcd", "ac"));
        System.out.println(test01.isSubString("abcd", "bd"));
        System.out.println(test01.isSubString("abcd", "b*d"));
        System.out.println(test01.isSubString("abcd", "b\\*d"));
        System.out.println(test01.isSubString("abcd", "\\*d"));
        System.out.println(test01.isSubString("abcd", "b\\*"));

        System.out.println(test01.isSubString("a*cd", "\\*b"));
        System.out.println(test01.isSubString("", "b\\*"));
        System.out.println(test01.isSubString("abcd", ""));

        System.out.println(test01.isSubString("a*bd", "\\*b"));
    }
    // only for non empty Strings
    public boolean isSubString(String string1,String string2)
    {
        // step 1: split by *, but not by \*
        List<String>list1 = new ArrayList<String>();
        char[]cs = string2.toCharArray();
        int lastIndex = 0 ;
        char lastChar = 0 ;
        int i = 0 ;
        for(; i < cs.length ; ++i)
        {
            if(cs[i]=='*' && lastChar!='\\')
            {
                list1.add(new String(cs,lastIndex,i-lastIndex).replace("\\*", "*"));
                lastIndex = i + 1 ;
            }
            lastChar = cs[i];
        }
        if(lastIndex < i )
        {
            list1.add(new String(cs,lastIndex,i-lastIndex).replace("\\*", "*"));
        }
        // step 2: check indices of each string in the list
        // Note: all indices should be in proper order.
        lastIndex = 0;
        for(String str : list1)
        {
            int newIndex = string1.indexOf(str,lastIndex);
            if(newIndex < 0)
            {
                return false;
            }
            lastIndex = newIndex+str.length();
        }
        return true;
    }
}

Output:

true
true
true
false
false
true
false
false
false
false
false
true
true

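Answer 1: (score: 1)

I would do this in a couple of stages.

Let's call the potential substring p, and the string we are testing for containing it s.

Break the "contains" part down into a series of questions of the form "does p match starting at the Nth position of s?". Obviously, you walk through s from the first position onward to see whether p matches at any position of s.

While matching, we may run into a "*"; in that case we want to know whether the part of p after the * is a substring of the part of s that remains after matching the part of p before the *. That suggests a recursive routine that takes the part to be matched and the string to match against, and returns true/false. When you encounter a *, form the two new strings and call yourself.

If you encounter a \ instead, just continue the regular matching with the next character, without making the recursive call. Given that you need to do that, I suppose it might be easiest to build pPrime from the original p, so that you can remove the backslashes as you encounter them, analogous to removing the asterisks for the wildcard matching.

I haven't actually written any code for this; you only asked for an approach.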
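Since this answer describes the approach without code, here is one way it could be sketched. It is an illustration under assumptions, not the answerer's implementation: the names `WildcardSubstring`, `matchesAt` and `isSubstring` are hypothetical, and instead of building new strings (or a separate pPrime) the sketch passes indices into s and p and resolves the "\*" escape inline.

```java
public class WildcardSubstring {

    // True if p, read from index pi, matches s starting exactly at index si.
    // Characters of s left over after the match are fine: only a substring is required.
    static boolean matchesAt(String s, int si, String p, int pi) {
        if (pi == p.length()) {
            return true; // whole pattern consumed
        }
        char c = p.charAt(pi);
        if (c == '*') {
            // Wildcard: let it absorb 0, 1, 2, ... characters and recurse on the rest of p.
            for (int skip = si; skip <= s.length(); skip++) {
                if (matchesAt(s, skip, p, pi + 1)) {
                    return true;
                }
            }
            return false;
        }
        int next = pi + 1;
        if (c == '\\' && next < p.length() && p.charAt(next) == '*') {
            c = '*'; // "\*" stands for a literal asterisk
            next++;
        }
        // Any other character, including a lone backslash, must match literally.
        return si < s.length() && s.charAt(si) == c && matchesAt(s, si + 1, p, next);
    }

    // Asks "does p match starting at position N of s?" for every N.
    static boolean isSubstring(String s, String p) {
        for (int start = 0; start <= s.length(); start++) {
            if (matchesAt(s, start, p, 0)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSubstring("abcd", "a*c"));   // true
        System.out.println(isSubstring("abacd", "bcd"));  // false
        System.out.println(isSubstring("a*c", "a\\*c"));  // true
        System.out.println(isSubstring("abcd", "a\\*c")); // false
    }
}
```

The recursion keeps the code short, but a pattern with many asterisks can make it backtrack heavily; for an exercise-sized input that is usually acceptable.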

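Answer 2: (score: 1)

I found this to be a nice challenge. This kind of exercise really forces us to think at a very low level about the language and about algorithms in general. No lambdas, no streams, no regex, no find, no substring, nothing. Just the old charAt, a few for loops and whatnot. Essentially, I wrote a lookup method that finds the first character of the string being searched for, and then a second lookup that applies your rules from that point onward. If that fails, it goes back to the first index found, adds one, and iterates as many times as necessary until the end of the string. If no match is found, it should return false. If just one is found, that is enough to consider it a substring. The most important corner cases are handled at the start of the calculation, so that once a false result is certain the code goes no further. A '*' on its own means it matches any characters, and we can escape it with \. I tried to include most corner cases, and this really was a challenge. I am not entirely sure that my code covers every case, but it should cover quite a few. I really wanted to help you out, so this is my approach and here is my code: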

package com.jesperancinha.string;

public class StringExercise {

    private static final char ASTERISK = '*';
    private static final char BACKSLASH = '\\';

    public boolean checkIsSubString(String mainString, String checkString) {
        int nextIndex = getNextIndex(0, checkString.charAt(0), mainString);
        if (nextIndex == -1) {
            return false;
        }
        boolean result = checkFromIndex(nextIndex, mainString, checkString);
        while (nextIndex < mainString.length() - 1 && nextIndex > -1) {
            if (!result) {
                nextIndex = getNextIndex(nextIndex + 1, checkString.charAt(0), mainString);
                if (nextIndex > -1) {
                    result = checkFromIndex(nextIndex, mainString, checkString);
                }
            } else {
                return result;
            }
        }
        return result;
    }

    private int getNextIndex(int start, char charAt, String mainString) {
        if (charAt == ASTERISK || charAt == BACKSLASH) {
            return start;
        }
        for (int i = start; i < mainString.length(); i++) {
            if (mainString.charAt(i) == charAt) {
                return i;
            }
        }
        return -1;
    }

    private boolean checkFromIndex(int nextIndex, String mainString, String checkString) {
        for (int i = 0, j = 0; i < checkString.length(); i++, j++) {
            if (i < (checkString.length() - 2) && checkString.charAt(i) == BACKSLASH
                    && checkString.charAt(i + 1) == ASTERISK) {
                i++;
                if (mainString.charAt(j + nextIndex) == BACKSLASH) {
                    j++;
                }
                if (checkString.charAt(i) != mainString.charAt(j + nextIndex)) {
                    return false;
                }
            }
            if (i > 0 && checkString.charAt(i - 1) != BACKSLASH
                    && checkString.charAt(i) == ASTERISK) {
                if (i < checkString.length() - 1 && (j + nextIndex) < (mainString.length() - 1)
                        && checkString.charAt(i + 1) !=
                        mainString.charAt(j + nextIndex + 1)) {
                    i--;
                } else {
                    if (j + nextIndex == mainString.length() - 1
                            && checkString.charAt(checkString.length() - 1) != ASTERISK
                            && checkString.charAt(checkString.length() - 2) != BACKSLASH) {
                        return false;
                    }
                }
            } else {
                if ((j + nextIndex) < (mainString.length() - 2) &&
                        mainString.charAt(j + nextIndex)
                                != checkString.charAt(i)) {
                    return false;
                }
            }
        }
        return true;
    }

}

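I wrote a set of unit tests, but it would be too long if I put the whole class here, and the only thing I want to show you is the test cases I implemented in them. Here is a condensed version of my unit tests for this case: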

package com.jesperancinha.string;

import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.api.Test;

class StringExerciseMegaTest {

    @Test
    void checkIsSubString() {
        StringExercise stringExercise = new StringExercise();
        boolean test = stringExercise.checkIsSubString("abcd", "a*c");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("abcd", "a\\*c");
        assertThat(test).isFalse();
        test = stringExercise.checkIsSubString("a*c", "a\\*c");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdsadasa*c", "a\\*c");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdsadasa*csdfdsfdsfdsf", "a\\*c");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdsadasa**csdfdsfdsfdsf", "a\\*c");
        assertThat(test).isFalse();
        test = stringExercise.checkIsSubString("aasdsadasa**csdfdsfdsfdsf", "a*c");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdsadasa*csdfdsfdsfdsf", "a*c");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdweriouiauoisdf9977675tyhfgh", "a*c");
        assertThat(test).isFalse();
        test = stringExercise.checkIsSubString("aasdweriouiauoisdf9977675tyhfgh", "dwer");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdweriouiauoisdf9977675tyhfgh", "75tyhfgh");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdweriou\\iauoisdf9977675tyhfgh", "riou\\iauois");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdweriou\\*iauoisdf9977675tyhfgh", "riou\\\\*iauois");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdweriou\\*iauoisdf9\\*977675tyhfgh", "\\\\*977675tyhfgh");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdweriou\\*iauoisdf9\\*977675tyhfgh", "\\*977675tyhfgh");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("\\*aasdweriou\\*iauoisdf9\\*977675tyhfgh", "\\*aasdwer");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("*aasdweriou\\*iauoisdf9\\*977675tyhfgh", "*aasdwer");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("abcd", "bc");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("abcd", "zbc");
        assertThat(test).isFalse();
        test = stringExercise.checkIsSubString("abcd", "*bc*");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("*bcd", "\\*bc*");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("abcd", "a*c");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("abcd", "az*bc");
        assertThat(test).isFalse();
    }
}

Answer 3: (score: 0)

My solution looks like this. I commented everything, so I hope you can follow it.

public static void main(String [] args) throws Exception {
        System.err.println(contains("bruderMusssLos".toCharArray(),"Mu*L*".toCharArray()));
}

public static boolean contains(char [] a, char [] b) {

    int counterB = 0; // correct characters
    char lastChar = '-'; //last Character encountered in B

    for(int i = 0; i < a.length; i++) {

        //if last character * it can be 0 to infinite characters
        if(lastChar == '*') {

            //if next characters in a is next in b reset last char
            // this will be true as long the next a is not the next b
            if(a[i] == b[counterB]) {
                lastChar = b[counterB];
                counterB++;

            }else {
                counterB++;
            }

        }else {

            //if next char is * and lastchar is not \ count infinite to next hit
            //otherwise * is normal character
            if(b[counterB] == '*' && lastChar != '\\') {
                lastChar = '*';
                counterB++;
            }else {
                //if next a is next b count
                if(a[i] == b[counterB]) {
                    lastChar = b[counterB];
                    counterB++;
                }else {
                    //otherwise set counter to 0
                    counterB = 0;
                }                   
            }

        }

        //if counterB == length a contains b
        if(counterB == b.length)
            return true;

    }


    return false;
}

For example, the current test returns true: