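Regex: split a string on commas that are not between symbols

Time: 2017-01-11 13:26:43

Tags: java regex

I want to split text on ',' but not on a ',' that sits between parentheses or chevrons (angle brackets).

For example: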

The string "test.toto, test->toto.value(), sizeof(test, toto)" should return this list '[test.toto, test->toto.value(), sizeof(test, toto)]'

The string "test.toto, test.value(), toto" should return this list '[test.toto, test.value(), toto]'

The string "toto, toto<titi, tutu>&, titi" should return this list '[toto, toto<titi, tutu>&, titi]'

For now, I wrote this regex to match those commas:

',(?![^(]*\))(?![^<>]*\>)' 

But it does not produce the right result for the first example.
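A minimal sketch that reproduces the problem, assuming the regex is used with `String.split`. The class name `CommaLookaheadDemo` and the helper `split` are made up for illustration. In the first example, the `>` of `->` appears to satisfy the second lookahead, so the first comma is not treated as a separator:

```java
import java.util.Arrays;
import java.util.List;

class CommaLookaheadDemo {
    // The regex from the question, unchanged (only escaped for a Java string literal).
    static final String COMMA = ",(?![^(]*\\))(?![^<>]*\\>)";

    // Hypothetical helper: splits the input on every comma the regex accepts.
    static List<String> split(String input) {
        return Arrays.asList(input.split(COMMA));
    }

    public static void main(String[] args) {
        // Only two pieces: the comma after "test.toto" is followed by " test->",
        // and the '>' of "->" is taken for a closing chevron.
        System.out.println(split("test.toto, test->toto.value(), sizeof(test, toto)"));

        // Three pieces, as wanted (each later piece keeps its leading space).
        System.out.println(split("test.toto, test.value(), toto"));
    }
}
```

Note that the pieces are not trimmed, so a call to `trim()` would still be needed on each one.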

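Does anyone have an idea?

Thanks in advance!

3 answers:

Answer 0: (score: 1)

I created a pattern that matches the groups separated by commas instead of trying to match the commas themselves. So the Java code does not split on a delimiter; it lists every matched group: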

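import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import junit.framework.TestCase;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)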
public class RegexTest {

    private final String testString;
    private final Collection<String> expectedResult;


    public RegexTest(String testString, String[] expectedResult) {
        this.testString = testString;
        this.expectedResult = Arrays.asList(expectedResult);
    }

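    // Collects every run made of <...> groups, (...) groups,
    // or single characters other than a comma or a space.
    private Collection<String> findMatchedWords(String sentence) {
        Pattern pattern = Pattern.compile("((\\<.*?\\>|\\(.*?\\)|[^, ])+)");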

        Matcher matcher = pattern.matcher(sentence);
        List<String> matches = new ArrayList<>();

        while(matcher.find()){
            matches.add(matcher.group());
        }
        return matches;
    }


    @Test
    public void testPattern() {         
        Collection<String> actualResult = findMatchedWords(testString);

        TestCase.assertEquals(expectedResult, actualResult);
    }


    @Parameters
    public static Iterable<Object[]> getTestParameters() {
        Object[][] parameters = {
                {"test.toto, test.value(), toto", new String[]  { "test.toto", "test.value()", "toto" }},
                {"test.toto, test->toto.value(), sizeof(test, toto)", new String[] { "test.toto", "test->toto.value()", "sizeof(test, toto)" }},
                {"toto, toto<titi, tutu>&, titi", new String[]  { "toto", "toto<titi, tutu>&", "titi" }}
        };
        return Arrays.asList(parameters);
    }
}

Edit: I had misread the OP's example containing < and >, but it is fixed now.
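For use outside a JUnit test, the same pattern can be wrapped in a small helper. This is only a sketch: the class name `GroupMatcherDemo` and the method `findItems` are made up for illustration, while the pattern itself is the one from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupMatcherDemo {
    // Pattern from the answer: a <...> group, a (...) group,
    // or any single character that is neither a comma nor a space.
    private static final Pattern ITEM =
            Pattern.compile("((\\<.*?\\>|\\(.*?\\)|[^, ])+)");

    // Hypothetical helper: returns every match of ITEM, in order.
    static List<String> findItems(String input) {
        List<String> items = new ArrayList<>();
        Matcher matcher = ITEM.matcher(input);
        while (matcher.find()) {
            items.add(matcher.group());
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(findItems("toto, toto<titi, tutu>&, titi"));
    }
}
```

Two things to keep in mind: because `.*?` stops at the first closing bracket, nested brackets such as `f(g(a, b), c)` are likely to be cut in the wrong place, and because spaces are excluded outside brackets, an item containing a space outside brackets is returned as several items.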

Answer 1: (score: 0)

I wrote this method to do the job:

// Required imports:
import java.util.ArrayList;
import java.util.List;

public static List<String> splitByUpperComma(String toSplit) {
    int parenthesisCount = 0;      // nesting depth of '(' ... ')'
    boolean innerChevron = false;  // true while inside '<' ... '>'
    int pos = 0;
    List<Integer> indexes = new ArrayList<>();

    // First pass: record the position of every top-level comma.
    for (char currentChar : toSplit.toCharArray()) {
        if (currentChar == '(') {
            parenthesisCount++;
        } else if (currentChar == ')') {
            parenthesisCount--;
        } else if (currentChar == '<') {
            innerChevron = true;
        } else if (currentChar == '>') {
            // Also reached for the '>' of "->", which is harmless here.
            innerChevron = false;
        } else if (currentChar == ',' && !innerChevron && parenthesisCount == 0) {
            indexes.add(pos);
        }
        pos++;
    }

    // Second pass: cut the string at the recorded positions.
    // trim() drops the space that follows each separating comma.
    List<String> splittedString = new ArrayList<>();
    int previousIndex = 0;
    for (Integer idx : indexes) {
        splittedString.add(toSplit.substring(previousIndex, idx).trim());
        previousIndex = idx + 1;
    }
    splittedString.add(toSplit.substring(previousIndex).trim());

    return splittedString;
}

But it is not a regex..
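A single boolean cannot track nested chevrons such as `a<b<c, d>, e>`. Below is a possible variant, not the method above, that keeps a depth counter for both bracket kinds and ignores the `>` of `->`. The class name `TopLevelCommaSplitter` is made up for illustration, and the sketch assumes `<` and `>` are never used as comparison operators in the input.

```java
import java.util.ArrayList;
import java.util.List;

class TopLevelCommaSplitter {
    // Splits on commas that are outside all parentheses and chevrons.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        int parenDepth = 0;
        int chevronDepth = 0;
        int start = 0;

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            char prev = i > 0 ? input.charAt(i - 1) : '\0';

            if (c == '(') {
                parenDepth++;
            } else if (c == ')') {
                if (parenDepth > 0) parenDepth--;
            } else if (c == '<') {
                chevronDepth++;
            } else if (c == '>') {
                // Skip the '>' of "->" and any unmatched '>'.
                if (prev != '-' && chevronDepth > 0) chevronDepth--;
            } else if (c == ',' && parenDepth == 0 && chevronDepth == 0) {
                parts.add(input.substring(start, i).trim());
                start = i + 1;
            }
        }
        parts.add(input.substring(start).trim());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("a<b<c, d>, e>, f"));
    }
}
```

The counters make the single pass sufficient: each piece is cut as soon as a top-level comma is seen, so no list of indexes is needed.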

Answer 2: (score: 0)

I can't check this because I'm not at a computer, but give it a try:

(?:[,]?)([^,]*([(<].*?[)>])?[^,]*)

You may have to escape the parentheses inside the brackets.