Is there a way to match this with a regular expression instead of a loop?

Time: 2019-06-28 14:50:15

Tags: java regex

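I have this function, which counts the curly braces outside of quotes and ignores the ones inside them (depending on my use, I pass in the string and either '{' or '}'):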

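public static int countCurlyBraces(String s, char c) {
    // 'stack' is not declared in the original snippet; assumed to be a Deque
    Deque<Character> stack = new ArrayDeque<>();
    int count = 0;
    for (char cr : s.toCharArray()) {
        if (cr == '"') {
            if (stack.isEmpty())
                stack.push(cr);   // opening quote
            else
                stack.pop();      // closing quote
        }

        // count occurrences of c while inside quotes
        if (stack.size() == 1 && cr == c)
            count++;
    }
    // total occurrences of c minus the ones inside quotes
    return StringUtil.countMatches(s, c) - count;
}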

I'm trying to replace it with a regular expression, but I'm having some trouble. Is this possible?

public static int countCurlyBraces(String s, char c) {
    Matcher a = Pattern.compile("\"(.*?)[" + c + "](.*?)\"").matcher(s);
    int count = 0;

    while (a.find()) 
        count++;

    return StringUtil.countMatches(s, c) - count;
}
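A minimal sketch of two ways this pattern can disagree with the loop. First, `.*?` is allowed to run past a closing quote, so a brace that sits outside quotes can be swallowed into a match. Second, each match is counted once, even when one quoted section contains several braces. The two inputs below are made up for illustration, and `StringUtil.countMatches` is replaced by a stream count so the block runs on its own; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexAttempt {
    // The question's regex attempt; StringUtil.countMatches is swapped for a stream count.
    static int attempt(String s, char c) {
        Matcher m = Pattern.compile("\"(.*?)[" + c + "](.*?)\"").matcher(s);
        int count = 0;
        while (m.find()) {
            count++; // one per match, no matter how many c's the match contains
        }
        long total = s.chars().filter(ch -> ch == c).count();
        return (int) total - count;
    }

    public static void main(String[] args) {
        // The brace is OUTSIDE the quotes, but .*? crosses the closing quote and
        // the match "a"{" swallows it: prints 0 (the loop gives 1)
        System.out.println(attempt("\"a\"{\"b\"", '{'));
        // Two braces inside one quoted section produce only one match:
        // prints 1 (the loop gives 0)
        System.out.println(attempt("\"{{\"", '{'));
    }
}
```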

The sample string I'm using for testing is:

sdfg"srfg{rmjy#"rmyrmy{rymundh"ecfvr{cerv#"fes{dc"cf2234TC@$#ct234"etw243T@#$c"nhg

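This should return 2, ignoring the two curly braces enclosed in quotes. The regex version, however, sees all of the braces as enclosed in quotes and outputs 0.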

The document looks like this:

LOCALE
user="XXXXXXX" time=1561234682/* "26-Jun-2019 23:00:03" */
{
  LOCALE="XXXXXXX"
}
SITE NAME="XxxXXxxx"
 user="XXXXXX" time=1568532503/* "26-Jun-2019 23:00:03" */
{
  SYSTEM_NAME="XXX-NNNNN"
  SYSTEM_IDENTIFIER="{XXXX-XXXX-XXX_XXX-XX}"
  SYSTEM_ID=NNNNN
  SYSTEM_ZONE_NAME="XXXXXX"
  DEFAULT_COMMUNICATION_TYPE=REDUNDANT
  IP_ADDR_AUTO_GEN=T
  PP_LAD="aGx{4"
  PVQ_LIMIT=0.5
  BCK_LIMIT=0.3
  MNN_LIMIT=0.1
  COMPANY_NAME=""
  DISPLAY_VERSION_CONTROL_ENABLED=F
}

2 answers:

Answer 0 (score: 0)

A loop is probably more CPU-efficient, but here I'd use a two-stage regex:

String input="sdfg\"srfg{rmjy#\"rmyrmy{rymundh\"ecfvr{cerv#\"fes{dc\"cf2234TC@$#ct234\"etw243T@#$c\"nhg";


input=input.replaceAll("\"[^\"]*\"", ""); // becomes sdfgrmyrmy{rymundhfes{dcetw243T@#$c"nhg

input=input.replaceAll("[^{]", ""); //becomes {{

return input.length();//2
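Packaged as a self-contained method, the two stages look like the sketch below (the class and method names are hypothetical). One behavioral difference from the loops is worth knowing: stage 1 only removes *matched* pairs of quotes, so a brace after an unmatched trailing quote is still counted, whereas the loops would treat it as quoted.

```java
class TwoStageRegex {
    // Stage 1 removes every "..." pair; stage 2 removes everything that is not '{'.
    // What remains are the braces that are not inside a matched pair of quotes.
    static int countOutsideQuotes(String input) {
        String unquoted = input.replaceAll("\"[^\"]*\"", "");
        String braces = unquoted.replaceAll("[^{]", "");
        return braces.length();
    }

    public static void main(String[] args) {
        String sample = "sdfg\"srfg{rmjy#\"rmyrmy{rymundh\"ecfvr{cerv#\"fes{dc\"cf2234TC@$#ct234\"etw243T@#$c\"nhg";
        System.out.println(countOutsideQuotes(sample)); // prints 2
        // A '{' after an unmatched quote is still counted: prints 1
        System.out.println(countOutsideQuotes("a\"{"));
    }
}
```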

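The second regex can use the actual character passed in. If you restrict it to { and }, this should work; characters that are special inside a character class, such as `\` or `]`, may need escaping.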

input=input.replaceAll("[^"+c+"]", "");

Combining the two regexes makes it less readable, but it fits on one line:

input=input.replaceAll("\"[^\"]*\"|[^"+c+"]", "");
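A variant that avoids building intermediate strings, offered as an alternative rather than something from the answer: scan with `Matcher.find()` over the alternation `"[^"]*"|(c)` and count only the matches where the target alternative (group 1) took part. Because the scan moves left to right, a target between a matched pair of quotes is consumed together with the quoted section and never reaches group 1. `Pattern.quote` is used so that any target character is treated literally; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindCount {
    // Alternative 1 consumes a whole "..." section; alternative 2 (group 1) is the target char.
    static int countOutsideQuotes(String s, char c) {
        Pattern p = Pattern.compile("\"[^\"]*\"|(" + Pattern.quote(String.valueOf(c)) + ")");
        Matcher m = p.matcher(s);
        int count = 0;
        while (m.find()) {
            // group(1) is null when the quoted-section alternative matched instead
            if (m.group(1) != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "sdfg\"srfg{rmjy#\"rmyrmy{rymundh\"ecfvr{cerv#\"fes{dc\"cf2234TC@$#ct234\"etw243T@#$c\"nhg";
        System.out.println(countOutsideQuotes(sample, '{')); // prints 2
    }
}
```

Like the two-stage version, this only skips *matched* pairs of quotes.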

Answer 1 (score: 0)

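Your approach is a very roundabout and inefficient way to achieve what you want.

First, you iterate through the string and count the matching characters inside quotes, then iterate through the whole string again to count all matching characters, and subtract the number inside quotes... why? You can instead directly count the ones outside the quotes, which is what you actually want.

Second, by using s.toCharArray() you create a copy of the string's data, doubling its memory footprint; instead, just access the characters directly via charAt.

Third, there is no need for a stack to track whether you are inside quotes; just flip a boolean instead.

Here are my comments on your method: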

public static int countCurlyBraces(String s, char c) {
    Deque<Character> stack = new ArrayDeque<>(); // I'm assuming 'stack' is some kind of Deque
    int count = 0;
    // doubling memory usage of the string by copying the chars into another array with 's.toCharArray()'
    // for each character in that string...
    for (char cr : s.toCharArray()) {
        // using a stack to keep track if you are inside quotes? just flip a boolean instead
        if (cr == '"')
            if (stack.isEmpty())
                stack.push(cr);
            else
                stack.pop();

        // if inside quotes and the character matches the target, then count it..
        // I thought you wanted to count the characters outside the quotes?
        if (stack.size() == 1 && cr == c)
            count++;
    }

    // iterate through the whole string again and count ALL the characters
    // then subtract the number inside the strings from the total to get the number outside strings
    return StringUtil.countMatches(s, c) - count;
}

Instead, you can do the following, which is much more efficient:

public static int countCharacterOutsideQuotes(CharSequence chars, char targetChar) {
    int count = 0;
    boolean isQuoted = false;
    // using `charAt` avoids doubling memory usage of copying all the chars into another array
    for (int i = 0; i < chars.length(); i++) {
        char c = chars.charAt(i);
        if (c == '"') {
            // found a quote, flip from not quoted to quoted or vice versa.
            isQuoted = !isQuoted;
        } else if (c == targetChar && !isQuoted) {
            // found the target character, if it's not inside quotes then count it
            count++;
        }
    }
    return count;
}
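As a sanity check, here is a sketch that runs the method above on the document from the question, counting both `{` and `}`. The method is copied so the block compiles on its own; the class name and the `DOCUMENT` constant are illustrative. Braces inside quoted values such as `SYSTEM_IDENTIFIER` and `PP_LAD` are skipped, so each brace type is counted twice.

```java
class DocumentCheck {
    // Copy of the answer's method so this block is self-contained.
    static int countCharacterOutsideQuotes(CharSequence chars, char targetChar) {
        int count = 0;
        boolean isQuoted = false;
        for (int i = 0; i < chars.length(); i++) {
            char c = chars.charAt(i);
            if (c == '"') {
                isQuoted = !isQuoted; // entering or leaving a quoted section
            } else if (c == targetChar && !isQuoted) {
                count++;
            }
        }
        return count;
    }

    // The document from the question, one string per line.
    static final String DOCUMENT = String.join("\n",
        "LOCALE",
        "user=\"XXXXXXX\" time=1561234682/* \"26-Jun-2019 23:00:03\" */",
        "{",
        "  LOCALE=\"XXXXXXX\"",
        "}",
        "SITE NAME=\"XxxXXxxx\"",
        " user=\"XXXXXX\" time=1568532503/* \"26-Jun-2019 23:00:03\" */",
        "{",
        "  SYSTEM_NAME=\"XXX-NNNNN\"",
        "  SYSTEM_IDENTIFIER=\"{XXXX-XXXX-XXX_XXX-XX}\"",
        "  SYSTEM_ID=NNNNN",
        "  SYSTEM_ZONE_NAME=\"XXXXXX\"",
        "  DEFAULT_COMMUNICATION_TYPE=REDUNDANT",
        "  IP_ADDR_AUTO_GEN=T",
        "  PP_LAD=\"aGx{4\"",
        "  PVQ_LIMIT=0.5",
        "  BCK_LIMIT=0.3",
        "  MNN_LIMIT=0.1",
        "  COMPANY_NAME=\"\"",
        "  DISPLAY_VERSION_CONTROL_ENABLED=F",
        "}");

    public static void main(String[] args) {
        System.out.println(countCharacterOutsideQuotes(DOCUMENT, '{')); // prints 2
        System.out.println(countCharacterOutsideQuotes(DOCUMENT, '}')); // prints 2
    }
}
```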

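If you are reading the data from a file in the first place, you can skip loading it into a String and read it directly with a Reader instead. This saves memory and also removes the need to wait for the whole file to be read before processing can even start: with a Reader you can begin processing immediately, keeping only a small buffer in memory rather than the entire file.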

public static int countCharacterOutsideQuotes(Reader reader, char targetChar) throws IOException {
    Objects.requireNonNull(reader);
    int count = 0;
    boolean isQuoted = false;
    // reading one char at a time from the Reader avoids holding the whole input in memory
    for (int c = reader.read(); c != -1; c = reader.read()) {
        if (c == '"') {
            // found a quote, flip from not quoted to quoted or vice versa.
            isQuoted = !isQuoted;
        } else if (c == targetChar && !isQuoted) {
            // found the target character, if it's not inside quotes then count it
            count++;
        }
    }
    return count;
}

public static void main(String[] args) {
    // StringReader is already a Reader, so it needs no InputStreamReader wrapper:
    // try (Reader reader = new StringReader("your-test-string-goes-here")) {
    try (Reader reader = new InputStreamReader(new FileInputStream("/path/to/file.txt"));) {
        System.out.println(countCharacterOutsideQuotes(reader, '{'));
    } catch (IOException e) {
        e.printStackTrace();
    }
}
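If the file's encoding is known, `java.nio.file.Files.newBufferedReader(Path, Charset)` gives a buffered `Reader` with an explicit charset, instead of the platform default that `new InputStreamReader(InputStream)` uses. The sketch below assumes the file is UTF-8 and round-trips the sample string through a temporary file so it runs without `/path/to/file.txt`; `FileCount` and `countInFile` are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileCount {
    // Same single-pass logic as the answer's Reader version.
    static int countCharacterOutsideQuotes(Reader reader, char targetChar) throws IOException {
        int count = 0;
        boolean isQuoted = false;
        for (int c = reader.read(); c != -1; c = reader.read()) {
            if (c == '"') {
                isQuoted = !isQuoted;
            } else if (c == targetChar && !isQuoted) {
                count++;
            }
        }
        return count;
    }

    // Opens the file with buffering and an explicit charset (UTF-8 is an assumption)
    // and closes it when counting is done.
    static int countInFile(Path path, char targetChar) throws IOException {
        try (Reader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return countCharacterOutsideQuotes(reader, targetChar);
        }
    }

    public static void main(String[] args) throws IOException {
        String sample = "sdfg\"srfg{rmjy#\"rmyrmy{rymundh\"ecfvr{cerv#\"fes{dc\"cf2234TC@$#ct234\"etw243T@#$c\"nhg";
        Path tmp = Files.createTempFile("braces", ".txt");
        try {
            Files.write(tmp, sample.getBytes(StandardCharsets.UTF_8));
            System.out.println(countInFile(tmp, '{')); // prints 2
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```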