"线程中的异常" main" java.lang.StackOverflowError的"

Asked: 2014-05-08 07:18:07

Tags: java regex

I get Exception in thread "main" java.lang.StackOverflowError when using the regular expression:

(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)") 

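on a long string. What I actually want is to split each line of a .csv file on ',' (only the commas that are outside the '"' quotes). It works for 450 columns, but with more columns it fails with the error shown below: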

Exception in thread "main" java.lang.StackOverflowError
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4148)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4170)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4132)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)

2 Answers:

Answer 0 (score: 3)

Use an atomic group instead of the capturing group, which you don't need:

,(?=(?>[^\"]*\"[^\"]*\")*[^\"]*$)

This should speed things up and prevent unnecessary backtracking.
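
As a rough illustration of how the atomic-group pattern could be used for the split described in the question, here is a minimal sketch. The class name CsvSplitSketch and the helper splitLine are made up for this example, and the sample line is invented; it only shows the splitting behavior on a short input and does not by itself prove that very wide rows are safe.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class CsvSplitSketch {
    // A comma followed by an even number of double quotes up to the end of the line,
    // i.e. a comma that is not inside a quoted field. (?>...) is an atomic group.
    static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",(?=(?>[^\"]*\"[^\"]*\")*[^\"]*$)");

    static List<String> splitLine(String line) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        return Arrays.asList(COMMA_OUTSIDE_QUOTES.split(line, -1));
    }

    public static void main(String[] args) {
        // The comma inside "b,c" is not treated as a separator.
        System.out.println(splitLine("a,\"b,c\",d")); // prints [a, "b,c", d]
    }
}
```

Note that the quotes are kept as part of the field; stripping them would be a separate step.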

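Answer 1 (score: 0)

I ran into this problem when searching long strings with a regular expression (lines longer than 25k characters). I fixed it by adding a plus sign (+) after the last quantifier in the regex, which makes that quantifier possessive.

See possessive quantifiers: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html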

Here is my modified regex, from the JSON-parsing code where I ran into the StackOverflowError:

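import java.util.regex.Pattern;

// Find a string literal using the pattern `normal* (special normal*)*`,
// where "special" is any backslash-escaped character.
// The `*+` after the group makes its repetition possessive.
Pattern stringRe = Pattern.compile("\"[^\\\\\"]*(\\\\.[^\\\\\"]*)*+\"");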
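
To show how a pattern like this might be used, here is a small self-contained sketch. The class JsonStringSketch and the method stringLiterals are hypothetical names introduced for illustration, and the inputs are invented; this is not the answerer's actual parser, only the same regex applied with a Matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, not part of the original answer.
class JsonStringSketch {
    // Same regex as above: normal* (special normal*)*+ between double quotes.
    static final Pattern STRING_RE =
            Pattern.compile("\"[^\\\\\"]*(\\\\.[^\\\\\"]*)*+\"");

    // Collects every double-quoted string literal found in the input, quotes included.
    static List<String> stringLiterals(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = STRING_RE.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(stringLiterals("{\"key\": \"a\\\"b\"}"));

        // A long literal with many escapes, to exercise the repeated group.
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < 5000; i++) {
            sb.append("x\\n"); // appends the three characters x, backslash, n
        }
        sb.append("\"");
        List<String> found = stringLiterals(sb.toString());
        System.out.println(found.size() + " literal of length " + found.get(0).length());
    }
}
```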

Some additional information, quoted from https://regular-expressions.mobi/possessive.html?wlr=1:

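When Possessive Quantifiers Matter

The main practical benefit of possessive quantifiers is to speed up your regular expression. In particular, possessive quantifiers allow your regex to fail faster. In the above example, when the closing quote fails to match, we know the regex couldn't possibly have skipped over a quote. So there is no need to backtrack and check for the quote. We make the regex engine aware of this by making the quantifier possessive. In fact, some engines, including the JGsoft engine, detect that [^"]* and " are mutually exclusive when compiling your regex, and automatically make the star possessive.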
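
The behavior described in the quote can be checked with a short experiment. The sketch below (the class name PossessiveSketch and the sample inputs are made up for illustration) compares greedy and possessive quantifiers: when the repeated class and the following token are mutually exclusive the two agree, while a possessive .*+ consumes the closing quote and never gives it back.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not taken from the quoted page.
class PossessiveSketch {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String quoted = "\"abc\"";     // the text "abc" with both quotes
        String unterminated = "\"abc"; // closing quote missing

        // [^"] and " are mutually exclusive, so greedy and possessive give the same answer.
        System.out.println(matches("\"[^\"]*\"", quoted));        // true
        System.out.println(matches("\"[^\"]*+\"", quoted));       // true
        System.out.println(matches("\"[^\"]*+\"", unterminated)); // false, with no backtracking into the star

        // The dot also matches the closing quote, so here possessiveness changes the result.
        System.out.println(matches("\".*\"", quoted));  // true: the greedy star gives back the quote
        System.out.println(matches("\".*+\"", quoted)); // false: the possessive star keeps it
    }
}
```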