Pattern:
"(([^",\n ]*[,\n ])*([^",\n ]*"{2})*)*[^",\n ]*"[ ]*,[ ]*|[^",\n]*[ ]*,[ ]*|"(([^",\n ]*[,\n ])*([^",\n ]*"{2})*)*[^",\n ]*"[ ]*|[^",\n]*[ ]*
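This regular expression is used to parse a CSV file, but when it is run through Pattern.matcher the thread hangs and the server reports it as a hung thread. I would appreciate it if someone could help fine-tune this pattern.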
[7/1/13 16:45:26:745 GMT+08:00] 00000029 ThreadMonitor W WSVR0605W: Thread "MessageListenerThreadPool : 0" (00000035) has been active for 691836 milliseconds and may be hung. There is/are 1 thread(s) in total in the server that may be hung.
at java.util.regex.Pattern$Curly.match(Pattern.java:4233)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4606)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4752)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4689)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4606)
at java.util.regex.Pattern$Loop.match(Pattern.java:4733)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4665)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4754)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4689)
at java.util.regex.Pattern$Loop.match(Pattern.java:4742)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4665)
at java.util.regex.Pattern$BitClass.match(Pattern.java:2912)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4278)
at java.util.regex.Pattern$Curly.match(Pattern.java:4233)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4606)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4752)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4689)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4606)
Answer (score: 1)
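The problem appears to be the amount of backtracking needed to complete the match. The pattern nests quantifiers inside quantifiers (for example ([^",\n ]*[,\n ])* inside the outer (...)*), so the engine can split the same characters among the groups in an exponential number of ways. A line that ultimately fails to match can therefore keep the thread busy practically forever, which is consistent with the repeated Loop and Curly frames in the stack trace.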
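Independently of which pattern is used, a runaway match can be bounded so that it fails instead of hanging the listener thread. The sketch below assumes that java.util.regex reads its input through CharSequence.charAt (it does not respond to Thread.interrupt(), as far as I know), so a wrapper that checks a deadline on every charAt call stops a long backtracking search with an exception. DeadlineCharSequence and matcherWithTimeout are hypothetical names, not part of the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical wrapper: every charAt() call checks a deadline first.
 * java.util.regex reads its input through CharSequence.charAt, so a match
 * that backtracks for too long fails with an exception instead of
 * occupying the thread indefinitely.
 */
final class DeadlineCharSequence implements CharSequence {
    private final CharSequence inner;
    private final long deadlineNanos; // compared against System.nanoTime()

    DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
        this.inner = inner;
        this.deadlineNanos = deadlineNanos;
    }

    @Override
    public char charAt(int index) {
        // Overflow-safe comparison of nanoTime values
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new IllegalStateException("regex match exceeded its time budget");
        }
        return inner.charAt(index);
    }

    @Override
    public int length() {
        return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // Slices keep the same deadline
        return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
    }

    @Override
    public String toString() {
        return inner.toString();
    }

    /** Hypothetical helper: a Matcher that gives up after roughly timeoutMillis. */
    static Matcher matcherWithTimeout(Pattern pattern, CharSequence input, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        return pattern.matcher(new DeadlineCharSequence(input, deadline));
    }
}
```

In the message listener one could call matcherWithTimeout(re, line, 1000) and catch IllegalStateException to log or reject the offending line. This only limits the damage; the pattern itself still needs to be fixed.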
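If your CSV is well formed, you can use a much simpler regex to parse each line. Note that this only splits quoted and comma-delimited values out of a single string, so you need to pass each line through .matcher with this regex and iterate over each match.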
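A minimal sketch of that per-line approach, using the field regex from this answer. It assumes that no quoted field contains a line break (the asker's original pattern allowed \n inside quotes, which this does not cover). CsvLines, parseLine and parse are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CsvLines {
    // The answer's field regex; group 1 is the field value without quotes
    private static final Pattern FIELD =
            Pattern.compile("(?:^|,)\"?((?<=\")[^\"]*|[^,\"]*)\"?(?=,|$)");

    /** Hypothetical helper: collects group 1 of every match in one line. */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.add(m.group(1));
        }
        return fields;
    }

    /** Splits the text into lines first, then runs the field regex on each line. */
    static List<List<String>> parse(String text) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : text.split("\r?\n")) {
            rows.add(parseLine(line));
        }
        return rows;
    }
}
```

Because each call sees a single line, ^ and $ anchor to the start and end of that line without needing Pattern.MULTILINE.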
Regex: (?:^|,)"?((?<=")[^"]*|[^,"]*)"?(?=,|$)
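To show how the pieces of this regex fit together, here is the same expression written with Pattern.COMMENTS, where whitespace is ignored and # starts a comment running to the end of the line. It should behave identically to the compact form; CsvFieldRegex and fields are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CsvFieldRegex {
    // Same regex as the compact form, annotated piece by piece
    static final Pattern VERBOSE = Pattern.compile(
            "(?:^|,)          # a field starts at the beginning of the line or after a comma\n"
          + "\"?              # optional opening quote\n"
          + "(                # group 1: the field value\n"
          + "  (?<=\")[^\"]*  #   preceded by a quote: everything up to the next quote\n"
          + "  |              #   otherwise\n"
          + "  [^,\"]*        #   everything up to the next comma or quote\n"
          + ")\n"
          + "\"?              # optional closing quote\n"
          + "(?=,|$)          # the field must end at a comma or at the end of the line\n",
          Pattern.COMMENTS);

    /** Returns group 1 of every match, i.e. the field values of one line. */
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = VERBOSE.matcher(line);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }
}
```

The lookbehind (?<=") is what lets a quoted field keep its commas: that alternative only applies when the optional opening quote was actually consumed.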
Sample text
"root",test1,1111,"22,22",,fdsa
Code
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Module1 {
    public static void main(String[] args) {
        // The sample line from above; quoted fields may contain commas
        String sourcestring = "\"root\",test1,1111,\"22,22\",,fdsa";
        // Each match is one field; group 1 holds its value without the quotes
        Pattern re = Pattern.compile("(?:^|,)\"?((?<=\")[^\"]*|[^,\"]*)\"?(?=,|$)");
        Matcher m = re.matcher(sourcestring);
        int mIdx = 0;
        while (m.find()) {
            System.out.println("[" + mIdx + "] => " + m.group(1));
            mIdx++;
        }
    }
}
Capture group 1 of each match:
[0] => root
[1] => test1
[2] => 1111
[3] => 22,22
[4] =>
[5] => fdsa
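One caveat on "well formed": the "{2} in the original pattern suggests (this is an assumption) that the data may contain doubled quotes inside quoted fields, the escaping style described in RFC 4180. The simpler regex does not handle those. The sketch below runs it on such a line; DoubledQuoteCheck and fields are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DoubledQuoteCheck {
    // The field regex from this answer
    static final Pattern FIELD =
            Pattern.compile("(?:^|,)\"?((?<=\")[^\"]*|[^,\"]*)\"?(?=,|$)");

    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Quoted field containing a comma: handled
        System.out.println(fields("\"22,22\",x"));   // prints [22,22, x]
        // Quoted field containing an escaped quote (""): no match can start
        // before the comma, so the first field is silently dropped
        System.out.println(fields("\"a\"\"b\",c"));  // prints [c]
    }
}
```

If such fields can occur, the result should be checked (for example, by comparing the field count against the expected number of columns) rather than trusted blindly.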