Regular expression to match commas inside array values

Date: 2019-05-08 19:50:22

Tags: java regex regex-lookarounds regex-group regex-greedy

I am trying to parse a CSV file line by line:

String rowStr = br.readLine(); 

When I print rowStr, I see the following:

"D","123123","JAMMY,"," ","PILOT"

How can I remove the commas inside the value fields? I want to keep the outer commas that separate the fields.

2 answers:

Answer 0 (score: 2)

This expression may help you do that, although a regular expression may not be necessary for this task. If you want to, or have to, use one:

(")([A-z0-9\s]+)([,]?)(",)?

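To be safe, I added some boundaries to it; you can simplify it. The key is to add a capturing group before and after the value: group 1 is the opening quote, group 2 the value, group 3 an optional trailing comma, and group 4 the closing quote plus the separator. A replacement that keeps groups 1, 2 and 4 and leaves out group 3 removes the comma inside the value.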
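To make the role of each group concrete, here is a small Java sketch that lists what every group captures for the sample row. The class name `GroupBreakdown` and the helper `describe` are invented for illustration; the expression itself is the one from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupBreakdown {
    // The expression from the answer, unchanged.
    static final Pattern PATTERN = Pattern.compile("(\")([A-z0-9\\s]+)([,]?)(\",)?");

    // Hypothetical helper: one line per match, showing the four groups.
    static List<String> describe(String row) {
        List<String> lines = new ArrayList<>();
        Matcher m = PATTERN.matcher(row);
        while (m.find()) {
            // group(4) is null when the optional closing part did not take part in the match
            lines.add("1=[" + m.group(1) + "] 2=[" + m.group(2)
                    + "] 3=[" + m.group(3) + "] 4=[" + m.group(4) + "]");
        }
        return lines;
    }

    public static void main(String[] args) {
        describe("\"D\",\"123123\",\"JAMMY,\",\" \",\"PILOT\"").forEach(System.out::println);
    }
}
```

For the `"JAMMY,",` field, group 3 holds the comma, which is why leaving it out of the replacement removes it. For the last field, group 4 does not take part in the match because no separator follows.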

For example, one boundary is that if you accidentally have an extra comma in place of a value, the expression will not capture it.
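As a quick check of that boundary, the sketch below runs the replacement on a row where one field consists of nothing but a comma. The row `"A",",","B"` and the class name `BoundaryCheck` are invented for illustration, and the replacement `$1$2$4` is assumed to be the intended way to drop group 3.

```java
import java.util.regex.Pattern;

class BoundaryCheck {
    static final Pattern PATTERN = Pattern.compile("(\")([A-z0-9\\s]+)([,]?)(\",)?");

    // Keeps groups 1, 2 and 4; group 3 (a comma right after the value) is dropped.
    static String stripInnerCommas(String row) {
        return PATTERN.matcher(row).replaceAll("$1$2$4");
    }

    public static void main(String[] args) {
        // A comma that follows a value inside the quotes is removed.
        System.out.println(stripInnerCommas("\"JAMMY,\",\"PILOT\""));
        // A field that is only a comma has no value characters for group 2, so it is left alone.
        System.out.println(stripInnerCommas("\"A\",\",\",\"B\""));
    }
}
```

The second row comes back unchanged because group 2 requires at least one letter, digit or whitespace character right after the opening quote.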

This graph shows how the expression works, and you can visualize other expressions at this link:

[Figure: diagram of how the expression works]

Java test

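import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] args) {
        // Group 1: opening quote, 2: value, 3: optional comma after the value, 4: closing quote plus separator.
        // Note: [A-z] also matches [ \ ] ^ _ and `; use [A-Za-z] if only letters are wanted.
        final String regex = "(\")([A-z0-9\\s]+)([,]?)(\",)?";
        final String string = "\"D\",\"123123\",\"JAMMY,\",\" \",\"PILOT\"";
        // Java replacement strings refer to groups as $n, not \n; group 3 is left out to drop the comma.
        final String subst = "$1$2$4";

        final Pattern pattern = Pattern.compile(regex);
        final Matcher matcher = pattern.matcher(string);

        // The substituted value will be contained in the result variable
        final String result = matcher.replaceAll(subst);

        System.out.println("Substitution result: " + result);
    }
}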

JavaScript test demo

const regex = /(")([A-z0-9\s]+)([,]?)(",)?/gm;
const str = `"D","123123","JAMMY,"," ","PILOT"`;
// Keep groups 1, 2 and 4; leaving out group 3 drops the comma without adding a space
const subst = `$1$2$4`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

Performance test

const repeat = 1000000;
const start = Date.now();
let result;

// Run the replacement exactly `repeat` times
for (let i = 0; i < repeat; i++) {
	const string = '"D","123123","JAMMY,"," ","PILOT"';
	const regex = /(")([A-z0-9\s]+)([,]?)(",)?/gm;
	result = string.replace(regex, "$1$2$4");
}

const elapsed = Date.now() - start;
console.log("Substitution result: " + result);
console.log(elapsed / 1000 + " s is the runtime of the " + repeat + "-iteration benchmark.");
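Since this answer opens by noting that a regular expression may not be needed, here is a rough non-regex sketch in Java for comparison. It assumes every field is wrapped in double quotes and that values never contain escaped quotes; the class and method names are invented for illustration.

```java
class QuoteAwareStripper {
    // Copies the row, skipping every comma seen while inside a quoted value.
    static String stripInnerCommas(String row) {
        StringBuilder out = new StringBuilder(row.length());
        boolean insideQuotes = false;
        for (int i = 0; i < row.length(); i++) {
            char c = row.charAt(i);
            if (c == '"') {
                insideQuotes = !insideQuotes; // assumption: no escaped quotes inside values
            }
            if (c == ',' && insideQuotes) {
                continue; // the comma belongs to a value, so drop it
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripInnerCommas("\"D\",\"123123\",\"JAMMY,\",\" \",\"PILOT\""));
    }
}
```

Unlike the expression above, this version removes every comma found between a pair of quotes, including a field that consists only of a comma.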

Answer 1 (score: 1)

Use a regular expression like this:

(?<!"),|,(?!")

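It matches any comma that is not preceded by " or is not followed by ". The separator commas sit between two quotes, so they are left alone.
Test it here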
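A minimal Java sketch of how this expression could be applied, assuming the goal is simply to delete every matched comma. The class name `LookaroundStripper` and its method are invented for illustration.

```java
import java.util.regex.Pattern;

class LookaroundStripper {
    // A comma not preceded by a quote, or a comma not followed by a quote.
    static final Pattern INNER_COMMA = Pattern.compile("(?<!\"),|,(?!\")");

    // Deletes every comma that the pattern treats as part of a value.
    static String stripInnerCommas(String row) {
        return INNER_COMMA.matcher(row).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripInnerCommas("\"D\",\"123123\",\"JAMMY,\",\" \",\"PILOT\""));
    }
}
```

One limitation worth keeping in mind: a field that is exactly `","` looks the same as a separator to this pattern, so its comma is kept.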