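I am writing a piece of Java code to read and interpret TSV files. I would like to find a regular expression that can split each line of the file into its fields.
Example input lines:
    "aaa" 123 "bbb" "cc" "ddd"
    "aaa" 123 "bbb" "cc" " 6"
    "ddd" 456 "eee" "ff" " "" "
    "ddd" 456 "eee" "ff" " "" aaa "" "
(Note: the fields are tab-separated, and the last field on each of the last three lines also contains tabs.)
My current regex is ("[^"]*"*|[^\t]+)+
but it fails on the last example: it breaks the line into smaller substrings than intended.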
Answer 0 (score: 0)
Let's split on the tabs instead of matching the fields:
    \t(?=(?:[^"]*"[^"]*")*[^"]*$)
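The lookahead lets a tab match only when an even number of double quotes follows it up to the end of the line, that is, when the tab lies outside a quoted field.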
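To make the rule behind the pattern concrete, the same check can be written as a hand-rolled scanner that tracks whether it is currently inside quotes. This is only a sketch and not the answer's method: `QuoteAwareSplit` and `split` are hypothetical names, and it assumes the quotes on every line are balanced, in which case "an even number of quotes before the tab" and "an even number of quotes after the tab" mean the same thing.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareSplit {
    // Splits one line on tabs that lie outside double-quoted fields.
    // Assumption: quotes are balanced. A doubled quote ("") flips the
    // flag twice, so the scanner is still inside the field afterwards.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;      // entering or leaving a quoted field
                current.append(c);
            } else if (c == '\t' && !inQuotes) {
                fields.add(current.toString()); // separator: close the field
                current.setLength(0);
            } else {
                current.append(c);         // ordinary character or quoted tab
            }
        }
        fields.add(current.toString());    // last field has no trailing tab
        return fields;
    }

    public static void main(String[] args) {
        // Three fields; the last one contains tabs and a doubled quote.
        String line = "\"ddd\"\t456\t\"\t\"\"\taaa\t\"";
        List<String> fields = split(line);
        for (int i = 0; i < fields.size(); i++) {
            System.out.println("[" + i + "] = " + fields.get(i));
        }
    }
}
```

A single pass like this avoids rescanning the rest of the line at every tab, which the lookahead has to do, so it may be worth considering for very long lines.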
Example code (ideone demo):
    import java.util.regex.Pattern;

    public class Example {
        public static void main(String[] args) {
            // Fields are separated by tabs; the last field of lines 2-4 also contains tabs.
            String sourcestring = "\"aaa\"\t123\t\"bbb\"\t\"cc\"\t\"ddd\"\n"
                    + "\"aaa\"\t123\t\"bbb\"\t\"cc\"\t\"\t6\"\n"
                    + "\"ddd\"\t456\t\"eee\"\t\"ff\"\t\"\t\"\"\t\"\n"
                    + "\"ddd\"\t456\t\"eee\"\t\"ff\"\t\"\t\"\"\taaa\t\"\"\t\"";
            Pattern reLines = Pattern.compile("\\n");
            // A tab counts as a separator only if an even number of quotes follows it.
            Pattern reTsv = Pattern.compile("\\t(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
            String[] lines = reLines.split(sourcestring);
            for (int linesIdx = 0; linesIdx < lines.length; linesIdx++) {
                String[] parts = reTsv.split(lines[linesIdx]);
                for (int partsIdx = 0; partsIdx < parts.length; partsIdx++) {
                    System.out.println("[" + partsIdx + "] = " + parts[partsIdx]);
                }
            }
        }
    }
Output:
[0] = "aaa"
[1] = 123
[2] = "bbb"
[3] = "cc"
[4] = "ddd"
[0] = "aaa"
[1] = 123
[2] = "bbb"
[3] = "cc"
[4] = " 6"
[0] = "ddd"
[1] = 456
[2] = "eee"
[3] = "ff"
[4] = " "" "
[0] = "ddd"
[1] = 456
[2] = "eee"
[3] = "ff"
[4] = " "" aaa "" "
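One case the sample lines do not exercise is a row that ends in empty columns. `Pattern.split(CharSequence)` discards trailing empty strings, so such a row would come back with fewer fields than it has. If that matters for your files, passing a negative limit keeps them. A minimal sketch, assuming the same pattern as above; `TrailingFields` and `splitKeepEmpty` are hypothetical names:

```java
import java.util.regex.Pattern;

class TrailingFields {
    // Same pattern as in the answer: a tab followed by an even number of quotes.
    static final Pattern TSV =
            Pattern.compile("\\t(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // A negative limit tells split() to keep trailing empty strings.
    static String[] splitKeepEmpty(String line) {
        return TSV.split(line, -1);
    }

    public static void main(String[] args) {
        String line = "\"aaa\"\t123\t\t"; // four columns, the last two empty
        System.out.println(TSV.split(line).length);       // prints 2
        System.out.println(splitKeepEmpty(line).length);  // prints 4
    }
}
```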