I'm using Java and want to build two regular expressions for two different scenarios:
1:
STARTText blah, blah
\ next line with more text, but the leading backslash
\ next line with more text, but the leading backslash
\ next line with more text, but the leading backslash
and so on, until the first line that no longer starts with a backslash.
2:
Now you will see the following links for the items:
1111 leading 4 digits and then some text
2565 leading 4 digits and then some text
8978 leading 4 digits and then some text
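This block ends with an additional empty line after the 8978 line. I also know that the block with the leading digits repeats 10 times and then ends.
So filtering individual lines is possible somehow, but how do I match across several line breaks? Even for the first block I don't know when or how it ends, and I also need to search for the backslash. So my approach is to use a single, self-contained expression that I can also use with replaceAll().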
Answer 0 (score: 1)
First regex:
Pattern regex = Pattern.compile(
"^ # Start of line\n" +
"STARTText # Match this text\n" +
".*\\r?\\n # Match whatever follows on the line plus (CR)LF\n" +
"(?: # Match...\n" +
" ^\\\\ # Start of line, then a backslash\n" +
" .*\\r?\\n # Match whatever follows on the line plus (CR)LF\n" +
")* # Repeat as needed",
Pattern.MULTILINE | Pattern.COMMENTS);
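A minimal sketch of how the first regex could be used, both to extract a block and, as the question asks, with replaceAll(). The pattern is copied unchanged; the class name, the helper methods findBlock/removeBlocks and the sample input are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StartTextBlockDemo {
    // The first regex from this answer, copied unchanged (MULTILINE + COMMENTS).
    static final Pattern START_BLOCK = Pattern.compile(
            "^ # Start of line\n" +
            "STARTText # Match this text\n" +
            ".*\\r?\\n # Match whatever follows on the line plus (CR)LF\n" +
            "(?: # Match...\n" +
            " ^\\\\ # Start of line, then a backslash\n" +
            " .*\\r?\\n # Match whatever follows on the line plus (CR)LF\n" +
            ")* # Repeat as needed",
            Pattern.MULTILINE | Pattern.COMMENTS);

    // Hypothetical helper: returns the first STARTText block (header line plus
    // its backslash lines), or null if there is none.
    static String findBlock(String input) {
        Matcher m = START_BLOCK.matcher(input);
        return m.find() ? m.group() : null;
    }

    // Hypothetical helper: removes every STARTText block with one replaceAll() call.
    static String removeBlocks(String input) {
        return START_BLOCK.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical sample: one block followed by an ordinary line.
        String input = "STARTText blah, blah\n"
                + "\\ next line with more text, but the leading backslash\n"
                + "\\ next line with more text, but the leading backslash\n"
                + "other text\n";
        System.out.print(findBlock(input));    // the three lines of the block
        System.out.print(removeBlocks(input)); // only "other text" is left
    }
}
```

Because the pattern ends in ")*", a STARTText line with no backslash lines after it is matched (and removed) as well.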
Second regex:
Pattern regex = Pattern.compile(
"(?: # Match...\n" +
" ^ # Start of line\n" +
" \\d{4}\\b # Match exactly four digits\n" +
" .*\\r?\\n # Match whatever follows on the line plus (CR)LF\n" +
")+ # Repeat as needed (at least once)",
Pattern.MULTILINE | Pattern.COMMENTS);
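The second regex can also be written on one line without the COMMENTS flag. The sketch below does that and adds a hypothetical variant for the case described in the question, where the digit block always has exactly 10 lines followed by an empty line. Class, field and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitBlockDemo {
    // The second regex from this answer, condensed to one line.
    static final Pattern DIGIT_BLOCK =
            Pattern.compile("(?:^\\d{4}\\b.*\\r?\\n)+", Pattern.MULTILINE);

    // Hypothetical variant: exactly 10 digit lines, optionally followed by
    // the empty line that closes the block.
    static final Pattern TEN_LINE_BLOCK =
            Pattern.compile("(?:^\\d{4}\\b.*\\r?\\n){10}(?:^\\r?\\n)?", Pattern.MULTILINE);

    // Returns the first match of p in input, or null if there is none.
    static String firstBlock(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "Now you will see the following links for the items:\n"
                + "1111 leading 4 digits and then some text\n"
                + "2565 leading 4 digits and then some text\n"
                + "8978 leading 4 digits and then some text\n"
                + "\n";
        System.out.print(firstBlock(DIGIT_BLOCK, input)); // the three digit lines
        // replaceAll() removes the digit lines and leaves the header and the empty line.
        System.out.print(DIGIT_BLOCK.matcher(input).replaceAll(""));
    }
}
```

Because of \b after \d{4}, a line that starts with five or more digits is not treated as part of the block.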
Answer 1 (score: 1)
Regex 1:
/^STARTText.*?(\r?\n)(?:^\\.*?\1)+/m
Live demo: http://www.rubular.com/r/G35kIn3hQ4
Regex 2:
/^.*?(\r?\n)(?:^\d{4}\s.*?\1)+/m
Live demo: http://www.rubular.com/r/TxFbBP1jLJ
Regex 1 in Java (as a string literal):
(?m)^STARTText.*?(\\r?\\n)(?:^\\\\.*?\\1)+
Regex 2 in Java (as a string literal):
(?m)^.*?(\\r?\\n)(?:^\\d{4}\\s.*?\\1)+
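A sketch that runs both Java versions. Because \1 refers back to the line break captured after the first line, every following line must end with the same kind of line break; a block that mixes \r\n and \n is therefore not matched, as the second call in main() shows. The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackreferenceDemo {
    // "Regex 1 in Java" and "Regex 2 in Java" from this answer.
    static final Pattern REGEX1 = Pattern.compile("(?m)^STARTText.*?(\\r?\\n)(?:^\\\\.*?\\1)+");
    static final Pattern REGEX2 = Pattern.compile("(?m)^.*?(\\r?\\n)(?:^\\d{4}\\s.*?\\1)+");

    // Returns the first match of p in input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Block with consistent \n endings: header plus both backslash lines.
        System.out.print(firstMatch(REGEX1, "STARTText blah\n\\ a\n\\ b\nplain line\n"));
        // Header ends with \r\n, next line with \n: \1 cannot match, so null.
        System.out.println(firstMatch(REGEX1, "STARTText blah\r\n\\ a\n"));
        // Header line plus the two digit lines; the empty line stops the repetition.
        System.out.print(firstMatch(REGEX2, "Header:\n1111 x\n2565 y\n\n"));
    }
}
```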
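Answer 2 (score: 1)
In both cases I use a zero-width lookahead such as (?=^[^\\]) to find where a block ends: it checks that the next line does not start with the continuation character (here a backslash), without consuming that line.
(?=      starts the zero-width lookahead: the text must be there, but it is not consumed
^[^\\]   matches the start of a line followed by any character other than \
)        closes the assertion
For part 1 (with the MULTILINE and DOTALL flags used in the code below) this matches the whole block: the captured first line followed by any number of lines that start with \.
^([^\\].*?)(?=^[^\\])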
Java Code Example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;

class Module1 {
    public static void main(String[] asd) {
        // Java has no multi-line string literals (before text blocks in Java 15),
        // so the sample is concatenated line by line; "\\" is one literal backslash.
        String sourcestring = "STARTFirstText blah, blah\n"
                + "\\ 1next line with more text, but the leading backslash\n"
                + "\\ 2next line with more text, but the leading backslash\n"
                + "\\ 3next line with more text, but the leading backslash\n"
                + "STARTsecondText blah, blah\n"
                + "\\ 4next line with more text, but the leading backslash\n"
                + "\\ 5next line with more text, but the leading backslash\n"
                + "\\ 6next line with more text, but the leading backslash\n"
                + "foo";
        // MULTILINE: ^ matches at every line start; DOTALL: .*? may cross line breaks.
        Pattern re = Pattern.compile("^([^\\\\].*?)(?=^[^\\\\])",
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);
        Matcher m = re.matcher(sourcestring);
        int mIdx = 0;
        while (m.find()) {
            // Print group 0 (the whole match) and every capturing group.
            for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
                System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
            }
            mIdx++;
        }
    }
}
Resulting matches ($matches array: [0] holds the full matches, [1] holds capture group 1):
(
[0] => Array
(
[0] => STARTFirstText blah, blah
\ 1next line with more text, but the leading backslash
\ 2next line with more text, but the leading backslash
\ 3next line with more text, but the leading backslash
[1] => STARTsecondText blah, blah
\ 4next line with more text, but the leading backslash
\ 5next line with more text, but the leading backslash
\ 6next line with more text, but the leading backslash
)
[1] => Array
(
[0] => STARTFirstText blah, blah
\ 1next line with more text, but the leading backslash
\ 2next line with more text, but the leading backslash
\ 3next line with more text, but the leading backslash
[1] => STARTsecondText blah, blah
\ 4next line with more text, but the leading backslash
\ 5next line with more text, but the leading backslash
\ 6next line with more text, but the leading backslash
)
)
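A sketch of what the lookahead implies: a block is closed only when a following line that does not start with a backslash exists, which is why the example input ends with foo. Without such a line the last block is not returned. The \z alternative in the second pattern is a hypothetical variant, not part of the answer; class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // The answer's pattern: the lookahead needs a following line not starting with '\'.
    static final Pattern NEEDS_NEXT_LINE =
            Pattern.compile("^([^\\\\].*?)(?=^[^\\\\])", Pattern.MULTILINE | Pattern.DOTALL);
    // Hypothetical variant: the end of input (\z) also closes a block.
    static final Pattern ALLOWS_END =
            Pattern.compile("^([^\\\\].*?)(?=^[^\\\\]|\\z)", Pattern.MULTILINE | Pattern.DOTALL);

    // Collects capture group 1 of every match, in order.
    static List<String> blocks(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Two blocks, but no ordinary line after the second one.
        String noTrailer = "STARTa\n\\ 1\nSTARTb\n\\ 2\n";
        System.out.println(blocks(NEEDS_NEXT_LINE, noTrailer).size()); // 1: last block is lost
        System.out.println(blocks(ALLOWS_END, noTrailer).size());      // 2
    }
}
```

With the \z variant, a trailing ordinary line such as foo would also come back as a block of its own, so it only suits inputs that end with a block.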
This matches the first line followed by several lines that start with digits:
^([^\d].*?)(?=^[^\d])
Java Code Example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;

class Module1 {
    public static void main(String[] asd) {
        // Sample text concatenated line by line (no multi-line string literals before Java 15).
        String sourcestring = "First you will see the following links for the items:\n"
                + "1111 leading 4 digits and then some text\n"
                + "2565 leading 4 digits and then some text\n"
                + "8978 leading 4 digits and then some text\n"
                + "Second you will see the following links for the items:\n"
                + "2222 leading 4 digits and then some text\n"
                + "3333 leading 4 digits and then some text\n"
                + "4444 leading 4 digits and then some text";
        // [^\\d] = any character that is not a digit (a line break included).
        Pattern re = Pattern.compile("^([^\\d].*?)(?=^[^\\d])",
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);
        Matcher m = re.matcher(sourcestring);
        int mIdx = 0;
        while (m.find()) {
            // Print group 0 (the whole match) and every capturing group.
            for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
                System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
            }
            mIdx++;
        }
    }
}
Resulting matches ($matches array: [0] holds the full matches, [1] holds capture group 1):
(
[0] => Array
(
[0] => First you will see the following links for the items:
1111 leading 4 digits and then some text
2565 leading 4 digits and then some text
8978 leading 4 digits and then some text
[1] =>
)
[1] => Array
(
[0] => First you will see the following links for the items:
1111 leading 4 digits and then some text
2565 leading 4 digits and then some text
8978 leading 4 digits and then some text
[1] =>
)
)
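One detail that may explain the apparently empty [1] entries above: [^\d], like [^\\], also matches a line break, so an empty line counts as a line that does not start with a digit. If the sample contains the empty line the question mentions, the digit block ends at that empty line, and the empty line itself (just "\n") comes back as a separate match. The sketch below checks this on a hypothetical input; names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlankLineDemo {
    // The answer's second pattern; [^\\d] also matches '\n', i.e. an empty line.
    static final Pattern HEADER_AND_DIGITS =
            Pattern.compile("^([^\\d].*?)(?=^[^\\d])", Pattern.MULTILINE | Pattern.DOTALL);

    // Collects capture group 1 of every match, in order.
    static List<String> blocks(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = HEADER_AND_DIGITS.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input with an empty line after the first digit block.
        String input = "First:\n1111 a\n2565 b\n\nSecond:\n2222 c\n";
        for (String block : blocks(input)) {
            // Show line breaks as \n so the one-character second block is visible.
            System.out.println("[" + block.replace("\n", "\\n") + "]");
        }
    }
}
```

The Second: block is not returned at all, because no further line follows it for the lookahead to see.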
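Answer 3 (score: 0)
Use '\\' for the backslash, '\r?\n' for a single line break and '\d{4}' for the four digits:
.*(\r?\n)
(your first blah-blah line)
\\.*(\r?\n)
(one of your backslash lines)
((\d{4}.*(\r?\n))+(\r?\n))+
(your 4-digit blocks, each ending with an empty line; the whole thing is repeated with +)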
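A sketch of how these building blocks could be combined into complete Java patterns. Two things here are assumptions rather than part of the answer: the backslash line is wrapped in (?:...)+ so that several such lines are matched, and \r?\n is used as the line break. Class, field and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BuildingBlocksDemo {
    // Building blocks from this answer, with \r?\n as the line break (assumption).
    static final String LINE_BREAK = "(\\r?\\n)";
    static final String FIRST_LINE = ".*" + LINE_BREAK;         // the first blah-blah line
    static final String BACKSLASH_LINE = "\\\\.*" + LINE_BREAK;  // one line starting with '\'
    // 4-digit lines followed by an empty line, the whole group repeated.
    static final String DIGIT_BLOCKS = "((\\d{4}.*" + LINE_BREAK + ")+" + LINE_BREAK + ")+";

    // Scenario 1; assumption: the backslash line may repeat, hence (?:...)+.
    static final Pattern SCENARIO_1 = Pattern.compile(FIRST_LINE + "(?:" + BACKSLASH_LINE + ")+");
    // Scenario 2: the digit-block expression as given.
    static final Pattern SCENARIO_2 = Pattern.compile(DIGIT_BLOCKS);

    // Returns the first match of p in input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.print(firstMatch(SCENARIO_1,
                "STARTText blah, blah\n\\ more text\n\\ more text\nnext\n"));
        System.out.print(firstMatch(SCENARIO_2,
                "Now you will see the following links for the items:\n1111 a\n2565 b\n\nnext\n"));
    }
}
```

Note that the digit-block expression requires the empty line: digit lines that are not followed by one are not matched.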