Regex match only if the text contains something before it

Time: 2018-03-12 16:31:33

Tags: java regex regex-lookarounds

Given the following text:

KEYWORD This is a test
We want to match the following groups 1:YES 2:YES 3:YES

I want to match "1:YES", "2:YES" and "3:YES" using

((\d):YES)

if and only if the first word of the full text is "KEYWORD".

Given a test text that does not begin with KEYWORD (for example, the text above with its first word removed), no matches should be found.

2 answers:

Answer 0: (score: 3)

Java (like most regex engines) does not support lookbehinds of unbounded length, but there is a workaround!

String str = "KEYWORD This is a test\n" +
        "We want to match the following groups 1:YES 2:YES 3:YES";
Matcher matcher = Pattern.compile("(?s)(?<=\\AKEYWORD\\b.{1,99999})(\\d+:YES)").matcher(str);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}

Which outputs:

1:YES
2:YES
3:YES

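The trick here is the lookbehind (?<=\\AKEYWORD\\b.{1,99999}), which has a large (but not unbounded) maximum length. (?s) sets the DOTALL flag (the dot also matches line terminators), and \A anchors the lookbehind to the start of the input. Unlike ^, \A never matches at the start of a later line, even when the MULTILINE flag is enabled. A match that lies more than 99999 characters after KEYWORD is not found by this pattern.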
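To check the "if and only if" part of the question, here is a minimal sketch that runs the same pattern against a text with and without the leading KEYWORD. The class name LookbehindDemo and the helper findAfterKeyword are made up for this illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // Same pattern as in the answer: a bounded lookbehind anchored at the start of input.
    private static final Pattern AFTER_KEYWORD =
            Pattern.compile("(?s)(?<=\\AKEYWORD\\b.{1,99999})(\\d+:YES)");

    // Hypothetical helper: collects every captured "n:YES" token.
    static List<String> findAfterKeyword(String text) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = AFTER_KEYWORD.matcher(text);
        while (matcher.find()) {
            hits.add(matcher.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        String groups = "This is a test\n"
                + "We want to match the following groups 1:YES 2:YES 3:YES";
        // Text starts with KEYWORD: all three tokens are found.
        System.out.println(findAfterKeyword("KEYWORD " + groups)); // [1:YES, 2:YES, 3:YES]
        // Text does not start with KEYWORD: the lookbehind never succeeds.
        System.out.println(findAfterKeyword(groups));              // []
    }
}
```

Note that the lookbehind is re-evaluated for every candidate position, so this approach may be slow on very long inputs.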

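Answer 1: (score: 1)

Without resorting to trickery in Java, you can use \G to capture the \d+:YES\b strings. \G makes a match start where the previous match ended or, like \A, at the beginning of the string.

We need the first of these two capabilities:

(?:\AKEYWORD|\G(?!\A))[\s\S]*?(\d+:YES\b)

Breakdown: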

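  • (?: Start of non-capturing group
    • \A Match the start of the subject string
    • KEYWORD Match the keyword
    • | Or
    • \G(?!\A) Continue from where the previous match ended (but not at the start of the string)
  • ) End of non-capturing group
  • [\s\S]*? Match anything else, non-greedily
  • (\d+:YES\b) Match and capture the part we want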
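To see what \G contributes on its own, here is a small sketch on a made-up input ("12a34"), separate from the question's text. The class ContiguousMatchDemo and the helper findAll are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContiguousMatchDemo {
    // Hypothetical helper: returns every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Plain \d finds every digit, wherever it is.
        System.out.println(findAll("\\d", "12a34"));            // [1, 2, 3, 4]
        // \G forces each match to start where the previous one ended,
        // so the chain breaks at 'a'.
        System.out.println(findAll("\\G\\d", "12a34"));         // [1, 2]
        // (?!\A) forbids \G from matching at the very start, so the chain
        // can never begin unless another alternative starts it.
        System.out.println(findAll("\\G(?!\\A)\\d", "12a34"));  // []
    }
}
```

This is why the full pattern needs the \AKEYWORD alternative: it is the only way to start the chain, and \G(?!\A) can only continue it.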


Java code:

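// Requires java.util.regex.Pattern and java.util.regex.Matcher.
// Sample input from the question.
String string = "KEYWORD This is a test\n" +
        "We want to match the following groups 1:YES 2:YES 3:YES";
Pattern p = Pattern.compile("(?:\\AKEYWORD|\\G(?!\\A))[\\s\\S]*?(\\d+:YES\\b)");
Matcher m = p.matcher(string);
while (m.find()) {
    // Group 1 holds the captured "n:YES" token.
    System.out.println(m.group(1));
}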
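As a quick check that this pattern also satisfies the "no KEYWORD, no matches" requirement, here is a self-contained sketch that runs it on both variants of the question's text. The class ChainedMatchDemo and the helper findAfterKeyword are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChainedMatchDemo {
    // Pattern from the answer: KEYWORD starts the chain, \G(?!\A) continues it.
    private static final Pattern CHAINED =
            Pattern.compile("(?:\\AKEYWORD|\\G(?!\\A))[\\s\\S]*?(\\d+:YES\\b)");

    // Hypothetical helper: collects group 1 of every match.
    static List<String> findAfterKeyword(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = CHAINED.matcher(text);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        String groups = "This is a test\n"
                + "We want to match the following groups 1:YES 2:YES 3:YES";
        System.out.println(findAfterKeyword("KEYWORD " + groups)); // [1:YES, 2:YES, 3:YES]
        System.out.println(findAfterKeyword(groups));              // []
    }
}
```

Unlike the lookbehind workaround, this pattern places no fixed limit on the distance between KEYWORD and a match.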
