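I want to parse KConf (Kconfig) files, which contain single-line comments introduced by the # character. An example of such a file is linked below.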
https://github.com/torvalds/linux/blob/master/arch/x86/Kconfig
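I know the single-line test string looks almost random, but it should cover most (if not all) variants of hashes nested in strings, strings nested in each other, and quotes inside a comment that do not start a string.
The regex engine I am currently using is the one in Groovy, which is based on Java.
Test string: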
Lorem "ipsum # \" dolor" sit amet, 'consectetur # \' adipiscing' elit. Maecenas 'suscipit #mollis' quam, non #bibendum 'elit # eleifend "in. Duis # convallis" luctus nunc, ac luctus lectus dapibus at.
Desired result:
Lorem "ipsum # \" dolor" sit amet, 'consectetur # \' adipiscing' elit. Maecenas 'suscipit #mollis' quam, non
or (with a leading space):
#bibendum 'elit # eleifend "in. Duis # convallis" luctus nunc, ac luctus lectus dapibus at.
Answer 0 (score: 1)
First, I escaped your string so that it can be stored in a JavaScript variable (you did not state a language, so I will assume JS):
var str = 'Lorem "ipsum # " dolor" sit amet, \'consectetur # \' adipiscing\' elit. Maecenas \'suscipit#mollis\' quam, non #bibendum \'elit # eleifend "in. Duis # convallis" luctus nunc, ac luctus lectus dapibus at.';
To remove everything from a space followed by "#" onward, where the "#" is not itself followed by a space:
str.replace(/ #[^ ].*/, '');
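Finally, your second expected result makes no sense to me.
All of this would, of course, benefit from a proper description of what you need.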
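Since the question mentions a Java-based engine, here is a minimal Java sketch of the same one-line approach. The class name `NaiveHashStrip` and the shortened sample lines are made up for illustration. It only shows how the pattern behaves on two inputs: one where the quoted hash has no space before it (as in the `str` variable above) and one where it does (as in the question's test string).

```java
final class NaiveHashStrip {
    // Java counterpart of the JavaScript call str.replace(/ #[^ ].*/, '').
    // Like replace() without the g flag, replaceFirst() only touches the first match.
    static String strip(String line) {
        return line.replaceFirst(" #[^ ].*", "");
    }

    public static void main(String[] args) {
        // Quoted hash without a preceding space, as in the answer's variable.
        String asInAnswer = "Maecenas 'suscipit#mollis' quam, non #bibendum 'elit'";
        // Quoted hash with a preceding space, as in the question's test string.
        String asInQuestion = "Maecenas 'suscipit #mollis' quam, non #bibendum 'elit'";

        System.out.println(strip(asInAnswer));   // Maecenas 'suscipit#mollis' quam, non
        System.out.println(strip(asInQuestion)); // Maecenas 'suscipit
    }
}
```

The pattern has no notion of quotes, so a " #x" sequence inside a string is treated as the start of a comment.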
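Answer 1 (score: 0)
Given the limited information, this regex might work.
Still, trying to distinguish embedded hashes from comments seems a bit complicated.
I had no time to test it; it is spliced together from a few other regexes.
Note that it should be used in multi-line mode. Everything in it is geared toward line-by-line parsing, i.e. nothing in the regex spans lines.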
# (?-s)^(?:"[^"\\\n]*(?:\\.[^"\\\n]*)*"|'[^'\\\n]*(?:\\.[^'\\\n]*)*'|[^#"'\s]+|(?<=[^\s#])\#+|[^\S\n]+(?!\#))*(?:[^\S\n]+|^)(\#.*)$
# "(?-s)^(?:\"[^\"\\\\\\n]*(?:\\\\.[^\"\\\\\\n]*)*\"|'[^'\\\\\\n]*(?:\\\\.[^'\\\\\\n]*)*'|[^#\"'\\s]+|(?<=[^\\s#])\\#+|[^\\S\\n]+(?!\\#))*(?:[^\\S\\n]+|^)(\\#.*)$"
(?-s) # Modifier, No dot all
^ # Beginning of line
(?:
" # Double quotes
[^"\\\n]*
(?: \\ . [^"\\\n]* )*
"
| # or
' # Single quotes
[^'\\\n]*
(?: \\ . [^'\\\n]* )*
'
| # or
[^#"'\s]+ # Not hash, quotes, whitespace
| # or
(?<= [^\s#] ) # Preceded by a character, but not hash or whitespace
\#+ # Embedded hashes
| # or
[^\S\n]+ # Whitespaces (non-newline)
(?! \# ) # Not followed by hash
)*
(?: [^\S\n]+ | ^ ) # Whitespaces (non-newline) or BOL
( \# .* ) # (1), hash comment
$ # End of line
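A minimal sketch of how the Java-string form of this pattern might be used to collect the comments (capture group 1) from a multi-line text. The class name `HashCommentFinder`, the method `comments`, and the sample lines are made up for illustration. The pattern itself is copied from the answer and compiled with `Pattern.MULTILINE`, as the answer asks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HashCommentFinder {
    // Java-string form of the pattern, copied from the answer and split in two
    // only to keep the source line short.
    private static final Pattern COMMENT = Pattern.compile(
        "(?-s)^(?:\"[^\"\\\\\\n]*(?:\\\\.[^\"\\\\\\n]*)*\"|'[^'\\\\\\n]*(?:\\\\.[^'\\\\\\n]*)*'"
        + "|[^#\"'\\s]+|(?<=[^\\s#])\\#+|[^\\S\\n]+(?!\\#))*(?:[^\\S\\n]+|^)(\\#.*)$",
        Pattern.MULTILINE);

    // Returns the text of every comment found, one entry per matching line.
    static List<String> comments(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = COMMENT.matcher(text);
        while (m.find()) {
            found.add(m.group(1)); // group 1 is the hash comment
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from a real Kconfig file.
        String text = "config FOO # first comment\n"
                    + "\tdefault \"a # b\" # second\n"
                    + "# whole line\n";
        for (String c : comments(text)) {
            System.out.println(c);
        }
    }
}
```

Because the outer group repeats alternatives that themselves use `+`, lines that contain long unbroken words and no comment may cause heavy backtracking. This follows from the shape of the pattern and has not been measured here.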
Answer 2 (score: 0)
Raw regex:
^((?:\\.|("|')(?:(?!\2|\\|[\r\n]).|\\.)*\2|[^#'"\r\n])+)#.+
Replace with $1.
Example:
String re = "^((?:\\\\.|(\"|')(?:(?!\\2|\\\\|[\\r\\n]).|\\\\.)*\\2|[^#'\"\\r\\n])+)#.+";
String line = "Lorem \"ipsum # \\\" dolor\" sit amet, 'consectetur # \\' adipiscing' elit. Maecenas 'suscipit #mollis' quam, non #bibendum 'elit # eleifend \"in. Duis # convallis\" luctus nunc, ac luctus lectus dapibus at.";
String uncommented = line.replaceAll(re, "$1");
//=> Lorem "ipsum # \" dolor" sit amet, 'consectetur # \' adipiscing' elit. Maecenas 'suscipit #mollis' quam, non
Breakdown:
^ # Beginning of line
( # Beginning of 1st capture group
(?: # Non-capture group 1
\\. # Match an escaped character
|
("|') # Or, a quote (and capture it in 2nd capture group),
(?: # Non-capture group 2
(?!\2|\\|[\r\n]). # Followed by any character except relevant quote, \ or newline
|
\\. # Or an escaped character
)* # Close of non-capture group 2 and repeat as many times
\2 # Close the quoted part
|
[^#'"\r\n] # Any non-hash, single/double quote, newline characters
)+ # Close of non-capture group 1 and repeat as many times
) # Close capture group 1
#.+ # Match comments
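The breakdown above can be tried on more than one line at a time. The following is a minimal sketch, assuming the pattern is compiled with `Pattern.MULTILINE` so that `^` matches at every line start. That flag, the class name `KconfigCommentStripper`, and the sample lines are additions for illustration and not part of the answer.

```java
import java.util.regex.Pattern;

final class KconfigCommentStripper {
    // Same pattern as in the answer; MULTILINE is an added assumption so that
    // ^ also matches after each line terminator.
    private static final Pattern TRAILING_COMMENT = Pattern.compile(
        "^((?:\\\\.|(\"|')(?:(?!\\2|\\\\|[\\r\\n]).|\\\\.)*\\2|[^#'\"\\r\\n])+)#.+",
        Pattern.MULTILINE);

    // Replaces each matching line with capture group 1, the part before the comment.
    static String strip(String text) {
        return TRAILING_COMMENT.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from a real Kconfig file.
        String text = "config FOO # trailing comment\n"
                    + "\tdefault \"a # b\"\n"
                    + "# whole-line comment\n";
        System.out.print(strip(text));
    }
}
```

With this input the first line keeps its trailing space ("config FOO "), the quoted hash on the second line is left alone, and the third line is unchanged: the `+` after the first non-capture group requires at least one character before the `#`, so a comment that starts at the beginning of a line is not removed by this pattern.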