I am writing a PowerShell script that parses files containing C code and detects whether they contain calls to free(), malloc() or realloc().
file_one.txt:
int MethodOne()
{
return 1;
}
int MethodTwo()
{
free();
return 1;
}
file_two.txt:
int MethodOne()
{
//free();
return 1;
}
int MethodTwo()
{
free();
return 1;
}
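# One alternative per function: no '/' may appear between the start of the string and the call
$regex = "(^[^/]*free\()|(^[^/]*malloc\()|(^[^/]*realloc\()"
$file_one = "Z:\PATH\file_one.txt"
$file_two = "Z:\PATH\file_two.txt"
# -Raw reads each file as one single string instead of an array of lines
$contentOne = Get-Content $file_one -Raw
$contentOne -match $regex
$contentTwo = Get-Content $file_two -Raw
$contentTwo -match $regex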
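Processing the whole file at once seems to work for contentOne:
I do get True (because of the free() in MethodTwo).
Processing contentTwo is not so lucky: it returns False instead of True
(because of the free() in MethodTwo).
Can someone help me write a better regex that works in both cases?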
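The difference between the two results can be reproduced in isolation. The sketch below uses Python's `re` module rather than PowerShell (an assumption made only to keep the snippet self-contained; for these two inputs both engines should agree, since `^` without a multiline option matches only at the start of the string and `[^/]` also matches newlines). The variable names are illustrative.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r"(^[^/]*free\()|(^[^/]*malloc\()|(^[^/]*realloc\()")

# Contents of the two sample files, each read as one string (like Get-Content -Raw).
file_one = "int MethodOne()\n{\nreturn 1;\n}\nint MethodTwo()\n{\nfree();\nreturn 1;\n}\n"
file_two = "int MethodOne()\n{\n//free();\nreturn 1;\n}\nint MethodTwo()\n{\nfree();\nreturn 1;\n}\n"

# '^' anchors at the very start of the string, and [^/]* cannot step over a '/'.
# In file_two the first '/' (from the // comment) comes before any free( call,
# so the pattern can never reach the real call in MethodTwo.
found_one = pattern.search(file_one) is not None
found_two = pattern.search(file_two) is not None

print(found_one, found_two)  # True False
```

In other words, any `/` earlier in the file (a comment, a division, a path in a string) appears to be enough to hide every later call from this pattern.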
Answer 0: (score: 1)
Sure, here it is.
Raw:
^(?>(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n))|(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?!\b(?:free|malloc|realloc)\()[\S\s](?:(?!\b(?:free|malloc|realloc)\()[^/"'\\])*))*(?:(\bfree\()|(\bmalloc\()|(\brealloc\())
Stringed:
"^(?>(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/|//(?:[^\\\\]|\\\\(?:\\r?\\n)?)*?(?:\\r?\\n))|(?:\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\[\\S\\s][^'\\\\]*)*'|(?!\\b(?:free|malloc|realloc)\\()[\\S\\s](?:(?!\\b(?:free|malloc|realloc)\\()[^/\"'\\\\])*))*(?:(\\bfree\\()|(\\bmalloc\\()|(\\brealloc\\())"
Verbatim:
@"^(?>(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n))|(?:""[^""\\]*(?:\\[\S\s][^""\\]*)*""|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?!\b(?:free|malloc|realloc)\()[\S\s](?:(?!\b(?:free|malloc|realloc)\()[^/""'\\])*))*(?:(\bfree\()|(\bmalloc\()|(\brealloc\())"
Explanation
^
(?>
     (?:                                   # Comments
          /\*                                   # Start /* .. */ comment
          [^*]* \*+
          (?: [^/*] [^*]* \*+ )*
          /                                     # End /* .. */ comment
       |
          //                                    # Start // comment
          (?:                                   # Possible line-continuation
               [^\\]
            |  \\
               (?: \r? \n )?
          )*?
          (?: \r? \n )                          # End // comment
     )
  |                                     # OR,
     (?:                                   # Non - comments
          "
          [^"\\]*                               # Double quoted text
          (?: \\ [\S\s] [^"\\]* )*
          "
       |  '
          [^'\\]*                               # Single quoted text
          (?: \\ [\S\s] [^'\\]* )*
          '
       |                                     # OR,
          (?!                                   # ASSERT: Here, cannot be free / malloc / realloc {}
               \b
               (?: free | malloc | realloc )
               \(
          )
          [\S\s]                                # Any char which could start a comment, string, etc..
                                                # (Technically, we're going past a C++ source code error)
          (?:                                   # -------------------------
               (?!                                   # ASSERT: Here, cannot be free / malloc / realloc {}
                    \b
                    (?: free | malloc | realloc )
                    \(
               )
               [^/"'\\]                              # Char which doesn't start a comment, string, escape,
                                                     # or line continuation (escape + newline)
          )*                                    # -------------------------
     )                                     # Done Non - comments
)*
(?:
     ( \b free\( )                         # (1), Free()
  |
     ( \b malloc\( )                       # (2), Malloc()
  |
     ( \b realloc\( )                      # (3), Realloc()
)
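Two of the building blocks above can be tried on their own: the call test used inside the assertions and capture groups, and the double-quoted string sub-pattern. The following is a minimal sketch in Python (an assumption; these two sub-patterns use only syntax that Python's `re` shares with .NET). The names `CALL`, `DQ_STRING` and `has_call` are hypothetical helpers, not part of the answer's regex.

```python
import re

# Sub-patterns copied from the expanded regex, without the free-spacing layout.
CALL = re.compile(r"\b(?:free|malloc|realloc)\(")
DQ_STRING = re.compile(r'"[^"\\]*(?:\\[\S\s][^"\\]*)*"')


def has_call(text):
    """Return True if text contains free( / malloc( / realloc( as a whole word."""
    return CALL.search(text) is not None


call_checks = {
    "free(p);": has_call("free(p);"),      # plain call
    "myfree(p);": has_call("myfree(p);"),  # \b rejects a longer identifier
    "free (p);": has_call("free (p);"),    # a space before '(' is not covered
}

# The string sub-pattern swallows escaped quotes, so the free( inside the
# literal is consumed as string text and never offered to the call test.
line = r'puts("say \"free(\" twice"); free(p);'
string_match = DQ_STRING.search(line).group(0)

print(call_checks)
print(string_match)
```

One thing this makes visible: a call written with whitespace before the parenthesis, which C allows, is not detected by `\bfree\(`. Whether that matters depends on the code being scanned.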
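Some notes:
With the ^ anchor, this finds only the first occurrence, searching from the start of the string.
To find them all, just remove the ^ from the regex. (One caveat: without an anchor, a search attempt can also begin inside a comment or string, so a commented-out call that comes after the last real call can still be reported. In .NET, replacing ^ with \G instead of deleting it should avoid this.)
This works because the regex matches everything on the way to what you're looking for. Whatever it finds ends up in capture group 1, 2 or 3.
Good luck!!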
What the regex contains:
----------------------------------
* Format Metrics
----------------------------------
Atomic Groups = 1
Cluster Groups = 10
Capture Groups = 3
Assertions = 2
( ? ! = 2
Free Comments = 25
Character Classes = 12
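Edit
By request, here is an explanation of the part of the regex that handles /**/ comments. This -> /\*[^*]*\*+(?:[^/*][^*]*\*+)*/
It is a modified unrolled-loop regex that takes an opening delimiter of /* and a closing delimiter of */.
Note that the opening and closing delimiters share a common character, /, in their sequences.
To handle this without lookaround assertions, the asterisk of the trailing delimiter is moved inside the loop.
With this factoring, all that remains is to check for the closing / that completes the delimiter sequence.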
/\*                # Opening delimiter /*
[^*]*              # Optionally, consume all non-asterisks
\*+                # This must be 1 or more asterisks (the anchor), or FAIL.
                   # It is matched here to align with the optional loop below,
                   # because the loop is looking for the closing /.
(?:                # The optional loop part
     [^/*]         # Specifically a single non-/ character (nor asterisk).
                   # Since a / here would be the closing delimiter, it must be excluded.
     [^*]*         # Optional non-asterisks.
                   # This will accept a / because it is supposed to consume ALL
                   # opening delimiters as it goes,
                   # and will consider the very next */ as the close.
     \*+           # This must be 1 or more asterisks (the anchor), or FAIL.
)*                 # Repeat 0 to many times.
/                  # Closing delimiter /
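To see the unrolled loop at work, the comment sub-pattern can be run on a few inputs. This is a small sketch in Python (an assumption; the sub-pattern itself is copied unchanged and uses no engine-specific syntax). `C_COMMENT` and the sample strings are made up for illustration.

```python
import re

# The /* .. */ sub-pattern from the answer, on its own.
C_COMMENT = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")

samples = [
    "/**/",                                   # empty comment
    "/* a **/",                               # several asterisks before the closing /
    "x = 1; /* a * b / c */ y = 2; /* z */",  # lone * and / inside a comment
    "/* never closed",                        # no closing delimiter
]

# findall returns whole matches here because the pattern has no capture groups.
results = [C_COMMENT.findall(s) for s in samples]

for sample, found in zip(samples, results):
    print(repr(sample), "->", found)
```

In the third sample the inner `*` and `/` do not end the comment, because the loop only exits when a run of asterisks is immediately followed by `/`. The unterminated comment produces no match at all.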