I need some help with re.findall. My input is as follows:

# Python 3.4.2
import re
code = b'''
#include "..\..\src.h"\r
/********************************************//**
* ... text
***********************************************/
/*!< Detailed description after the member */
int inx = -1l
const char* = "hello, world";
'''
commonP = rb'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"'

The expected output of re.findall should be (only the C++ comments):

/********************************************//**
* ... text
***********************************************/
/*!< Detailed description after the member */

I checked the pattern with the following re.sub, and it removes all of the comments:

def comment_remover(text):
    def replacer(match):
        s = match.group(0)
        if s.startswith(b'/'):
            return b' '  # note: a space and not an empty string
        else:
            return s
    pattern = re.compile(
        commonP,
        re.DOTALL | re.MULTILINE
    )
    return re.sub(pattern, replacer, text)

new_code = comment_remover(code)
print(new_code)

But if I change re.sub to re.findall:

print('=' * 100)
L = re.findall(commonP, code, flags=re.DOTALL | re.MULTILINE)
for item in L:
    print(item)

it gives me more output than I want. What am I doing wrong?
Answer 0 (score: 1)
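Your regex also matches strings enclosed in quotes. The last alternative, "(?:\\.|[^\\"])*", is what does that (and the alternative before it does the same for single-quoted strings): https://regex101.com/r/qM1oK5/1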
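To see exactly which extra items come back, the re.findall result can be split with the same test that replacer uses. This is a minimal sketch assuming Python 3 and the input from the question; the names comments and strings are only illustrative, and the backslashes in the #include path are written as \\ solely to avoid invalid-escape warnings (the resulting bytes are the same as in the question).

```python
import re

# Input and pattern from the question (bytes, Python 3).
# "\\" is used in the #include path only to avoid invalid-escape warnings.
code = b'''
#include "..\\..\\src.h"\r
/********************************************//**
* ... text
***********************************************/
/*!< Detailed description after the member */
int inx = -1l
const char* = "hello, world";
'''
commonP = rb'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"'

matches = re.findall(commonP, code, flags=re.DOTALL | re.MULTILINE)

# Same criterion as replacer(): comments start with b'/',
# everything else is a quoted string literal.
comments = [m for m in matches if m.startswith(b'/')]
strings = [m for m in matches if not m.startswith(b'/')]

for s in strings:
    print(s)
```

With this input, strings holds the two double-quoted literals, "..\..\src.h" from the #include line and "hello, world"; those are the items re.findall returns on top of the three comments.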
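However, comment_remover deals with this in its replacer function: it checks whether the match starts with / and replaces only those matches (the comments), returning quoted strings unchanged.
So you need to either modify the expression or filter the re.findall results: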
In [33]: L = re.findall(commonP, code, flags=re.DOTALL | re.MULTILINE)
In [34]: new_L = [s for s in L if s.startswith(b'/')]
In [35]: print(b'\n'.join(new_L).decode())
/********************************************/
/**
* ... text
***********************************************/
/*!< Detailed description after the member */
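The other option, modifying the expression, could look roughly like the sketch below. It is not part of the original answer, and comment_only and find_comments are hypothetical names. It relies on documented re.findall behavior: when the pattern contains exactly one capturing group, findall returns that group's text for each match, and an empty string when the group did not take part in the match.

```python
import re

# Hypothetical variant of the question's commonP: the two comment
# alternatives are wrapped in one capturing group. The string-literal
# alternatives stay outside it, so they are still consumed by the scan
# (a "//" or "/*" inside a string is never taken for a comment), but
# re.findall with exactly one group returns only that group's text,
# which is b'' whenever a string literal was matched instead.
comment_only = re.compile(
    rb'(//.*?$|/\*.*?\*/)'          # group 1: // line and /* block */ comments
    rb'|\'(?:\\.|[^\\\'])*\''       # single-quoted literal (not captured)
    rb'|"(?:\\.|[^\\"])*"',         # double-quoted literal (not captured)
    re.DOTALL | re.MULTILINE,
)


def find_comments(text):
    """Return the comments found in text (bytes), skipping string literals."""
    return [c for c in comment_only.findall(text) if c]


if __name__ == "__main__":
    sample = b'url = "http://example.com"; // trailing comment\n/* block */\n'
    print(find_comments(sample))  # [b'// trailing comment', b'/* block */']
```

Simply deleting the two string alternatives would also stop the literals from being returned, but then a // or /* inside a string literal (such as "http://example.com" above) would be reported as a comment. Keeping them outside the group avoids that.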