Python Regex in Commented-Out Code

Posted: 2016-11-17 21:08:34

Tags: python regex

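I am trying to match open-source license types in the commented-out header at the top of most files. However, I run into trouble when the target string (for example, Lesser General Public License) spans two lines, as in the license header below.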

 * Copyright (c) Codice Foundation
 * <p/>
 * This is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser
 * General Public License as published by the Free Software Foundation, either version 3 of the
 * License, or any later version.
 * <p/>
 * This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
 * even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details. A copy of the GNU Lesser General Public License
 * is distributed along with this program and can be found at
 * <http://www.gnu.org/licenses/lgpl.html>.
 */
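To make the failure concrete, here is a small check that runs two of the question's patterns (copied into the sketch) against an excerpt of the header above. Because "\n * " sits between "Lesser" and "General", the LGPL pattern finds nothing, while the GPL pattern's `(?<!Lesser)` lookbehind sees `*` instead of `Lesser` and fires. The excerpt is abbreviated from the question; the variable names are illustrative.

```python
import re

# Excerpt of the question's header: "Lesser" and "General" are on different lines
header = (
    " * This is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser\n"
    " * General Public License as published by the Free Software Foundation, either version 3 of the\n"
    " * License, or any later version.\n"
)

# Two of the question's patterns, written as raw strings
lgpl_re = re.compile(r'\sLGPL\s|Lesser General Public License', re.IGNORECASE)
gpl_re = re.compile(r'\sGPL\D|(?<!Lesser)\sGeneral Public License', re.IGNORECASE)

# The literal phrase never appears, because a newline and " * " split it
print(lgpl_re.search(header))  # None

# The lookbehind only inspects the text right before the space, which is "*",
# so the LGPL header is misreported as GPL
print(repr(gpl_re.search(header).group()))  # ' General Public License'
```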

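Lookbehind won't work here, because the number of spaces in the commented code is unknown and the comment characters differ between languages. Here are examples of my current regular expressions: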

# raw strings (r'...') keep \s and \D from being parsed as string escapes
self._cr_license_re['GNU']                            = re.compile(r'\sGNU\D')
self._cr_license_re['MIT License']                    = re.compile(r'MIT License|Licensed MIT|\sMIT\D')
self._cr_license_re['OpenSceneGraph Public License']  = re.compile(r'OpenSceneGraph Public License', re.IGNORECASE)
self._cr_license_re['Artistic License']               = re.compile(r'Artistic License', re.IGNORECASE)
self._cr_license_re['LGPL']                           = re.compile(r'\sLGPL\s|Lesser General Public License', re.IGNORECASE)
self._cr_license_re['BSD']                            = re.compile(r'\sBSD\D')
self._cr_license_re['Unspecified OS']                 = re.compile(r'free of charge', re.IGNORECASE)
self._cr_license_re['GPL']                            = re.compile(r'\sGPL\D|(?<!Lesser)\sGeneral Public License', re.IGNORECASE)
self._cr_license_re['Apache License']                 = re.compile(r'Apache License', re.IGNORECASE)
self._cr_license_re['Creative Commons']               = re.compile(r'\sCC\D')
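One way to sidestep the lookbehind limitation, which comes from neither the question nor the answer, is to build each phrase pattern so that the gap between words may contain any mix of whitespace and comment characters. The sketch below assumes the comment characters are `*`, `#` and `/`; you would extend that class for other languages. `phrase_re` and `has_plain_gpl` are hypothetical helpers. Python's `re` module only accepts fixed-width lookbehind, so the "not preceded by Lesser" check for GPL is done by hand rather than with `(?<!...)`.

```python
import re

# Separator between words: whitespace plus the comment leaders this sketch
# assumes (*, #, /). Add characters such as ';' or '-' for other languages.
SEP = r'[\s*#/]+'

def phrase_re(phrase, flags=re.IGNORECASE):
    """Hypothetical helper: compile a phrase so its words may be split
    across comment lines ('Lesser' at a line end, 'General' on the next)."""
    words = (re.escape(w) for w in phrase.split())
    return re.compile(r'\b' + SEP.join(words) + r'\b', flags)

lgpl_re = phrase_re('Lesser General Public License')
gpl_body = phrase_re('General Public License')
# "Lesser" followed only by separators up to the end of the prefix
lesser_tail = re.compile(r'\bLesser' + SEP + r'$', re.IGNORECASE)

def has_plain_gpl(text):
    """Hypothetical helper: True if some 'General Public License' in text
    is not preceded by 'Lesser' (possibly on the previous comment line)."""
    for m in gpl_body.finditer(text):
        if not lesser_tail.search(text[:m.start()]):
            return True
    return False
```

With this, the LGPL pattern matches the header above even though the phrase is split, and `has_plain_gpl` returns False for it because every "General Public License" there is preceded by "Lesser".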

I'd welcome any suggestions on how to solve this with regular expressions in Python.

1 Answer:

Answer 0 (score: 1)

You can use this regex and replace each match with a single space:

\s*\*\s*\/?

This should collapse the multi-line comment onto one line, which you can then search for the license.
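In Python, the answer's suggestion could look like the sketch below: `re.sub` applies the pattern to an excerpt of the question's header, and the question's GPL pattern is then rerun on the flattened text. It assumes `*`-style block comments; the pattern leaves `#` or `//` comments untouched, and it also strips any `*` that appears inside the license text itself. `flatten_comment` is a hypothetical name.

```python
import re

# The answer's pattern: optional whitespace, a '*', optional whitespace, optional '/'
COMMENT_LEADER = re.compile(r'\s*\*\s*\/?')

def flatten_comment(text):
    """Replace each '*' comment leader (and surrounding whitespace) with one space."""
    return COMMENT_LEADER.sub(' ', text)

header = (
    " * This is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser\n"
    " * General Public License as published by the Free Software Foundation, either version 3 of the\n"
    " * License, or any later version.\n"
    " */\n"
)

flat = flatten_comment(header)
print('Lesser General Public License' in flat)  # True

# The question's GPL pattern no longer misfires: "Lesser" now sits right before the space
gpl_re = re.compile(r'\sGPL\D|(?<!Lesser)\sGeneral Public License', re.IGNORECASE)
print(gpl_re.search(flat))  # None
```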