Question

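I'm doing something wrong in this Python script, but it has become convoluted and I've lost sight of what the mistake is.

I want the script to go through a file, find all the function definitions, extract each function's name, return type, and parameters, and output a "doxygen"-style comment like this: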

/******************************************************************************/
  /*!
    \brief
      Main function for the file

    \return
      The exit code for the program
  */
/******************************************************************************/

But I'm doing something wrong when trying to parse the parameters... here is the script so far:

import re
import sys

f = open(sys.argv[1])

functions = []

for line in f:
  match = re.search(r'([\w]+)\s+([\S]+)\(([\w+\s+\w+])+\)',line)
  if line.find("\\fn") < 0:
    if match:
      returntype = match.group(1)
      funcname = match.group(2)
      print '/********************************************************************'
      print "  \\fn " + match.group()
      print ''
      print '  \\brief'
      print '    Function description for ' + funcname
      print ''
      if len(match.groups()) > 2:
        params = []
        count = len(match.groups()) - 2
        while count > 0:
          matchingstring = match.group(count + 2)
          if matchingstring.find("void") < 0:
            params.append(matchingstring)
          count -= 1
        for parameter in params:
          print "  \\param " + parameter
          print '    Description of ' + parameter
          print ''
      print '  \\return'
      print '    ' + returntype
      print '********************************************************************/'
      print ''

Any help would be greatly appreciated. Thanks.

Answer 1

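C++ syntax is too complex to handle with simple regular expressions. You need at least a minimal parser. I've found that for restricted cases, where I'm not concerned with C++ in general but only with my own style, I can often get away with a flex-based tokenizer and a simple state machine. This will fail in a lot of cases of legal C++: for starters, of course, if someone uses the preprocessor to modify the syntax; but also because < can have different meanings, depending on whether what precedes it names a template or not. But it's often adequate for a specific job.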
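
To make the tokenizer-plus-state-machine idea concrete, here is a rough sketch. Note that it swaps the flex-based tokenizer for a small Python `re`-based one, and the names `tokenize` and `parse_signature` are made up for illustration. It assumes one signature per line and a restricted coding style; it is not a general C++ parser.

```python
import re

# Tokens: identifiers (optionally qualified, e.g. std::string),
# or any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*(?:::[A-Za-z_]\w*)*|\S")


def tokenize(text):
    """Split a line of source into a flat list of token strings."""
    return TOKEN_RE.findall(text)


def parse_signature(text):
    """Tiny state machine: HEAD -> PARAMS -> DONE.

    Returns a dict with return_type, name and params, or None when the
    line does not look like 'type name(params)'.
    """
    state = "HEAD"
    head, current, params = [], [], []
    angle_depth = 0  # how many '<' are currently open inside the parameter list

    for tok in tokenize(text):
        if state == "HEAD":
            if tok == "(":
                state = "PARAMS"
            else:
                head.append(tok)
        elif state == "PARAMS":
            if tok == "<":
                angle_depth += 1
            elif tok == ">":
                angle_depth -= 1
            if tok == ")" and angle_depth == 0:
                if current:
                    params.append(current)
                state = "DONE"
            elif tok == "," and angle_depth == 0:
                # A top-level comma ends the current parameter.
                params.append(current)
                current = []
            else:
                current.append(tok)
        else:
            break  # ignore anything after the closing parenthesis

    if state != "DONE" or len(head) < 2:
        return None
    return {
        "return_type": " ".join(head[:-1]),
        "name": head[-1],
        # A lone 'void' means "no parameters".
        "params": [" ".join(p) for p in params if p != ["void"]],
    }
```

For `int main(void)` this gives the name `main`, the return type `int` and an empty parameter list. As the answer warns, it is easy to fool: a statement such as `return foo(x);` would be reported as a function named `foo`, so it is only adequate when you control the style of the input.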

Answer 2

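When I need to parse a simple format, I've had great success with PEG parsers. pyPEG is a very simple implementation of such a parser, written in Python.

Example Python code for a C++ function parser:

Edit: now handles template parameters. Tested with SK-logic's input, and the output is correct.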

import pyPEG
from pyPEG import parseLine
import re

def symbol(): return re.compile(r"[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ&*][\w:]+")
def type(): return symbol
def functionName(): return symbol
def templatedType(): return symbol, "<", -1, [templatedType, symbol, ","], ">"
def parameter(): return [templatedType, type], symbol
def template(): return "<", -1, [symbol, template], ">"
def function(): return [type, templatedType], functionName, -1, template, "(", -1, [",", parameter], ")" # -1 -> zero or more repetitions.


sourceCode = "std::string foobar(std::vector<int> &A, std::map<std::string, std::vector<std::string> > &B)"
results = parseLine(sourceCode, function(), [], packrat=True)

Running this gives the following result:

([(u'type', [(u'symbol', 'std::string')]), (u'functionName', [(u'symbol', 'foobar')]), (u'parameter', [(u'templatedType', [(u'symbol', 'std::vector'), (u'symbol', 'int')]), (u'symbol', '&A')]), (u'parameter', [(u'templatedType', [(u'symbol', 'std::map'), (u'symbol', 'std::string'), (u'templatedType', [(u'symbol', 'std::vector'), (u'symbol', 'std::string')])]), (u'symbol', '&B')])], '')
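
The nested tuples above are not very convenient to use directly, so here is a minimal sketch of how they could be turned back into a return type, a name, and a parameter list. The helpers `render` and `summarize` are hypothetical and not part of pyPEG; the `results` literal is simply the output shown above, pasted in so the snippet runs without pyPEG installed.

```python
# Parse result copied from the output above:
# (list of (rule_name, children) nodes, unparsed remainder).
results = ([(u'type', [(u'symbol', 'std::string')]),
            (u'functionName', [(u'symbol', 'foobar')]),
            (u'parameter', [(u'templatedType', [(u'symbol', 'std::vector'),
                                                (u'symbol', 'int')]),
                            (u'symbol', '&A')]),
            (u'parameter', [(u'templatedType', [(u'symbol', 'std::map'),
                                                (u'symbol', 'std::string'),
                                                (u'templatedType', [(u'symbol', 'std::vector'),
                                                                    (u'symbol', 'std::string')])]),
                            (u'symbol', '&B')])], '')


def render(node):
    """Rebuild source-like text from one (rule_name, children) node."""
    name, value = node
    if isinstance(value, str):
        return value  # leaf: a matched symbol
    if name == "templatedType":
        head, *args = value  # first child is the template name, the rest are its arguments
        return render(head) + "<" + ", ".join(render(a) for a in args) + ">"
    return " ".join(render(child) for child in value)


def summarize(parse_result):
    """Collect return type, function name and parameters from a parse result."""
    nodes, _remainder = parse_result
    info = {"return_type": None, "name": None, "params": []}
    for node in nodes:
        kind = node[0]
        if kind == "functionName":
            info["name"] = render(node)
        elif kind == "parameter":
            info["params"].append(render(node))
        elif kind in ("type", "templatedType") and info["return_type"] is None:
            info["return_type"] = render(node)
    return info


print(summarize(results))
```

From there, emitting one `\param` entry per item in `params` is a simple loop.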

Answer 3

C++ cannot really be parsed by a (sane) regular expression: they become a nightmare as soon as nesting is involved.
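
To see how quickly this breaks down, here is the pattern from the question run against a few signatures. The helper `groups_for` is just an illustrative wrapper. Two things go wrong: a repeated capture group only keeps its last repetition (and here the group is a character class, so that is a single character), and the class contains neither `,` nor `<`, so anything beyond a single plain parameter does not match at all.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r'([\w]+)\s+([\S]+)\(([\w+\s+\w+])+\)')


def groups_for(line):
    """Return the captured groups for a line, or None if the pattern fails."""
    match = PATTERN.search(line)
    return match.groups() if match else None


for line in [
    "int main(void)",
    "int add(int a)",
    "int add(int a, int b)",
    "std::string foobar(std::vector<int> &A)",
]:
    print(line, "->", groups_for(line))
```

The third group comes back as `'d'` for `int main(void)` and `'a'` for `int add(int a)`, and the last two lines produce no match, which is why the parameter loop in the question never sees a whole parameter.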

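There is another problem as well: determining when to parse and when not to. A function can be declared:

at file scope
in a namespace
in a class

and the last two can be nested to arbitrary depth.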
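
To illustrate what tracking that nesting involves, here is a rough sketch that keeps a stack of open braces and records the qualified path each time a named scope opens. The function `scope_paths` is hypothetical, and it assumes comments, string literals, and templates are absent (for example, `template <class T>` would confuse it), so treat it as a demonstration of the bookkeeping rather than a usable scanner.

```python
import re

# Only identifiers, braces and semicolons matter for this bookkeeping.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|[{};]")
SCOPE_KEYWORDS = {"namespace", "class", "struct"}


def scope_paths(source):
    """Return the qualified path of every named scope, in opening order."""
    stack = []           # one entry per open '{': a scope name, or None for other blocks
    pending = None       # scope name seen after a keyword, waiting for its '{'
    expect_name = False  # True right after namespace/class/struct
    opened = []

    for tok in TOKEN_RE.findall(source):
        if tok in SCOPE_KEYWORDS:
            expect_name = True
        elif tok == "{":
            stack.append(pending)
            if pending is not None:
                opened.append("::".join(name for name in stack if name))
            pending, expect_name = None, False
        elif tok == "}":
            if stack:
                stack.pop()
        elif tok == ";":
            # Forward declarations such as 'class Fwd;' open no scope.
            pending, expect_name = None, False
        elif expect_name:
            pending, expect_name = tok, False
    return opened


example = """
namespace outer { namespace inner {
  class Widget { void draw(); };
} }
class Fwd;
"""
print(scope_paths(example))
```

Even this much state is awkward to express in a single regular expression, which is the point being made above.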

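I would suggest using Clang here. It is a real C++ front end with a full-featured parser, and it comes with:

a C API, including (notably) an API for the indexing library
Python bindings built on top of the C API

The C API and the Python bindings are far from exposing the whole underlying C++ model, but they should be enough for a task as simple as listing functions.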
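
As a rough sketch of what that could look like with the Python bindings (`clang.cindex`): it assumes the bindings are installed and can locate a matching libclang, and the helper names `format_doxygen` and `list_functions`, as well as the `-x c++` argument, are my own choices for illustration. The comment layout mirrors the script in the question.

```python
import os

RULE = "*" * 68


def format_doxygen(name, return_type, params):
    """Build a comment block in the same layout as the question's script."""
    lines = ["/" + RULE, "  \\fn " + name, "",
             "  \\brief", "    Function description for " + name, ""]
    for param in params:
        lines += ["  \\param " + param, "    Description of " + param, ""]
    lines += ["  \\return", "    " + return_type, RULE + "/"]
    return "\n".join(lines)


def list_functions(path, clang_args=("-x", "c++")):
    """Yield (name, return_type, params) for functions declared in `path`."""
    # Imported here so format_doxygen works even without libclang installed.
    from clang.cindex import CursorKind, Index

    index = Index.create()
    unit = index.parse(path, args=list(clang_args))
    wanted = (CursorKind.FUNCTION_DECL, CursorKind.CXX_METHOD)
    target = os.path.abspath(path)

    for cursor in unit.cursor.walk_preorder():
        if cursor.kind not in wanted:
            continue
        # Skip declarations pulled in from included headers.
        source_file = cursor.location.file
        if source_file is None or os.path.abspath(source_file.name) != target:
            continue
        params = [(arg.type.spelling + " " + arg.spelling).strip()
                  for arg in cursor.get_arguments()]
        yield cursor.spelling, cursor.result_type.spelling, params
```

Usage would be along the lines of `for name, ret, params in list_functions("example.cpp"): print(format_doxygen(name, ret, params))`, where `example.cpp` stands in for your own file. Because Clang does the parsing, namespaces, classes, and nested templates are handled without any extra bookkeeping.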

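That said, I would question the usefulness of the project: if the documentation can be generated by a simple parser, then it is redundant with the code. Redundancy is at best useless and at worst dangerous: it introduces the threat of the two getting out of sync...

If a function is tricky enough that using it requires documentation, then that documentation has to be written by a developer who understands the function's limitations.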

Using regular expressions in Python to identify C++ functions and their parameters