Parsing a JS file with Python regular expressions

Time: 2014-04-04 09:22:42

Tags: javascript python regex

I have a JS file that contains JSDoc-style comments:

/****************************************************************************
 Copyright (c) 2010-2012 cocos2d-x.org
 Copyright (c) 2008-2010 Ricardo Quesada
 Copyright (c) 2011      Zynga Inc.

 http://www.cocos2d-x.org
 ****************************************************************************/

cc.g_NumberOfDraws = 0;

//Possible OpenGL projections used by director
/**
 * sets a 2D projection (orthogonal projection)
 * @constant
 * @type Number
 */
cc.DIRECTOR_PROJECTION_2D = 0;

/**
 * sets a 3D projection with a fovy=60, znear=0.5f and zfar=1500.
 * @constant
 * @type Number
 */
cc.DIRECTOR_PROJECTION_3D = 1;

//----------------------------------------------------------------------------------------------------------------------

/**
 * <p>
 * </p>
 * @class
 * @extends cc.Class
 */
cc.Director = cc.Class.extend(/** @lends cc.Director# */{
    //Variables
    _landscape:false,
    _nextDeltaTimeZero:false,
    /**
     * <p>
     * </p>
     */
    popToRootScene:function () {
        // ...
    },

    /**
     * <p>
     * </p>
     * @param {Number} level
     */
    popToSceneStackLevel: function (level) {
        // ...
    }
});

/**
 * returns a shared instance of the director
 * @function
 * @return {cc.Director}
 */
cc.Director.getInstance = function () {
    // ...
};

/**
 * is director first run
 * @type Boolean
 */
cc.firstRun = true;

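Now I want to use Python regular expressions to extract every variable and function that has a JSDoc comment.

For the example above, the snippets I want to extract are:

Snippet 1: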

/**
 * sets a 2D projection (orthogonal projection)
 * @constant
 * @type Number
 */
cc.DIRECTOR_PROJECTION_2D = 0;

Snippet 2:

/**
 * sets a 3D projection with a fovy=60, znear=0.5f and zfar=1500.
 * @constant
 * @type Number
 */
cc.DIRECTOR_PROJECTION_3D = 1;

Snippet 3:

/**
 * <p>
 * </p>
 */
popToRootScene:function () {
    // ...
},

Snippet 4:

/**
 * <p>
 * </p>
 * @param {Number} level
 */
popToSceneStackLevel: function (level) {
    // ...
}

Snippet 5:

/**
 * returns a shared instance of the director
 * @function
 * @return {cc.Director}
 */
cc.Director.getInstance = function () {
    // ...
};

Snippet 6:

/**
 * is director first run
 * @type Boolean
 */
cc.firstRun = true;

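As you can see, I want to extract all variables, instance functions, and class functions that have JSDoc-style comments, and build a list like this:

Variables: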

name: cc.DIRECTOR_PROJECTION_2D   type: number
name: cc.DIRECTOR_PROJECTION_3D   type: number

Instance functions:

name: popToRootScene    param: xxxx   return: xxxx
name: popToSceneStackLevel   param: number - level   return: xxxx

Class functions:

name: cc.Director.getInstance   param: xxxx   return: cc.Director

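I tried to parse the class functions in the file with:

re.findall(r'\s*/\*\*.*?\*/.*?function.*?};', content, re.S)

and the instance functions with:

re.findall(r'\s*/\*\*.*?\*/.*?function.*?},', content, re.S)

but both failed...

Any suggestions would be appreciated, thanks :)

Update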

re.findall(r"(^(?P<identation> *)/\*\*.*$(\r?\n?^(?P=identation) * .*$)*\r?\n?(?P=identation) \*/\r?\n?^.*$)", content, re.M)

This pattern works well, except when there are blank lines inside the comment, for example:

/**1
2

3
 */
cc.Node = cc.Class.extend(/** @lends cc.Node# */{

});

1 Answer:

Answer 0: (score: 1)

You cannot match nested bracket expressions with regular expressions; they require a context-free grammar.
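
To illustrate the point: finding where a function body ends means tracking nesting depth, which Python's built-in `re` module cannot do because it has no recursive patterns. A plain scan with a counter can. The following is only a minimal sketch; `find_block_end` is a hypothetical helper, and it assumes that no braces appear inside string literals, regex literals, or comments.

```python
def find_block_end(source, open_index):
    """Return the index just past the brace that closes source[open_index].

    Hypothetical helper: it only counts braces and does not skip braces
    that occur inside string literals, regex literals, or comments.
    """
    if source[open_index] != "{":
        raise ValueError("open_index must point at '{'")
    depth = 0
    for i in range(open_index, len(source)):
        ch = source[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unbalanced braces")


js = "f = function () { if (a) { b(); } };"
start = js.index("{")
body = js[start:find_block_end(js, start)]
print(body)
```

A regex can still be used to locate the comment and the opening `{`; the counter then takes over for the nested part.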

What you can do is match the first line after the comment, or everything up to the next `;`.

import re

# s holds the contents of the JS file
for x in re.findall(r"(^(?P<identation> *)/\*\*\s*$(\r?\n?^(?P=identation) * .*$)*\r?\n?(?P=identation) \*/\s*^.*$)", s, re.MULTILINE):
    print("-" * 40)
    print(x[0])

Output:

----------------------------------------
/**
 * sets a 2D projection (orthogonal projection)
 * @constant
 * @type Number
 */
cc.DIRECTOR_PROJECTION_2D = 0;
----------------------------------------
/**
 * sets a 3D projection with a fovy=60, znear=0.5f and zfar=1500.
 * @constant
 * @type Number
 */
cc.DIRECTOR_PROJECTION_3D = 1;
----------------------------------------
/**
 * <p>
 * </p>
 * @class
 * @extends cc.Class
 */
cc.Director = cc.Class.extend(/** @lends cc.Director# */{
----------------------------------------
    /**
     * <p>
     * </p>
     */
    popToRootScene:function () {
----------------------------------------
    /**
     * <p>
     * </p>
     * @param {Number} level
     */
    popToSceneStackLevel: function (level) {
----------------------------------------
/**
 * returns a shared instance of the director
 * @function
 * @return {cc.Director}
 */
cc.Director.getInstance = function () {
----------------------------------------
/**
 * is director first run
 * @type Boolean
 */

cc.firstRun = true;

This is the best I could get. In the last match you can see that blank lines between the comment and the code are accepted as well.
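
Going one step further toward the list the question asks for, here is a rough sketch that pairs each doc comment with the first non-blank line after it and then reads the `@type`, `@param`, and `@return` tags. It is not the pattern from the answer above: it swaps in a different, indentation-agnostic regex that also tolerates blank lines inside the comment. The function name `extract_entries`, the dictionary keys, and the classification rules (`name: function` is an instance function, `name = function` is a class function, any other `name =` is a variable, and `@class` comments are skipped) are all assumptions made for illustration.

```python
import re

# A /** ... */ block (but not a /*** banner), then the first non-blank line.
# The tempered dot (?:(?!\*/).)* stops the body at the first */ so a match
# can never spill over into the next comment.
DOC_RE = re.compile(
    r"/\*\*(?!\*)(?P<doc>(?:(?!\*/).)*)\*/[ \t]*\r?\n\s*(?P<code>\S[^\r\n]*)",
    re.S,
)
INSTANCE_FN_RE = re.compile(r"([\w$.]+)\s*:\s*function\b")
CLASS_FN_RE = re.compile(r"([\w$.]+)\s*=\s*function\b")
VARIABLE_RE = re.compile(r"([\w$.]+)\s*=")
PARAM_RE = re.compile(r"@param\s+\{([^}]*)\}\s*([\w$]+)")
RETURN_RE = re.compile(r"@returns?\s+\{([^}]*)\}")
TYPE_RE = re.compile(r"@type\s+\{?([\w$.]+)\}?")


def extract_entries(source):
    """Return a list of dicts, one per documented declaration (sketch)."""
    entries = []
    for m in DOC_RE.finditer(source):
        doc, code = m.group("doc"), m.group("code")
        if "@class" in doc:
            continue  # class declarations are not in the requested lists
        for kind, pattern in (
            ("instance function", INSTANCE_FN_RE),
            ("class function", CLASS_FN_RE),
            ("variable", VARIABLE_RE),
        ):
            found = pattern.match(code)
            if found:
                break
        else:
            continue  # the line after the comment is not a known declaration
        entry = {"kind": kind, "name": found.group(1)}
        if kind == "variable":
            t = TYPE_RE.search(doc)
            entry["type"] = t.group(1) if t else None
        else:
            entry["params"] = PARAM_RE.findall(doc)  # list of (type, name)
            r = RETURN_RE.search(doc)
            entry["return"] = r.group(1) if r else None
        entries.append(entry)
    return entries


sample = """\
/**
 * sets a 2D projection

 * @type Number
 */
cc.DIRECTOR_PROJECTION_2D = 0;

cc.Director = cc.Class.extend(/** @lends cc.Director# */{
    /**
     * @param {Number} level
     */
    popToSceneStackLevel: function (level) {
    }
});

/**
 * @function
 * @return {cc.Director}
 */
cc.Director.getInstance = function () {
};
"""
for entry in extract_entries(sample):
    print(entry)
```

The inline `/** @lends ... */{` comment is skipped because nothing but a line break may follow the closing `*/`. Only the declaration line is captured, so function bodies are not part of the result; recovering them would need brace counting rather than a regex.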