Question

I would like to write a regular expression that finds all the function names in the following string.

"var sampleFunc = function(){return 'hello';}alert(sampleFunc());function sampleTest(){var sampleTestVar = 'one';};var sampleFunc = function(){return 'hello';}alert(sampleFunc());function sampleTest(){var sampleTestVar = 'one';};"

The string above contains a simple JS program. The output I want from it is:

["sampleFunc", "sampleTest", "sampleFunc", "sampleTest"]

Please help me write a regular expression for this.

Answer 1

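First, you have to remove the line comments, which may contain confusing content (see the example below), then remove all newlines, and finally remove the block comments. After that you can match the function names. There are two kinds: those declared with funcName = function and those declared with function funcName. Each kind needs its own regular expression.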

Working code:


function getNames(text) {
  text = text.replace(/\/\/.*?\r?\n/g, "")                                 // first, remove line comments
             .replace(/\r?\n/g, " ")                                       // then remove new lines (replace them with spaces to not break the structure)
             .replace(/\/\*.*?\*\//g, "");                                 // then remove block comments
             
  // PART 1: Match functions declared using: var * = function 
  var varFuncs      = (text.match(/[$A-Z_][0-9A-Z_$]*\s*=\s*function[( ]/gi) || []) // match any valid function name that comes before \s*=\s*function
                           .map(function(tex) {                            // then extract only the function names from the matches
                             return tex.match(/^[$A-Z_][0-9A-Z_$]*/i)[0];
                           });

  // PART 2: Match functions declared using: function * 
  var functionFuncs = (text.match(/function\s+[$A-Za-z_][0-9A-Za-z_$]*/g) || [])  // match "function" followed by a valid name
                           .map(function(tex) {                            // then extract only the names from the matches
                             return tex.match(/[$A-Z_][0-9A-Z_$]*$/i)[0];
                           });
  return {
    var: varFuncs,
    function: functionFuncs
  };
}


var text =
`var sampleFunc = function() {
    return 'hello';
}
/*
  function thisIsNotReallyAFunction() {} 
*/

alert(sampleFunc());
function /* undesired comment */ sampleTest() {
    var sampleTestVar = 'one';
};
var sampleFunc=
// still OK!
function() {
    return 'hello';
}
alert(sampleFunc());
function
// all sorts of comments
sampleTest()
/* Even
 * Block ones
 */

{
    var sampleTestVar = 'one';
};
var sampleFuncEDIT = function (){};
var functionNameEDIT = "sampleFunc";
`;

var names = getNames(text);
console.log(names);
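
The question asks for a single list in source order, whereas getNames returns the two kinds separately. Below is a minimal sketch of one way to get the ordered list, assuming the same comment stripping and the same ASCII-only identifier pattern as above. The name getNamesInOrder and the combined regular expression are my own illustration, not part of the code above, and the same caveats about strings and Unicode names apply.

```javascript
// Hypothetical helper: one combined regex, names returned in source order.
function getNamesInOrder(text) {
  var cleaned = text
    .replace(/\/\/.*?\r?\n/g, "")   // remove line comments
    .replace(/\r?\n/g, " ")         // turn newlines into spaces
    .replace(/\/\*.*?\*\//g, "");   // remove block comments

  // Group 1: the name in `name = function`
  // Group 2: the name in `function name(`
  var re = /([$A-Z_][0-9A-Z_$]*)\s*=\s*function\b|\bfunction\s+([$A-Z_][0-9A-Z_$]*)\s*\(/gi;

  var names = [];
  var m;
  while ((m = re.exec(cleaned)) !== null) {
    names.push(m[1] || m[2]);       // exactly one of the two groups is set
  }
  return names;
}

var half = "var sampleFunc = function(){return 'hello';}alert(sampleFunc());" +
           "function sampleTest(){var sampleTestVar = 'one';};";
var input = half + half;            // the string from the question

console.log(getNamesInOrder(input));
// -> ["sampleFunc", "sampleTest", "sampleFunc", "sampleTest"]
```

Because the two patterns are alternatives in one regular expression, the matches come back in the order they appear in the text, so no merging step is needed.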


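Notes:

Function names can contain various other Unicode characters that the regular expression above, [$A-Z_][0-9A-Z_$]*, will not match. See the ECMA specs.

Even with the comments removed, other things (strings, for example) can still confuse the function. The code above handles a simple use case; if you are looking for a more robust approach, you need to parse the string rather than use regular expressions.

Here are some examples where the function does not work:

var text = "var dummyString = 'function thisShouldntBeMatchedButWillBe';";
var text = "someString = 'this /* will confuse the comment removal'";
// ...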

Answer 2

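OK, here is another approach. In this safer and more reliable approach I use acorn, the library that CodeMirror's TernJS uses to parse JavaScript. CodeMirror is a very powerful web-based code editor that is used almost everywhere (even here).

Code:

First, here is the code:

HTML:

<script src="path/to/acorn.js"></script>
<script src="path/to/walk.js"></script>

JavaScript: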

function getFunctionNames(codeString) {
    var names = [];
    acorn.walk.simple(acorn.parse(codeString), {
        AssignmentExpression: function(node) {
            if(node.left.type === "Identifier" && (node.right.type === "FunctionExpression" || node.right.type === "ArrowFunctionExpression")) {
                names.push(node.left.name);
            }
        },
        VariableDeclaration: function(node) {
            node.declarations.forEach(function (declaration) {
                if(declaration.init && (declaration.init.type === "FunctionExpression" || declaration.init.type === "ArrowFunctionExpression")) {
                    names.push(declaration.id.name);
                }
            });
        },
        Function: function(node) {
            if(node.id) {
                names.push(node.id.name);
            }
        }
    });
    return names;
}

Example:

// getFunctionNames is the function defined above
console.log(getFunctionNames(`
var sampleFunc = function() {
    return 'hello';
}
/*
  function thisIsNotReallyAFunction() {}
*/

alert(sampleFunc());
function /* undesired comment */ sampleTest() {
    var sampleTestVar = 'one';
};
var sampleFunc=
// still OK!
function() {
    return 'hello';
}
alert(sampleFunc());
function
// all sorts of comments
sampleTest()
/* Even
 * Block ones
 */

{
    var sampleTestVar = 'one';
};
var sampleFuncEDIT;
sampleFunEDIT = function (){};
var functionNameEDIT = "sampleFunc";
`));

<script src="https://cdnjs.cloudflare.com/ajax/libs/acorn/5.2.1/acorn.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/acorn/5.2.1/walk.js"></script>

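Explanation:

For a detailed explanation, see acorn's GitHub page.

acorn is split into a number of source files, each responsible for one specific job. We use only acorn.js and walk.js.

acorn.js is used for parsing. It contains many useful parsing functions such as acorn.parse(), acorn.parseExpressionAt(), acorn.tokenizer(), ... We are only interested in acorn.parse, which returns an AST (Abstract Syntax Tree). This is basically a tree structure of nodes. A node describes a meaningful block of code, which can be an assignment, a function call, a variable declaration, ... A node is an object whose properties describe that block of code. It has a type property, start (where the block of code starts) and end (where it ends), and each type of node has some additional properties that apply only to that type.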
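
To make the node description above concrete, here is a hand-written, abridged approximation of the tree for the one-line program x = 1. The property names follow the ESTree conventions that acorn uses, but the object is typed out by hand rather than produced by acorn.parse, so treat the details as illustrative.

```javascript
// Hand-written, abridged sketch of an AST for the source text below.
// It is NOT actual acorn output; some properties are left out.
var source = "x = 1";

var ast = {
  type: "Program", start: 0, end: 5,
  body: [{
    type: "ExpressionStatement", start: 0, end: 5,
    expression: {
      type: "AssignmentExpression", start: 0, end: 5,
      operator: "=",                                             // extra property of this type
      left:  { type: "Identifier", start: 0, end: 1, name: "x" },
      right: { type: "Literal", start: 4, end: 5, value: 1, raw: "1" }
    }
  }]
};

// start and end are character offsets into the source,
// so slicing the source recovers the text each node covers.
function textOf(node) {
  return source.slice(node.start, node.end);
}

var assignment = ast.body[0].expression;
console.log(assignment.type, JSON.stringify(textOf(assignment)));
console.log(assignment.left.type, JSON.stringify(textOf(assignment.left)));
console.log(assignment.right.type, JSON.stringify(textOf(assignment.right)));
```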

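Now that we have the AST, we can walk it ourselves (it is just a bunch of nested objects), or do it acorn's way: acorn gives us a very powerful way of walking the tree. These functions live in the file walk.js. Like acorn.js, walk.js contains many useful functions; we only need walk.simple(). walk.simple takes a tree and another object as arguments. The tree is our AST (returned by acorn.parse), and the object has this form:

{
    [NodeType1]: function(node) { /* node is of type NodeType1 */ },
    [NodeType2]: function(node) { /* node is of type NodeType2 */ },
    ...
}

As walk.simple walks the tree node by node, it checks whether there is a function for the current node's type. If there is, it calls that function (passing it the node itself) and moves on to the next node; if not, it ignores the node and continues with the next one. Of the many node types, the ones we are interested in are: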

Function:

This is an ordinary function definition, for example:

var codeString = `
function f () { };
function someName() { };
() => { };`;

acorn.walk.simple(acorn.parse(codeString), {
    Function: function(node) {
        console.log(node);
    }
});

One of its extra properties is id, an Identifier node for this function, or null if the function has no name. The Identifier node, if present, has a name property, which is the name of our function.

VariableDeclaration:

This is a variable declaration using var, let, or const:

var codeString = `
var e, f = function() {}, g = () => {};
`;

acorn.walk.simple(acorn.parse(codeString), {
    VariableDeclaration: function(node) {
        console.log(node);
    }
});

A node of this type also has some extra properties, such as declarations, an array of all the declarations (the example above shows three: one for e, one for f, and one for g). Declarations are nodes too; they have the extra properties id (an Identifier node) and init (the initializer: a node describing the value assigned to the variable at initialization, or null if there is none). We are only interested in the case where init.type is a function node ("FunctionExpression" or "ArrowFunctionExpression").

AssignmentExpression:

Any assignment using = (not to be confused with variable initialization):

var codeString = `
someVar = function() { }
`;

acorn.walk.simple(acorn.parse(codeString), {
    AssignmentExpression: function(node) {
        console.log(node);
    }
});

This node object has the extra properties left (the left-hand operand) and right (the right-hand operand), both of which are nodes. We are only interested in the case where the left node is an Identifier node and the right node is a function node.
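
The three rules above, together with the idea of dispatching on a node's type, can be tried without loading acorn. The sketch below uses a hand-written tree and a toy walker; toyWalk, isFunctionNode and collectFunctionNames are hypothetical names of my own, and the walker is a simplified stand-in, not acorn's implementation. In particular, it has no Function key, so the same handler is registered under the three concrete function node types instead.

```javascript
// Toy walker (NOT acorn's walk.simple): visits every object that has a
// string `type` and calls visitors[type](node) when such a handler exists.
function toyWalk(node, visitors) {
  if (node === null || typeof node !== "object") return;
  if (Array.isArray(node)) {
    node.forEach(function (child) { toyWalk(child, visitors); });
    return;
  }
  if (typeof node.type === "string" && visitors[node.type]) {
    visitors[node.type](node);
  }
  Object.keys(node).forEach(function (key) { toyWalk(node[key], visitors); });
}

function isFunctionNode(node) {
  return node.type === "FunctionExpression" || node.type === "ArrowFunctionExpression";
}

function collectFunctionNames(ast) {
  var names = [];
  function named(node) { if (node.id) names.push(node.id.name); }
  toyWalk(ast, {
    FunctionDeclaration: named,
    FunctionExpression: named,
    ArrowFunctionExpression: named,
    VariableDeclaration: function (node) {
      node.declarations.forEach(function (d) {
        if (d.init && isFunctionNode(d.init)) names.push(d.id.name);
      });
    },
    AssignmentExpression: function (node) {
      if (node.left.type === "Identifier" && isFunctionNode(node.right)) {
        names.push(node.left.name);
      }
    }
  });
  return names;
}

// Hand-written, abridged tree for:
//   var f = function () {}; g = () => {}; function h() {}
function emptyBody() { return { type: "BlockStatement", body: [] }; }
var ast = {
  type: "Program",
  body: [
    { type: "VariableDeclaration", kind: "var", declarations: [
      { type: "VariableDeclarator",
        id: { type: "Identifier", name: "f" },
        init: { type: "FunctionExpression", id: null, params: [], body: emptyBody() } }
    ] },
    { type: "ExpressionStatement", expression: {
      type: "AssignmentExpression", operator: "=",
      left: { type: "Identifier", name: "g" },
      right: { type: "ArrowFunctionExpression", id: null, params: [], body: emptyBody() } } },
    { type: "FunctionDeclaration",
      id: { type: "Identifier", name: "h" }, params: [], body: emptyBody() }
  ]
};

console.log(collectFunctionNames(ast));
// -> ["f", "g", "h"]
```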

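Notes:

If the actual code string contains a syntax error, acorn.parse throws an error. You may therefore want to wrap the call in a try-catch statement to handle that case, and pass the result to acorn.walk.simple only if no error was thrown.

If you don't want to include a particular type, just remove it from the object and provide only the types you want. For example, if you don't want to include AssignmentExpression, remove it from the object passed to acorn.walk.simple.

You can use separate arrays for the different kinds of functions, as in my other answer: varFunctions, functionFunction and assignmentFunctions.

I hope this helps.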

Answer 3

var re=/(\w+)\(\)/g
// var re=/([\w_-]+)\(\)/g 
var rslt = s.match(re) // s is the string to search

Still keeping it simple.
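
Note that with the g flag, match returns the whole matches, so every entry keeps its trailing "()", and the anonymous function() in the question's string is matched as well. Below is a minimal sketch of one way to post-process the result into the list the question asks for. The map and filter steps and the variable names raw and names are my own assumptions, not part of the answer above.

```javascript
var half = "var sampleFunc = function(){return 'hello';}alert(sampleFunc());" +
           "function sampleTest(){var sampleTestVar = 'one';};";
var s = half + half;                 // the string from the question

var re = /(\w+)\(\)/g;
var raw = s.match(re) || [];
// raw contains entries such as "function()", "sampleFunc()", "sampleTest()"

var names = raw
  .map(function (m) { return m.slice(0, -2); })            // drop the trailing "()"
  .filter(function (name) { return name !== "function"; }); // drop the keyword

console.log(names);
// -> ["sampleFunc", "sampleTest", "sampleFunc", "sampleTest"]
```

This pattern only sees names that are immediately followed by an empty pair of parentheses. Here sampleFunc is picked up from the call sampleFunc(), not from its definition, so a function that takes parameters or is never called with no arguments would be missed.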
