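I am currently trying to extract 300+ functions and subroutines from a 22kLoC file, and decided to try doing it programmatically (I did the "biggest" chunks by hand).
Consider a file of the form
declare sub DoStatsTab12( byval shortlga as string)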
declare sub DoStatsTab13( byval shortlga as string)
declare sub ZOMFGAnotherSub
Other lines that start with something other than "/^sub \w+/" or "/^end sub/"
sub main
This is the first sub: it should be in the output file mainFunc.txt
end sub
sub test
This is a second sub
it has more lines than the first.
It is supposed to go to testFunc.txt
end sub
Function ConvertFileName(ByVal sTheName As String) As String
This is a function so I should not see it if I am awking subs
But when I alter the awk to chunk out functions, it will go to ConvertFileNameFunc.txt
End Function
sub InitialiseVars(a, b, c)
This sub has some arguments - next step is to parse out its arguments
Code code code;
more code;
' maybe a comment, even?
and some code which is badly indented (original code was written by a guy who didn't believe in structure or documentation)
and
with an arbitrary number of newlines between bits of code because why not?
So anyhow - the output of awk should be everything from sub InitialiseVars to end sub, and should go into InitialiseVarsFunc.txt
end sub
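The gist: find the sets of lines that start with
^sub [subName](subArgs)
and end with
^end sub
and then (here is the bit that is beyond me) save each extracted subroutine to a file named [subName]Func.txt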
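awk suggested itself as a candidate (in the past I have written text-extraction regex queries in PHP using preg_match(), but I don't want to count on WAMP/LAMP being available).
My starting point is delightfully minimalist (double quotes because Windows):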
awk "/^sub/,/^end sub/" fName
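This finds the relevant chunks (and prints them to stdout).
The step of getting the output into a file, and naming that file after the $2 that awk captures, is beyond me.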
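An earlier stage of this process involved awk-ing out the subroutine names and storing them. That was easy, since every subroutine is declared by a one-liner of the form
declare sub [subName](subArgs)
So this does that, and does it perfectly: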
awk "match($0, /declare sub (\w+)/)
{print substr($3, RSTART, index($3, \"(\")>0 ? index($3, \"(\")-1: RLENGTH)
> substr($3, RSTART, index($3, \"(\")>0 ? index($3, \"(\")-1: RLENGTH)\".txt\"}"
fName
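(I have tried to lay it out so that it is easy to see that the output file name and the awk output, $3 parsed up to the first '(' if there is one, are the same thing.)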
It seemed to me that if the output of
awk '/^sub/,/^end sub/' fName
were concatenated into an array, then $2 (suitably truncated at '(') would work. But it didn't.
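I have looked at various SO (and other SE-family) threads that deal with multi-line awk, e.g. this one and this one, but none gave me enough of a head start on my problem (they help with getting the match itself, but not with piping it to a file named after itself).
I have RTFD for awk (and grep), also to no avail.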
Answer 0: (score: 4)
I would suggest
awk -F '[ (]*' '                  # Field separator is space or open paren (for
                                  # parameter lists). * because there may be multiple
                                  # spaces, and parens only appear after the stuff we
                                  # want to extract.
  BEGIN { IGNORECASE = 1 }        # case-insensitive pattern matching is probably
                                  # a good idea because Basic is case-insensitive.
  /^sub/ {                        # if the current line begins with "sub"
    outfile = $2 "Func.bas"       # set the output file name
    flag = 1                      # and the flag to know that output should happen
  }
  flag == 1 {                     # if the flag is set
    print > outfile               # print the line to the outfile
  }
  /^end sub/ {                    # when the sub ends,
    flag = 0                      # unset the flag
  }
' foo.bas
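Note that parsing source code with simple pattern-matching tools is error-prone, because programming languages are generally not regular languages (with a few exceptions such as Brainfuck). This sort of thing always depends on how the code is formatted.
For example, if somewhere in the code a sub declaration is split across two lines (which is possible with _, I believe, although Basic is not something I do every day), trying to extract the sub name from the first line of its definition would be futile. Formatting can also call for fine-tuning of the patterns; things like extraneous whitespace at the beginning of a line need to be handled. Use this strictly for a one-off code transformation and verify that it produced the desired result; do not try to make it part of a regular workflow.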
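To illustrate the kind of fine-tuning mentioned above, here is a minimal sketch that tolerates leading whitespace, matches case-insensitively without the gawk-only IGNORECASE variable, and also picks up Function blocks. The file name sample.bas and its contents are made up for the demonstration, and the patterns assume that every block starts with a bare "sub" or "function" keyword (no Private/Public prefix, no _ line continuation before the name). Run it in a scratch directory, since it writes files into the current one.

```shell
#!/bin/sh
# Sketch only: sample.bas and its contents are hypothetical test input.
set -eu

cat > sample.bas <<'EOF'
declare sub InitialiseVars(a, b, c)
  Sub InitialiseVars(a, b, c)
    code
  End Sub
Function ConvertFileName(ByVal sTheName As String) As String
    more code
End Function
EOF

awk '
{
    line = tolower($0)                # lower-cased copy, used for matching only
    sub(/^[ \t]+/, "", line)          # ignore leading whitespace when matching
}
!outfile && line ~ /^(sub|function)[ \t]+[a-z_]/ {
    name = $0
    sub(/^[ \t]*[A-Za-z]+[ \t]+/, "", name)   # drop the "sub"/"function" keyword
    sub(/[^A-Za-z0-9_].*$/, "", name)         # keep only the identifier
    outfile = name "Func.txt"
}
outfile { print > outfile }           # inside a block: copy the original line
outfile && line ~ /^end[ \t]+(sub|function)/ {
    close(outfile)                    # finished with this file
    outfile = ""
}
' sample.bas
```

With the sample input this should leave InitialiseVarsFunc.txt and ConvertFileNameFunc.txt in the current directory, each holding one block with its original indentation and casing; the declare line is skipped because it does not start with the keyword.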
Answer 1: (score: 1)
Another way
awk -F'[ (]' 'x+=(/^sub/&&file=$2"Func.txt"){print > file}/^end sub/{x=file=""}' file
awk -F'[ (]'                    - Set the field separator to a space or an open parenthesis.
x+=(/^sub/&&file=$2"Func.txt")  - If the line begins with sub, set file to the second
                                  field plus "Func.txt" and increment x (from empty to 1).
                                  Because this expression is the pattern, the next block
                                  keeps being executed for every line until x is unset.
{print > file}                  - While x is true, print the line to the current file.
/^end sub/{x=file=""}           - If the line begins with end sub, set both x and file
                                  to the empty string.
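The same logic can be spelled out in a longer form, which may be easier to adapt. The sketch below keeps the field separator from the one-liner, closes each output file when its sub ends (some awk implementations limit the number of simultaneously open files, which could matter with 300+ subs), and prints a count to check against the number of declare lines. The input file subs.bas and the count variable are illustrative assumptions, not part of the original answer. Run it in a scratch directory.

```shell
#!/bin/sh
# Sketch only: subs.bas is a hypothetical, cut-down version of the sample file.
set -eu

cat > subs.bas <<'EOF'
declare sub main
sub main
This is the first sub
end sub
not part of any sub
sub test(a, b, c)
This is a second sub
end sub
EOF

awk -F'[ (]' '
/^sub /    { file = $2 "Func.txt"; count++ }           # start of a sub: name the file
file != "" { print > file }                            # inside a sub: copy the line
/^end sub/ { if (file != "") close(file); file = "" }  # end of a sub: release the file
END        { printf "%d subs written\n", count }
' subs.bas
```

Here /^sub / (with a trailing space) is slightly stricter than /^sub/, so a line that merely starts with a word such as "subtotal" would not open a new file.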