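For example, I want to parse a Python file, take the text between triple double quotes, and build an HTML table from that text.
A text block, for example: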
"""
Replaces greater than operator ('>') with 'NOT BETWEEN 0 AND #'
Replaces equals operator ('=') with 'BETWEEN # AND #'

Tested against:
* Microsoft SQL Server 2005
* MySQL 4, 5.0 and 5.5
* Oracle 10g
* PostgreSQL 8.3, 8.4, 9.0

Requirement:
* Microsoft Access

Notes:
* Useful to bypass weak and bespoke web application firewalls that
filter the greater than character
* The BETWEEN clause is SQL standard. Hence, this tamper script
should work against all (?) databases

>>> tamper('1 AND A > B--')
'1 AND A NOT BETWEEN 0 AND B--'
>>> tamper('1 AND A = B--')
'1 AND A BETWEEN B AND B--'
"""
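The HTML table must be a simple table with 5 columns:
column 1: between """ and \n, if the next line is empty
column 2: between Tested against: (or Requirement:) and \n, if the next line is empty
column 3: between Notes: and \n, if the next line is empty
column 4: between >>> and \n
column 5: between the end of column 4 and \n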
The result must be:
Microsoft SQL Server 2005
or
tamper('1 AND A > B--') tamper('1 AND A = B--')
'1 AND A NOT BETWEEN 0 AND B--' '1 AND A BETWEEN B AND B--'
What syntax can I use to extract this? I will be using VBScript.RegExp.
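Set fso = CreateObject("Scripting.FileSystemObject")
txt = fso.OpenTextFile("C:\path\to\your.py").ReadAll
Set re = New RegExp
' In a VBScript string "" is one quote character, so this pattern is "([^"]*)":
' the text between a single pair of double quotes
re.Pattern = """([^""]*)"""
re.Global = True
For Each m In re.Execute(txt)
  WScript.Echo m.SubMatches(0)
Next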
Answer:
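Your question is very broad, so I'll only outline an approach. Anything more would mean writing the whole script for you, and that isn't going to happen.
Extract everything between the docquotes (the triple double quotes). Use a regular expression like this one to get the text between them: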
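Set re1 = New RegExp
' In a VBScript string "" is one quote character, so this pattern is """([\s\S]*?)"""
' [\s\S]*? matches any text, line breaks included, as little as possible
re1.Pattern = """""""([\s\S]*?)"""""""
For Each m In re1.Execute(txt)   ' txt holds the file contents, read as in the question
  docstr = m.SubMatches(0)       ' the text between the triple quotes
Next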
Note that if the file contains more than one docstring and you want to process all of them, you need to set re1.Global to True. Otherwise you only get the first match.
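With Global set to True, the loop above overwrites docstr on every match, so to keep all of them you can collect them into an array. A minimal sketch, assuming the file contents are already in a string as in the question's code; GetDocstrings is a hypothetical helper name:

```vbscript
' Hypothetical helper: returns every triple-quoted block in s as an array of strings.
Function GetDocstrings(s)
  Dim re, matches, m, result(), i
  Set re = New RegExp
  re.Pattern = """""""([\s\S]*?)"""""""   ' i.e. """([\s\S]*?)"""
  re.Global = True                        ' return all matches, not just the first
  Set matches = re.Execute(s)
  If matches.Count = 0 Then
    GetDocstrings = Array()               ' no docstrings: empty array (UBound = -1)
    Exit Function
  End If
  ReDim result(matches.Count - 1)
  i = 0
  For Each m In matches
    result(i) = m.SubMatches(0)           ' text between the quotes, without the quotes
    i = i + 1
  Next
  GetDocstrings = result
End Function
```

Called as docs = GetDocstrings(txt), each docs(i) can then go through the remaining steps on its own.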
Use a second regular expression to remove leading and trailing whitespace:
Set re2 = New RegExp
re2.Pattern = "^\s*|\s*$"
re2.Global = True 'find all matches
docstr = re2.Replace(docstr, "")
You can't use Trim here, because that function removes only space characters, not other whitespace such as tabs and line breaks.
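To see the difference, here is the same regex idea wrapped in a function. This is a sketch: StripWs is a hypothetical name, and it uses \s+ instead of \s* so that no empty matches are produced (the result is the same).

```vbscript
' Hypothetical helper: removes leading and trailing whitespace of any kind.
Function StripWs(ByVal s)
  Dim re
  Set re = New RegExp
  re.Pattern = "^\s+|\s+$"   ' whitespace at the very start or very end of the string
  re.Global = True           ' replace at both ends, not just the first match
  StripWs = re.Replace(s, "")
End Function
```

For example, Trim(vbTab & "  text" & vbCrLf) returns its argument unchanged (length 9), because it starts with a tab and ends with a line feed, while StripWs on the same string returns "text".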
Split the string at two consecutive line breaks to get the documentation sections, or use another regular expression to extract them:
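Set re3 = New RegExp
' Assumes CRLF line endings; use \r?\n instead of \r\n for files with LF-only endings
re3.Pattern = "([\s\S]*?)\r\n\r\n" & _
              "Tested against:\r\n([\s\S]*?)\r\n\r\n" & _
...
For Each m In re3.Execute(docstr)   ' match against the extracted docstring, not the whole file
  descr = m.SubMatches(0)
  tested = m.SubMatches(1)
  ...
Next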
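The Split alternative might look like the following sketch. It assumes CRLF line endings, a docstring that has already been extracted, blank lines that are truly empty, and that each section starts with one of the headers from the question (Tested against:, Requirement:, Notes:, >>>). SplitSections and its column order are hypothetical, mapped to the five columns the question asks for.

```vbscript
' Hypothetical helper: maps one extracted docstring to the five table columns.
' Assumes CRLF line endings and sections separated by one empty line.
Function SplitSections(docstr)
  Dim ws, cols(4), blocks, b, s, lines, ln, t, i
  Set ws = New RegExp
  ws.Pattern = "^\s+|\s+$"                 ' leading/trailing whitespace, not just spaces
  ws.Global = True
  For i = 0 To 4
    cols(i) = ""
  Next
  blocks = Split(docstr, vbCrLf & vbCrLf)  ' an empty line ends a section
  For Each b In blocks
    s = ws.Replace(b, "")
    If Left(s, 15) = "Tested against:" Or Left(s, 12) = "Requirement:" Then
      cols(1) = ws.Replace(Mid(s, InStr(s, ":") + 1), "")   ' text after the header
    ElseIf Left(s, 6) = "Notes:" Then
      cols(2) = ws.Replace(Mid(s, InStr(s, ":") + 1), "")
    ElseIf Left(s, 3) = ">>>" Then
      ' doctest block: ">>> call" lines go to column 4, result lines to column 5
      lines = Split(s, vbCrLf)
      For Each ln In lines
        t = ws.Replace(ln, "")
        If Left(t, 3) = ">>>" Then
          cols(3) = cols(3) & ws.Replace(Mid(t, 4), "") & " "
        ElseIf t <> "" Then
          cols(4) = cols(4) & t & " "
        End If
      Next
    ElseIf s <> "" Then
      cols(0) = s                          ' the untitled first section: the description
    End If
  Next
  cols(3) = ws.Replace(cols(3), "")        ' drop the trailing separator space
  cols(4) = ws.Replace(cols(4), "")
  SplitSections = cols
End Function
```

Matching on each section's header instead of its position means an optional section such as Requirement: can be missing without shifting the other columns.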
Keep breaking down the individual sections until you have the elements you want to display. Then build the HTML from those elements.
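For the last step, a minimal sketch of the HTML side, assuming each docstring has already been reduced to five strings. HtmlEscape, BuildRow, BuildTable and the header captions are hypothetical. The escaping matters because the doctest lines contain > characters.

```vbscript
' Hypothetical helper: escapes characters that are special in HTML text.
' ByVal so the caller's variable is not modified (VBScript passes ByRef by default).
Function HtmlEscape(ByVal s)
  s = Replace(s, "&", "&amp;")   ' first, so the entities added below are not escaped again
  s = Replace(s, "<", "&lt;")
  s = Replace(s, ">", "&gt;")
  s = Replace(s, """", "&quot;")
  HtmlEscape = s
End Function

' Hypothetical helper: one <tr> with one <td> per value in the cols array.
Function BuildRow(cols)
  Dim c, html
  html = "<tr>"
  For Each c In cols
    ' escape first, then keep the docstring's line breaks visible as <br>
    html = html & "<td>" & Replace(HtmlEscape(c), vbCrLf, "<br>") & "</td>"
  Next
  BuildRow = html & "</tr>"
End Function

' Hypothetical helper: wraps an array of row strings in a table with a header row.
Function BuildTable(rows)
  Dim html, r
  html = "<table border=""1"">" & vbCrLf & _
         "<tr><th>Description</th><th>Tested against / Requirement</th>" & _
         "<th>Notes</th><th>Input</th><th>Output</th></tr>" & vbCrLf
  For Each r In rows
    html = html & r & vbCrLf
  Next
  BuildTable = html & "</table>"
End Function
```

Collect one BuildRow result per docstring in an array, pass it to BuildTable, and write the returned string to an .html file, for example with FileSystemObject.CreateTextFile and its Write method.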