Question

Using pyparsing as follows achieves the opposite of what I want (it removes the contents of a tag):

from pyparsing import Suppress, replaceWith, makeHTMLTags, SkipTo
#...
removeText = replaceWith("")
scriptOpen, scriptClose = makeHTMLTags("script")
scriptBody = scriptOpen + SkipTo(scriptClose) + scriptClose
scriptBody.setParseAction(removeText)
data = (scriptBody).transformString(data)

How can I instead keep only the contents of the "table" tag?

Update 0:

I tried:

# keep only the tables
tableOpen, tableClose = makeHTMLTags("table")
tableBody = tableOpen + SkipTo(tableClose) + tableClose
f = replaceWith(tableBody)
tableBody.setParseAction(f)
data = (tableBody).transformString(data)
print(data)

and I get something like this...

garbages
<input type="hidden" name="cassstx"   value="en_US:frontend"></form></td></tr></table></span></td></tr></table> 

{<"table"> SkipTo:(</"table">) </"table">} 
<div id="asbnav" style="padding-bottom: 10px;">{<"table"> SkipTo:(</"table">) </"table">} 
</div> 
even more garbages

Update 2:

Thanks, Martelli. What I need is:

from pyparsing import Suppress, replaceWith, makeHTMLTags, SkipTo
#...
data = 'before<script>ciao<table>buh</table>bye</script>after'

tableOpen, tableClose = makeHTMLTags("table")
tableBody = tableOpen + SkipTo(tableClose) + tableClose
thetable = (tableBody).searchString(data)[0][2]

print(thetable)

Answer 1

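You could first extract the table (much as you now extract the script, but without the removal, of course ;-), obtaining a thetable string; then extract the script with replaceWith(thetable) instead of replaceWith(''). Alternatively, you could prepare a more elaborate parse action, but the simple two-phase approach looks more straightforward to me. For example (to preserve specifically the contents of the table, not the table tags):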

from pyparsing import Suppress, replaceWith, makeHTMLTags, SkipTo
#...
data = 'before<script>ciao<table>buh</table>bye</script>after'

tableOpen, tableClose = makeHTMLTags("table")
tableBody = tableOpen + SkipTo(tableClose) + tableClose
thetable = (tableBody).searchString(data)[0][2]

removeText = replaceWith(thetable)
scriptOpen, scriptClose = makeHTMLTags("script")
scriptBody = scriptOpen + SkipTo(scriptClose) + scriptClose
scriptBody.setParseAction(removeText)
data = (scriptBody).transformString(data)

print(data)

This prints beforebuhafter (what is outside the script tag, with the contents of the table tag sandwiched inside), hopefully "as desired".
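
For comparison, the "more elaborate parse action" mentioned above might look roughly like the sketch below. It is only one possible reading of that idea, not necessarily what was meant: the helper names strip_scripts_keep_tables and keep_tables and the results names "script_text" and "table_text" are made up for illustration, and it assumes a pyparsing version that still accepts the camelCase method names used in this thread.

```python
from pyparsing import makeHTMLTags, SkipTo

tableOpen, tableClose = makeHTMLTags("table")
# Name the skipped text so it can be read by name rather than by index [2].
tableBody = tableOpen + SkipTo(tableClose)("table_text") + tableClose

scriptOpen, scriptClose = makeHTMLTags("script")
scriptBody = scriptOpen + SkipTo(scriptClose)("script_text") + scriptClose


def keep_tables(tokens):
    # Hypothetical parse action: replace one whole script element with the
    # concatenated contents of every table found inside it.
    found = tableBody.searchString(tokens["script_text"])
    return "".join(match["table_text"] for match in found)


scriptBody.setParseAction(keep_tables)


def strip_scripts_keep_tables(html):
    # Text outside script elements is passed through unchanged.
    return scriptBody.transformString(html)


data = 'before<script>ciao<table>buh</table>bye</script>after'
print(strip_scripts_keep_tables(data))
```

Unlike the two-phase version, this handles each script element separately, so every script is replaced by its own table contents. Like the code above, it stops at the first closing tag, so nested tables are not handled.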
