I have the following HTML content in my data:
outer text <span class="cssname">inner text to be removed along with tags</span> further text
I want to use a regular expression in a query to remove every <span> tag with class='cssname', together with its inner text.
The expected output is:
'outer text further text'
Answer 0 (score: 0)
SQL Server does not fully support regular expressions the way other languages do. This works for a single tag.
$ cat tst.awk
BEGIN { split("10-1 15 17",tmp); for (i in tmp) goodVals[tmp[i]] }
$2 != prevPivot { prtCurrSet() }
{ seen[$9]; currSet = currSet $0 ORS; prevPivot = $2 }
END { prtCurrSet() }
function prtCurrSet(    val,allGoodPresent,someBadPresent) {
    allGoodPresent = 1
    for (val in goodVals) {
        if ( !(val in seen) ) {
            allGoodPresent = 0
        }
        delete seen[val]
    }
    someBadPresent = length(seen)
    if ( allGoodPresent && !someBadPresent ) {
        printf "%s", currSet
    }
    currSet = ""
    delete seen
}
$ awk -f tst.awk file
S 236 1365 * 0 * * * 15 1 c474 152
H 236 279 95 + 0 0 765I279M321I 10-1 1 s7689 1
H 236 301 99.7 - 0 0 908I301M156I 15 1 s8443 1
H 236 563 95.2 - 0 0 728I563M74I 17 1 c1725 12
H 236 97 97.9 - 0 0 732I97M536I 17 1 s11472 1
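Given the remark above about limited regular expression support, one regex-free way to remove the spans is a loop built only on CHARINDEX and STUFF. The following is a minimal sketch, not code from the original answer: the variable names (@s, @open, @close, @start, @end) are illustrative, and it assumes that the opening tag is written exactly as `<span class="cssname">` and that the spans are not nested.

```sql
-- Sketch under stated assumptions: exact opening tag text, no nested spans.
DECLARE @s     NVARCHAR(MAX) = N'outer text <span class="cssname">inner text to be removed along with tags</span> further text';
DECLARE @open  NVARCHAR(100) = N'<span class="cssname">';
DECLARE @close NVARCHAR(100) = N'</span>';
DECLARE @start INT = CHARINDEX(@open, @s);
DECLARE @end   INT;

WHILE @start > 0
BEGIN
    -- Find the first closing tag after the opening tag
    SET @end = CHARINDEX(@close, @s, @start);
    IF @end = 0 BREAK;  -- unbalanced markup: stop rather than guess

    -- Cut everything from the opening tag through the end of the closing tag
    SET @s = STUFF(@s, @start, @end - @start + LEN(@close), N'');

    -- Look for the next occurrence in the shortened string
    SET @start = CHARINDEX(@open, @s);
END;

SELECT @s AS stripped;
```

Because only the tag and its contents are cut, the spaces on either side of the span both remain in the result. To apply this per row, the same loop could be wrapped in a scalar function.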
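Answer 1 (score: 0)
This adjusts the HTML so that the regular text is wrapped in <content> elements, then converts the result to XML. That is done in the CROSS APPLY part.
The second step uses XQuery to select the text inside the <content> elements, which drops the <span> elements.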
DECLARE @tt TABLE (t NVARCHAR(MAX));
INSERT INTO @tt (t) VALUES (N'outer text <span class="cssname">inner text to be removed along with tags</span> further text');

SELECT
    -- Step 2: keep only the text nodes of the <content> elements
    stripped = CAST(x.query('for $i in (/content) return $i/text()') AS NVARCHAR(MAX))
FROM
    @tt
    CROSS APPLY (
        -- Step 1: close <content> before each <span> and reopen it after each </span>
        SELECT
            x = CAST('<content>' + REPLACE(REPLACE(t, '<span', '</content><span'), '/span>', '/span><content>') + '</content>' AS XML)
    ) AS f;
Result:
outer text further text
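To see what the CROSS APPLY step produces before the cast to XML, the nested REPLACE can be run on its own. This is only an inspection sketch; the variable @t and the column alias wrapped are illustrative names, not part of the answer.

```sql
DECLARE @t NVARCHAR(MAX) = N'outer text <span class="cssname">inner text to be removed along with tags</span> further text';

-- Same string manipulation as in the CROSS APPLY, without the CAST to XML
SELECT wrapped = '<content>' + REPLACE(REPLACE(@t, '<span', '</content><span'), '/span>', '/span><content>') + '</content>';

-- wrapped:
-- <content>outer text </content><span class="cssname">inner text to be removed along with tags</span><content> further text</content>
```

Two consequences follow from this shape. The REPLACE calls match every <span>, not only those with class="cssname", so all spans are stripped. The CAST to XML also requires the wrapped string to be well-formed, so input containing a bare & or an unclosed tag such as <br> would raise a conversion error.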