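T-SQL: remove img tags with a specific src from HTML

Time: 2016-08-01 13:08:30

Tags: sql-server regex tsql replace

I have HTML text in my database that contains many img tags. My goal is to remove the img tags that have a specific src.

My input is: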

<div>
    <p>some text goes here <img width="100" src="/upload/remove-me.png" /></p>
    <p>some other text goes here <img height="100" src='/upload/remove-me.png' width="200" /></p>    
    <p>some other text goes here <img src="/upload/filename.png" /></p>
</div>

I want to remove every image whose src is "/upload/remove-me.png", so that my output is:

<div>
    <p>some text goes here</p>
    <p>some other text goes here</p>
    <p>some other text goes here <img src="/upload/filename.png" /></p>
</div>

Is there a way to do this in T-SQL with regular expressions?

4 answers:

Answer 0 (score: 2)

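XML DML offers a more elegant solution. The HTML column of the main table is most likely (n)varchar(max), so a temporary table (a table variable here) is needed to hold the content as xml.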

declare @HTML table(id int, a xml) 
insert into @HTML
select id, html
from dbo.myTable
/* content of html field
'<div>
    <p>some text goes here <img width="100" src="/upload/remove-me.png" /></p>
    <p>some other text goes here <img height="100" src="/upload/remove-me.png" width="200" /></p>    
    <p>some other text goes here <img src="/upload/filename.png" /></p>
</div>'
*/
update @html
set a.modify('delete //img[contains(@src,"remove-me")]') --delete nodes and update
from @HTML cross apply a.nodes('div') t(v)

--select * from @html --just to see what happens
update dbo.myTable
set html = cast(h.a as nvarchar(max)) --xml is not converted back to a string implicitly
from dbo.myTable t
inner join @html h on t.id = h.id
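
To try the XQuery expression without touching any table, the same delete can be run against an xml variable. This is only a minimal sketch: the variable name @doc is made up for illustration, the markup is the sample from the question, and it assumes the stored HTML is well-formed XML (otherwise the conversion to xml fails).

```sql
-- Sample markup copied from the question; it must be well-formed XML to convert.
DECLARE @doc xml = N'<div>
    <p>some text goes here <img width="100" src="/upload/remove-me.png" /></p>
    <p>some other text goes here <img height="100" src=''/upload/remove-me.png'' width="200" /></p>
    <p>some other text goes here <img src="/upload/filename.png" /></p>
</div>';

-- Delete every img element whose src attribute contains "remove-me".
SET @doc.modify('delete //img[contains(@src, "remove-me")]');

-- Convert back to text explicitly to inspect the result.
SELECT CAST(@doc AS nvarchar(max)) AS CleanHtml;
```

Because the match is on the attribute value, it does not matter in which order the attributes appear or whether src uses single or double quotes.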

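Answer 1 (score: 1)

From your example it looks like the tags can have their attributes in any order, so we need to loop over the text and take out one img tag at a time. Obviously, you should try this on a backed-up copy of your data first to make sure it removes only what you want removed: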

declare @HTML table(a nvarchar(max)) 
insert into @HTML
select 
'<div>
    <p>some text goes here <img width="100" src="/upload/remove-me.png" /></p>
    <p>some other text goes here <img height="100" src="/upload/remove-me.png" width="200" /></p>    
    <p>some other text goes here <img src="/upload/filename.png" /></p>
</div>'


declare @URL nvarchar(50) = 'src="/upload/remove-me.png"'   -- Search for img tags with this text in.
declare @TagStart int = -1
declare @TagEnd int = -1

while @TagStart <> 0
begin
    select @TagStart = patindex('%<img%' + @URL + '%/>%',a)-1       -- Find the start of the first img tag in the text.
            ,@TagEnd = patindex('%/>%'
                                        ,substring(a
                                        ,patindex('%<img%' + @URL + '%/>%',a)
                                        ,999999999
                                        )
                                )+1                                 -- Find the end of the first img tag in the text.
    from @HTML

    update @HTML                -- Update the table to remove just this tag
    set a = (select left(a,@TagStart) + right(a,len(a)-@TagStart-@TagEnd)
            from @HTML
            )

    select @TagStart = patindex('%<img%' + @URL + '%/>%',a)     -- Check if there are any more img tags with the URL to remove.  Will return 0 if there are none.
    from @HTML
end

select a as CleanHTML
from @HTML

Answer 2 (score: 0)

If the whole img tag is always identical (not just its src):

<img height="100" src='/upload/remove-me.png' width="200" />

then you can use a simple REPLACE, as follows:

UPDATE tablename SET columnname=REPLACE(
  columnname,
  N' <img height="100" src=''/upload/remove-me.png'' width="200" />',
  N''
)
WHERE columnname LIKE N'% <img height="100" src=''/upload/remove-me.png'' width="200" />%'

Note the space before the tag. If the markup is stored in an ntext column, cast it to nvarchar(max) first, otherwise REPLACE will fail.
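
The ntext case can be sketched as follows. The table variable @pages and its column body are hypothetical stand-ins for a real table with an ntext column; the only change from the statement above is the CAST around the column.

```sql
-- Hypothetical table variable standing in for a table with an ntext column.
DECLARE @pages TABLE (id int, body ntext);

INSERT INTO @pages (id, body)
VALUES (1, N'<p>some other text goes here <img height="100" src=''/upload/remove-me.png'' width="200" /></p>');

-- REPLACE does not accept ntext, so cast the column to nvarchar(max) first.
UPDATE @pages
SET body = REPLACE(
    CAST(body AS nvarchar(max)),
    N' <img height="100" src=''/upload/remove-me.png'' width="200" />',
    N''
)
WHERE body LIKE N'% <img height="100" src=''/upload/remove-me.png'' width="200" />%';

-- Cast again for display; the tag and the space before it are gone.
SELECT CAST(body AS nvarchar(max)) AS CleanHtml
FROM @pages;
```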

If this is anything other than a one-off data correction, you should handle it in the business logic layer instead.

Answer 3 (score: 0)

The following function should do the job. It finds the start and end of each img tag that contains the target image name and then removes that text:

CREATE FUNCTION dbo.Html_RemoveImageAttributes    -- Use ALTER FUNCTION instead if the function already exists.
(
    @sourceImage        NVARCHAR(100),
    @inputHtml          NVARCHAR(MAX)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN

    DECLARE @outputHtml NVARCHAR(MAX) = @inputHtml;
    DECLARE @searchFrom INT = 1;    -- Position from which to look for the next img tag.
    DECLARE @imageTagStart INT = CHARINDEX('<img ', @outputHtml, @searchFrom);
    DECLARE @imageTagEnd INT;

    WHILE (@imageTagStart > 0)
    BEGIN

        -- Assumes every img tag is self-closing, i.e. ends with '/>'.
        SET @imageTagEnd = CHARINDEX('/>', @outputHtml, @imageTagStart);
        IF (@imageTagEnd = 0) BREAK;

        IF CHARINDEX(@sourceImage, SUBSTRING(@outputHtml, @imageTagStart, @imageTagEnd - @imageTagStart + 2)) > 0
        BEGIN
            -- The target image name is inside this tag: remove the whole tag.
            SET @outputHtml = STUFF(@outputHtml, @imageTagStart, @imageTagEnd - @imageTagStart + 2, '');
        END
        ELSE
        BEGIN
            -- Not a match: continue searching after this tag.
            SET @searchFrom = @imageTagEnd + 2;
        END

        SET @imageTagStart = CHARINDEX('<img ', @outputHtml, @searchFrom);

    END


    RETURN @outputHtml

END
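
A usage sketch for the function above. It assumes the function lives in the dbo schema (scalar functions have to be called with their schema name); the table variable @pages and its columns are hypothetical names used only for illustration.

```sql
-- Call the function directly on a string.
DECLARE @html NVARCHAR(MAX) = N'<div>
    <p>some text goes here <img width="100" src="/upload/remove-me.png" /></p>
    <p>some other text goes here <img src="/upload/filename.png" /></p>
</div>';

SELECT dbo.Html_RemoveImageAttributes(N'/upload/remove-me.png', @html) AS CleanHtml;

-- Apply it to stored rows; @pages stands in for a real table.
DECLARE @pages TABLE (id INT, html NVARCHAR(MAX));

INSERT INTO @pages (id, html)
VALUES (1, N'<p>some text goes here <img width="100" src="/upload/remove-me.png" /></p>'),
       (2, N'<p>some other text goes here <img src="/upload/filename.png" /></p>');

-- Only touch rows that actually mention the target image.
UPDATE @pages
SET html = dbo.Html_RemoveImageAttributes(N'/upload/remove-me.png', html)
WHERE html LIKE N'%/upload/remove-me.png%';

SELECT id, html
FROM @pages;
```

The WHERE clause is optional, but it keeps the update from rewriting rows that do not contain the image at all.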