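We have a CMS that writes HTML content blocks into a SQL Server database. I know the table name and the field name where these HTML content blocks live. Some of the HTML contains links to PDF files. Here is a snippet: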
<p>A deferred tuition payment plan,
or view the <a href="/uploadedFiles/Tuition-Reimbursement-Deferred.pdf"
target="_blank">list</a>.</p>
I need to extract the PDF file names from all of these HTML content blocks. In the end, I need a list:
Tuition-Reimbursement-Deferred.pdf
Some-other-file.pdf
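of all the PDF file names in that field.
Any help is appreciated. Thanks.
Update
I received a lot of replies, thank you very much, but I forgot to mention that we are still using SQL Server 2000 here. So this has to be done with SQL that runs on SQL Server 2000.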
Answer 0 (score: 3)
Create this function:
create function dbo.extract_filenames_from_a_tags (@s nvarchar(max))
returns @res table (pdf nvarchar(max)) as
begin
    -- assumes there are no single quotes or double quotes in the PDF filename
    declare @i int, @j int, @k int, @tmp nvarchar(max);
    set @i = charindex(N'.pdf', @s);
    while @i > 0
    begin
        select @tmp = left(@s, @i+3);
        select @j = charindex('/', reverse(@tmp)); -- directory delimiter
        select @k = charindex('"', reverse(@tmp)); -- start of href
        if @j = 0 or (@k > 0 and @k < @j) set @j = @k;
        select @k = charindex('''', reverse(@tmp)); -- start of href (single-quote*)
        if @j = 0 or (@k > 0 and @k < @j) set @j = @k;
        insert @res values (substring(@tmp, len(@tmp)-@j+2, len(@tmp)));
        select @s = stuff(@s, 1, @i+4, ''); -- remove up to ".pdf"
        set @i = charindex(N'.pdf', @s);
    end
    return
end
GO
A demo of using the function:
declare @t table (html varchar(max));
insert @t values
('
<p>A deferred tuition payment plan,
or view the <a href="/uploadedFiles/Tuition-Reimbursement-Deferred.pdf"
target="_blank">list</a>.</p>'),
('
<p>A deferred tuition payment plan,
or view the <a href="Two files here-Reimbursement-Deferred.pdf"
target="_blank">list</a>.</p>And I use single quotes
<a href=''/look/path/The second file.pdf''
target="_blank">list</a>');
select t.*, p.pdf
from @t t
cross apply dbo.extract_filenames_from_a_tags(html) p;
Result:
|HTML | PDF |
--------------------------------------------------------------------
|<p>A deferred tui.... | Tuition-Reimbursement-Deferred.pdf |
|<p>A deferred tui.... | Two files here-Reimbursement-Deferred.pdf |
|<p>A deferred tui.... | The second file.pdf |
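Since the question was updated to say SQL Server 2000, here is a rough sketch of how the same loop might look there, where nvarchar(max), CROSS APPLY and multi-row VALUES are not available. It is only a sketch under assumptions: the names dbo.extract_pdf_names_2000 and dbo.cms_content (with a column called html) are made up for illustration, each HTML block is assumed to fit in varchar(8000) (anything longer is cut off by the CAST), and matches longer than 260 characters are skipped as probable false hits. A cursor stands in for CROSS APPLY.

```sql
-- Sketch for SQL Server 2000. All object names here are hypothetical.
create function dbo.extract_pdf_names_2000 (@s varchar(8000))
returns @res table (pdf varchar(260))
as
begin
    declare @i int, @j int, @k int, @tmp varchar(8000), @rev varchar(8000)
    set @i = charindex('.pdf', @s)
    while @i > 0
    begin
        set @tmp = left(@s, @i + 3)           -- text up to and including ".pdf"
        set @rev = reverse(@tmp)
        set @j = charindex('/', @rev)         -- nearest directory delimiter
        set @k = charindex('"', @rev)         -- nearest double quote
        if @j = 0 or (@k > 0 and @k < @j) set @j = @k
        set @k = charindex('''', @rev)        -- nearest single quote
        if @j = 0 or (@k > 0 and @k < @j) set @j = @k
        -- the file name is the last @j - 1 characters of @tmp;
        -- skip it when no delimiter was found or the match is implausibly long
        if @j > 0 and @j - 1 <= 260
            insert @res values (right(@tmp, @j - 1))
        set @s = substring(@s, @i + 4, 8000)  -- continue after ".pdf"
        set @i = charindex('.pdf', @s)
    end
    return
end
GO

-- Walk the rows with a cursor, because a column cannot be passed
-- to a table-valued function without CROSS APPLY.
declare @html varchar(8000)
declare @all table (pdf varchar(260))

declare html_cur cursor local fast_forward for
    select cast(html as varchar(8000))        -- assumption: truncation is acceptable
    from dbo.cms_content                      -- hypothetical table and column

open html_cur
fetch next from html_cur into @html
while @@fetch_status = 0
begin
    insert @all (pdf)
    select pdf from dbo.extract_pdf_names_2000(@html)
    fetch next from html_cur into @html
end
close html_cur
deallocate html_cur

select distinct pdf from @all order by pdf
```

If the real column holds more than 8000 characters per row, this sketch will miss links past that point, so it would need to be adapted to read the column in chunks.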
Answer 1 (score: 1)
Well, it's not pretty, but this works using standard Transact-SQL:
SELECT CASE WHEN CHARINDEX('.pdf', html) > 0
            THEN SUBSTRING(
                     html,
                     CHARINDEX('.pdf', html) -
                         PATINDEX(
                             '%["/]%',
                             REVERSE(SUBSTRING(html, 0, CHARINDEX('.pdf', html)))) + 1,
                     PATINDEX(
                         '%["/]%',
                         REVERSE(SUBSTRING(html, 0, CHARINDEX('.pdf', html)))) + 3)
            ELSE NULL
       END AS filename
FROM mytable
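If you like, you can extend the list of delimiter characters that may precede the file name beyond ["/] (which matches a double quote or a slash). Note that this expression returns only the first .pdf match in each row.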
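As a small illustration of widening the delimiter set, the sketch below adds a single quote and an equals sign to the bracket list, so a link written with single quotes and no path is still cut at the right place. The sample value in @html is invented for the example, and the extra delimiters are just one possible choice.

```sql
-- Hypothetical sample value: the file name is delimited only by single quotes
declare @html varchar(8000)
set @html = '<a href=''The-second-file.pdf''>list</a>'

declare @dot int, @p int
set @dot = charindex('.pdf', @html)

if @dot > 0
begin
    -- the bracket set now holds double quote, single quote, slash and equals sign
    set @p = patindex('%["''/=]%', reverse(left(@html, @dot - 1)))
    if @p > 0
        select substring(@html, @dot - @p + 1, @p + 3) as filename
end
```

With the sample value this should return The-second-file.pdf. With the original ["/] set, PATINDEX would find no delimiter before the name in this string, which is why the extra characters matter.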
Answer 2 (score: 1)
How about treating that HTML as XML?
declare @t table (html varchar(max));
insert @t
select '
<p>A deferred tuition payment plan,
or view the <a href="/uploadedFiles/Tuition-Reimbursement-Deferred.pdf"
target="_blank">list</a>.</p>'
union all
select '
<p>A deferred tuition payment plan,
or view the <a href="Two files here-Reimbursement-Deferred.pdf"
target="_blank">list</a>.</p>And I use single quotes
<a href=''/look/path/The second file.pdf''
target="_blank">list</a>'
select [filename] = reverse(left(reverse('/'+p.n.value('@href', 'varchar(100)')),
                                 charindex('/',reverse('/'+p.n.value('@href', 'varchar(100)')), 1) - 1))
from ( select cast(html as xml)
       from @t
     ) x(doc)
cross apply doc.nodes('//a') p(n);
Result:
filename
---------------------------------------------------------------
Tuition-Reimbursement-Deferred.pdf
Two files here-Reimbursement-Deferred.pdf
The second file.pdf
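One thing to watch with the XML approach is that //a matches every link, not just the PDF ones. A possible way to narrow it down, sketched here with an invented sample document (the /about/index.html link is hypothetical), is to pull the href into a derived table and filter it with LIKE. This relies on the xml data type, so it is an assumption that the server is SQL Server 2005 or later; it will not run on SQL Server 2000, and the cast to xml only succeeds when the HTML is well-formed.

```sql
-- Hypothetical sample: one PDF link and one non-PDF link
declare @doc xml
set @doc = '<p><a href="/uploadedFiles/Tuition-Reimbursement-Deferred.pdf">list</a>
<a href="/about/index.html">about</a></p>'

select right(h.href, charindex('/', reverse('/' + h.href)) - 1) as filename
from ( select p.n.value('@href', 'varchar(400)') as href
       from @doc.nodes('//a') p(n)
     ) h
where h.href like '%.pdf'   -- keep only links that end in .pdf
```

For the sample document this should return a single row, Tuition-Reimbursement-Deferred.pdf. Prefixing '/' before reversing means an href with no path at all is still handled.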
Answer 3 (score: 1)
Try this:
DECLARE @XML XML =
'<p>A deferred tuition payment plan,
or view the <a href="/uploadedFiles/Tuition-Reimbursement-Deferred.pdf"
target="_blank">list</a>.</p>'
SELECT
ref_text = t.p.value('./a[1]', 'NVARCHAR(50)')
, ref_filename = REVERSE(
LEFT(REVERSE(t.p.value('./a[1]/@href', 'NVARCHAR(50)')),
CHARINDEX('/',REVERSE(t.p.value('./a[1]/@href', 'NVARCHAR(50)')), 1) - 1))
FROM @XML.nodes('/p') t(p)