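I need help with the following. How do I capture the value of one specific tag? For example,
I want to capture the value (800.422.2762 (U.S. and Canada)) from the tag highlighted below.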
<text top="89" left="611" width="177" height="11" font="1">800.422.2762 (U.S. and Canada)</text>
In short, I want to hard-code this tag so that its underlying value is captured every time the program runs.
Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<pdf2xml producer="poppler" version="0.51.0">
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<fontspec id="0" size="27" family="Helvetica" color="#000000"/>
<fontspec id="1" size="9" family="Helvetica" color="#000000"/>
<fontspec id="2" size="9" family="Helvetica" color="#000000"/>
<fontspec id="3" size="9" family="Times" color="#000000"/>
<fontspec id="4" size="12" family="Helvetica" color="#000000"/>
<fontspec id="5" size="12" family="Helvetica" color="#000000"/>
<fontspec id="6" size="9" family="Helvetica" color="#000000"/>
<image top="27" left="54" width="203" height="108" src="ext-resources\bin\asdf-1_1.jpg"/>
<text top="103" left="346" width="123" height="28" font="0"><b>INVOICE</b></text>
<text top="75" left="611" width="211" height="11" font="1">+1 913.217.6000, Fax +1 913.341.3742</text>
<text top="89" left="611" width="177" height="11" font="1">800.422.2762 (U.S. and Canada)</text>
<text top="102" left="611" width="230" height="11" font="1">headquarters@armaintl.org, www.arma.org</text>
<text top="32" left="611" width="104" height="11" font="1">ARMA International</text>
</page>
</pdf2xml>
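So far I have tried the following. It extracts the data successfully, but it returns every text element, and I want to extract one specific value based on a hard-coded tag. Please help with the approach.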
WITH data
AS (SELECT xmltype (
'<?xml version="1.0" encoding="UTF-8"?>
<pdf2xml producer="poppler" version="0.51.0">
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<fontspec id="0" size="27" family="Helvetica" color="#000000"/>
<fontspec id="1" size="9" family="Helvetica" color="#000000"/>
<fontspec id="2" size="9" family="Helvetica" color="#000000"/>
<fontspec id="3" size="9" family="Times" color="#000000"/>
<fontspec id="4" size="12" family="Helvetica" color="#000000"/>
<fontspec id="5" size="12" family="Helvetica" color="#000000"/>
<fontspec id="6" size="9" family="Helvetica" color="#000000"/>
<image top="27" left="54" width="203" height="108" src="ext-resources\bin\asdf-1_1.jpg"/>
<text top="103" left="346" width="123" height="28" font="0"><b>INVOICE</b></text>
<text top="75" left="611" width="211" height="11" font="1">+1 913.217.6000, Fax +1 913.341.3742</text>
<text top="89" left="611" width="177" height="11" font="1">800.422.2762 (U.S. and Canada)</text>
<text top="102" left="611" width="230" height="11" font="1">headquarters@armaintl.org, www.arma.org</text>
<text top="32" left="611" width="104" height="11" font="1">ARMA International</text>
</page>
</pdf2xml>')
xmldoc
FROM DUAL)
SELECT x.*
  FROM data,
       XMLTABLE ('/pdf2xml/page/text'
                 PASSING xmldoc
                 COLUMNS text VARCHAR2 (50) PATH '/text') x
/
Output:
TEXT
--------------------------------------------------
INVOICE
+1 913.217.6000, Fax +1 913.341.3742
800.422.2762 (U.S. and Canada)
headquarters@armaintl.org, www.arma.org
ARMA International
Answer 0 (score: 3)
Just change the XQuery from
'/pdf2xml/page/text'
to
'/pdf2xml/page/text[@top=89]'
The predicate [@top=89] keeps only the text element whose top attribute is 89, so the result will be
800.422.2762 (U.S. and Canada)
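For reference, here is a minimal sketch of the asker's query with that predicate applied. It assumes the XML has been trimmed to three of the text elements for brevity, and it uses PATH '.' for the column, which is a choice made here rather than something taken from the question. It should return only the row for the top="89" element.

```sql
-- Sketch: row XPath narrowed by a predicate on the @top attribute.
-- The XML below is an abbreviated copy of the sample document (assumption).
WITH data AS (
  SELECT XMLTYPE(
'<pdf2xml producer="poppler" version="0.51.0">
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="75" left="611" width="211" height="11" font="1">+1 913.217.6000, Fax +1 913.341.3742</text>
<text top="89" left="611" width="177" height="11" font="1">800.422.2762 (U.S. and Canada)</text>
<text top="102" left="611" width="230" height="11" font="1">headquarters@armaintl.org, www.arma.org</text>
</page>
</pdf2xml>') AS xmldoc
  FROM dual
)
SELECT x.text
  FROM data,
       XMLTABLE ('/pdf2xml/page/text[@top=89]'   -- only <text top="89">
                 PASSING xmldoc
                 COLUMNS text VARCHAR2 (50) PATH '.') x;  -- '.' = the row node itself
```

Note that matching on top alone only works while no other text element on the page shares that value.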
Answer 1 (score: 0)
Or expose the attributes as columns and filter on them in SQL by changing the query to:
SELECT x.*
  FROM data,
       XMLTABLE ('/pdf2xml/page/text'
                 PASSING xmldoc
                 COLUMNS
                   text   VARCHAR2 (50) PATH '/text',
                   top    NUMBER        PATH '@top',
                   left   NUMBER        PATH '@left',
                   width  NUMBER        PATH '@width',
                   height NUMBER        PATH '@height',
                   font   NUMBER        PATH '@font'
                ) x
 WHERE x.top = 89
   AND x.left = 611
   AND x.width = 177
   AND x.height = 11
   AND x.font = 1;
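The same conditions can also be pushed into the row XPath instead of the SQL WHERE clause. The following is a minimal sketch under that assumption, using an abbreviated copy of the sample document and PATH '.' for the column (both choices made here for illustration):

```sql
-- Sketch: all five attribute conditions expressed as one XPath predicate.
-- The XML is a shortened version of the sample document (assumption).
WITH data AS (
  SELECT XMLTYPE(
'<pdf2xml producer="poppler" version="0.51.0">
<page number="1">
<text top="75" left="611" width="211" height="11" font="1">+1 913.217.6000, Fax +1 913.341.3742</text>
<text top="89" left="611" width="177" height="11" font="1">800.422.2762 (U.S. and Canada)</text>
</page>
</pdf2xml>') AS xmldoc
  FROM dual
)
SELECT x.text
  FROM data,
       XMLTABLE ('/pdf2xml/page/text[@top=89 and @left=611 and @width=177 and @height=11 and @font=1]'
                 PASSING xmldoc
                 COLUMNS text VARCHAR2 (50) PATH '.') x;
```

Filtering in SQL, as in the answer, keeps the attributes available as columns for inspection; filtering in the XPath keeps the select list down to the one value that is needed.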
Answer 2 (score: 0)
If you only have one source document and only need one node value, you can use XMLQuery instead of XMLTable, with a slight variation on @wolφi's XPath:
select XMLQuery('/pdf2xml/page/text[@top=89]/text()'
                passing xmldoc
                returning content) as text
  from data;
which gives you an XML fragment, or
select XMLQuery('/pdf2xml/page/text[@top=89]/text()'
                passing xmldoc
                returning content).getStringVal() as text
  from data;
which gives you a string:
TEXT
------------------------------
800.422.2762 (U.S. and Canada)
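As a variation on the second query, the fragment can be converted with XMLCast rather than getStringVal(). This is a sketch, not part of the original answer; the one-element XML and the VARCHAR2(50) target type are assumptions chosen to mirror the question.

```sql
-- Sketch: XMLCast converts the XMLQuery result to a SQL scalar type.
-- The XML is reduced to the single relevant element (assumption).
WITH data AS (
  SELECT XMLTYPE(
'<pdf2xml producer="poppler" version="0.51.0">
<page number="1">
<text top="89" left="611" width="177" height="11" font="1">800.422.2762 (U.S. and Canada)</text>
</page>
</pdf2xml>') AS xmldoc
  FROM dual
)
SELECT XMLCAST (
         XMLQUERY ('/pdf2xml/page/text[@top=89]/text()'
                   PASSING xmldoc
                   RETURNING CONTENT)
         AS VARCHAR2 (50)) AS text
  FROM data;
```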
Of course, if you do have multiple documents or multiple nodes, XMLTable is the way to go.
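To illustrate the multi-document case, here is a minimal sketch in which each row of a hypothetical data set carries its own document. The doc_id and xml_text column names and the second document's placeholder value are invented for the example; XMLTable is then evaluated once per row.

```sql
-- Sketch: XMLTable applied to several documents, one per row.
-- doc_id, xml_text and the second document are hypothetical.
WITH data AS (
  SELECT 1 AS doc_id,
         '<pdf2xml><page><text top="89" left="611">800.422.2762 (U.S. and Canada)</text></page></pdf2xml>' AS xml_text
    FROM dual
  UNION ALL
  SELECT 2,
         '<pdf2xml><page><text top="89" left="611">000.000.0000 (placeholder)</text></page></pdf2xml>'
    FROM dual
)
SELECT d.doc_id, x.text
  FROM data d,
       XMLTABLE ('/pdf2xml/page/text[@top=89]'
                 PASSING XMLTYPE (d.xml_text)   -- parse each row's string as XML
                 COLUMNS text VARCHAR2 (50) PATH '.') x
 ORDER BY d.doc_id;
```

The documents are held as strings and converted with XMLTYPE in the PASSING clause only to keep the example self-contained; with a real XMLType column the conversion would not be needed.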