I have an Oracle database table named CRS.CRS_FILES with a column named FILE_DATA, a CLOB column that holds a large XML string.
FILE_DATA FILE_CREATION_DATE
<?xml version="1.0" encoding="utf-8"?><REPORT 1/1/2020
<?xml version="1.0" encoding="utf-8"?><REPORT 1/5/2020
<?xml version="1.0" encoding="utf-8"?><REPORT 1/6/2019
<?xml version="1.0" encoding="utf-8"?><REPORT 1/1/2020
<?xml version="1.0" encoding="utf-8"?><REPORT 1/5/2020
Here are the first few lines of one of those values:
<?xml version="1.0" encoding="utf-8" ?>
<REPORT xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201">
<CRSREPORTTIMESTAMP>2020-10-08T06:49:31.813812</CRSREPORTTIMESTAMP>
<AGENCYIDENTIFIER>MILWAUKEE</AGENCYIDENTIFIER>
<AGENCYNAME>Milwaukee Police Department</AGENCYNAME>
This is the XPath I want to query:
//REPORT/AGENCYIDENTIFIER
query_string2 <- "SELECT
XMLTYPE(t.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/text()').getClobVal()
FROM CRS.CRS_FILES t"
idtable <- sqlQuery(ch,query_string2, max=10)
query_string2 <- "SELECT
XMLTYPE(t.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/text()').getStringVal()
FROM CRS.CRS_FILES t"
idtable <- sqlQuery(ch,query_string2, max=10)
I'm not sure what I'm doing wrong. I know sqlQuery has some minor formatting quirks when it is passed a SQL query, but whatever I try, my results look like this:
XMLTYPE(T.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/TEXT()').GETCLOBVAL()
1 NA
2 NA
3 NA
4 NA
5 NA
6 NA
7 NA
8 NA
9 NA
10 NA
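What am I doing wrong? I just want to extract the value Milwaukee Police Department, as shown below (of course, I would rename the column to something like AGENCYNAME):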
XMLTYPE(T.FILE_DATA).EXTRACT('//REPORT/AGENCYNAME/TEXT()').GETCLOBVAL()
1 Milwaukee Police Department
2 Milwaukee Police Department
3 Milwaukee Police Department
4 Milwaukee Police Department
5 Milwaukee Police Department
6 Milwaukee Police Department
7 Milwaukee Police Department
8 Milwaukee Police Department
9 Milwaukee Police Department
10 Milwaukee Police Department
Answer 0 (score: 2)
The EXTRACT(xml) function is deprecated. Use XMLTABLE instead:
SELECT x.agencyname
FROM CRS.CRS_FILES c
CROSS JOIN XMLTABLE(
XMLNAMESPACES(
'http://www.w3.org/2001/XMLSchema-instance' AS "i",
DEFAULT 'http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201'
),
'/REPORT'
PASSING XMLTYPE( c.file_data )
COLUMNS
crsreporttimestamp TIMESTAMP PATH 'CRSREPORTTIMESTAMP',
agencyidentifier VARCHAR2(50) PATH 'AGENCYIDENTIFIER',
agencyname VARCHAR2(100) PATH 'AGENCYNAME'
) x
Or, in R, the same query with the double quotes escaped:
query_string2 <- "SELECT x.agencyname
FROM CRS.CRS_FILES c
CROSS JOIN XMLTABLE(
XMLNAMESPACES(
'http://www.w3.org/2001/XMLSchema-instance' AS \"i\",
DEFAULT 'http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201'
),
'/REPORT'
PASSING XMLTYPE( c.file_data )
COLUMNS
crsreporttimestamp TIMESTAMP PATH 'CRSREPORTTIMESTAMP',
agencyidentifier VARCHAR2(50) PATH 'AGENCYIDENTIFIER',
agencyname VARCHAR2(100) PATH 'AGENCYNAME'
) x"
idtable <- sqlQuery(ch,query_string2, max=10)
Which, for your test data:
CREATE TABLE CRS.CRS_FILES ( FILE_DATA CLOB );
INSERT INTO CRS.crs_files VALUES (
'<?xml version="1.0" encoding="utf-8" ?>
<REPORT xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201">
<CRSREPORTTIMESTAMP>2020-10-08T06:49:31.813812</CRSREPORTTIMESTAMP>-
<AGENCYIDENTIFIER>MILWAUKEE</AGENCYIDENTIFIER>-
<AGENCYNAME>Milwaukee Police Department</AGENCYNAME>
</REPORT>'
)
Outputs:
| AGENCYNAME                  |
| :-------------------------- |
| Milwaukee Police Department |
If you really do want to use EXTRACT, then you need to specify the XML namespace:
SELECT XMLTYPE(t.FILE_DATA).EXTRACT(
'//REPORT/AGENCYNAME/text()',
'xmlns="http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201"'
).getStringVal() AS agencyname
FROM CRS.CRS_FILES t
Outputs:
| AGENCYNAME                  |
| :-------------------------- |
| Milwaukee Police Department |
db<>fiddle here
Answer 1 (score: 1)
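The problem is the current Oracle query, not the RODBC::sqlQuery method. In short, your XPath does not account for the default namespace declared on the root node. However, the XMLType extract() function lets you define a temporary prefix to use in the XPath: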
extract(XMLType_instance IN XMLType,
XPath_string IN VARCHAR2,
namespace_string IN VARCHAR2 := NULL) RETURN XMLType;
So, once the prefix doc is defined, it can be applied in the XPath:
query_string2 <- "SELECT XMLTYPE(t.FILE_DATA).EXTRACT('//doc:REPORT/doc:AGENCYNAME/text()',
'xmlns:doc=\"http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201\"').getStringVal()
FROM CRS.CRS_FILES t"
idtable <- sqlQuery(ch,query_string2, max=10)
Online Demo (works for both getClobVal and getStringVal)
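The namespace effect described above is not specific to Oracle. As a rough illustration only, here is a minimal sketch in Python (not the R/Oracle stack used in this thread) that parses a trimmed copy of the sample XML with the standard library. The variable names and the doc prefix are arbitrary choices for this example. It shows an unqualified name matching nothing, while the same name bound to the namespace URI finds the value.

```python
import xml.etree.ElementTree as ET

# Trimmed sample modelled on the FILE_DATA value shown in the question.
NS_URI = "http://schemas.datacontract.org/2004/07/CrashReport.DataLayer.v20170201"
xml_text = (
    '<REPORT xmlns:i="http://www.w3.org/2001/XMLSchema-instance" '
    f'xmlns="{NS_URI}">'
    "<AGENCYIDENTIFIER>MILWAUKEE</AGENCYIDENTIFIER>"
    "<AGENCYNAME>Milwaukee Police Department</AGENCYNAME>"
    "</REPORT>"
)
root = ET.fromstring(xml_text)

# Unqualified name: nothing matches, because AGENCYNAME belongs to the
# default namespace declared on REPORT.
unqualified = root.find("AGENCYNAME")

# Bind a temporary prefix ("doc" is arbitrary) to the namespace URI and
# use it in the path, mirroring the third argument of EXTRACT.
qualified = root.find("doc:AGENCYNAME", {"doc": NS_URI})

print(unqualified)     # None
print(qualified.text)  # Milwaukee Police Department
```

The empty match in the first lookup is the counterpart of the NA values returned through sqlQuery in the question. The second lookup corresponds to passing the xmlns declaration to EXTRACT or XMLNAMESPACES.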