我正在尝试将xml提取到由行分隔的表输出中。
数据是Oracle数据库中的CLOB字段,如下所示:
<emailInfo>
<recipientList>
<recipientName>ATS</recipientName>
<recipientEmailList>
<emailAddress>wp@act.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</recipientEmailList>
<contactEmailList>
<emailAddress>wp@act.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
<contactEmailList>
<emailAddress>wp2@act.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</contactEmailList>
<escalationEmailList>
<emailAddress>pw@wp.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</escalationEmailList>
</recipientList>
<recipientList>
<recipientName>ERG</recipientName>
<recipientEmailList>
<emailAddress>erg@wp.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</recipientEmailList>
<contactEmailList>
<emailAddress>erg@wp.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</contactEmailList>
<escalationEmailList>
<emailAddress>sl@wp.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</escalationEmailList>
<escalationEmailList>
<emailAddress>sl2@wp.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</escalationEmailList>
</recipientList>
</emailInfo>
EDIT2:我更新的SQL查询如下:
SELECT t.*, m.*, p.*, l.*
FROM cisadm.F1_ext_lookup_val exval,
XMLTABLE ('/emailInfo/recipientList'
PASSING XMLTYPE (exval.bo_data_area)
COLUMNS recipient_name VARCHAR2 (4000) PATH 'recipientName',
recipient_email_list XMLTYPE PATH '/recipientEmailList',
contact_email_list XMLTYPE PATH '/contactEmailList',
escalation_email_list XMLTYPE PATH '/escalationEmailList') t,
XMLTABLE ('/recipientEmailList'
PASSING (t.recipient_email_list)
COLUMNS recipient_email_address VARCHAR2 (4000) PATH '/emailAddress',
rec_email_status_flg VARCHAR2 (10) PATH '/statusFlag') m,
XMLTABLE ('/contactEmailList'
PASSING (t.contact_email_list)
COLUMNS contact_email_address VARCHAR2 (4000) PATH 'contactEmailList/emailAddress',
contact_email_status_flg VARCHAR2 (10) PATH 'contactEmailList/statusFlag'
) p,
XMLTABLE('/escalationEmailList'
PASSING (t.escalation_email_list)
COLUMNS esc_email_address VARCHAR2(4000) PATH 'escalationEmailList/emailAddress',
esc_email_status_flg VARCHAR2(10) PATH 'escalationEmailList/statusFlag'
) l
我正在努力规定每个收件人电子邮件列表,联系人电子邮件列表和升级电子邮件列表可能有多个值。
示例输出应为:
任何帮助都会非常感激!
答案 0 :(得分:1)
对于未来的读者,这里是开源编程中的通用解决方案,用于将XML数据从CLOB字段迁移到csv表格格式。
使用OP的数据需求,这些方法不依赖于任何RDMS,因此可以用于其他数据库连接。此外,克服了SQL的局限性,因为可以使用xpath,数组,循环等各种细微差别:
Python (使用cx_Oracle):
#!/usr/bin/python
import os
import cx_Oracle
import csv
import lxml.etree as ET
# SET DIRECTORY PATH
cd = os.path.dirname(os.path.abspath(__file__))
# DB CONNECTION AND QUERY
db = cx_Oracle.connect("uid/pwd@database")
cur = db.cursor()
clob = cur.execute("SELECT CLOBfield FROM OracleTable").fetchone()
# CLOSE CURSOR AND DATABASE
cur.close()
db.close()
# PARSE XML CONTENT
dom = ET.fromstring(clob)
# DEFINING COLUMNS
columns = ['RECIPENT_NAME', 'RECIPIENT_EMAIL_ADDRESS', 'REC_EMAIL_STATUS_FLG',
'CONTACT_EMAIL_ADDRESS', 'CONTACT_EMAIL_STATUS_FLG',
'ESC_EMAIL_ADDRESS', 'ESC_EMAIL_STATUS_FLG']
emailnodes = ['recipientEmailList', 'contactEmailList', 'escalationEmailList']
# OPEN CSV FILE
with open(os.path.join(cd,'CLOB_Py.csv'), 'w', newline='') as m:
writer = csv.writer(m)
writer.writerow(columns)
nodexpath = dom.xpath('//recipientList')
dataline = []
for j in range(1,len(nodexpath)+1):
dataline = []
dataline.append(dom.xpath('//recipientList[{0}]/recipientName'.format(j))[0].text)
for n in emailnodes:
# EMAILS
childxpath = dom.xpath('//recipientList[{0}]/{1}[1]/*[1]'.format(j, n))
# APPEND DATA LINES
for elem in childxpath:
dataline.append(elem.text)
if childxpath == []:
dataline.append('')
# FLAGS
childxpath = dom.xpath('//recipientList[{0}]/{1}[1]/*[2]'.format(j, n))
# APPEND DATA LINES
for elem in childxpath:
dataline.append(elem.text)
if childxpath == []:
dataline.append('')
writer.writerow(dataline)
PHP (使用PDO Oracle OCI)
// Set Directory Path
$cd = dirname(__FILE__);
// Opening db connection
$db_username = "your_username";
$db_password = "your_password";
$db = "oci:dbname=your_sid";
try {
$dbh = new PDO($db,$db_username,$db_password);
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$sql = "SELECT CLOBfield FROM OracleTable";
$STH = $dbh->query($sql);
$clob = $STH->fetch();
}
catch(PDOException $e) {
echo $e->getMessage();
exit;
}
# Closing db connection
$dbh = null;
// Loading XML source
$xpath = simplexml_load_string($clob);
// Writing column headers
$columns = array('RECIPENT_NAME', 'RECIPIENT_EMAIL_ADDRESS', 'REC_EMAIL_STATUS_FLG',
'CONTACT_EMAIL_ADDRESS', 'CONTACT_EMAIL_STATUS_FLG',
'ESC_EMAIL_ADDRESS', 'ESC_EMAIL_STATUS_FLG');
$emailnodes = array('recipientEmailList', 'contactEmailList', 'escalationEmailList');
$fs = fopen($cd.'/CLOB_PHP.csv', 'w');
fputcsv($fs, $columns);
fclose($fs);
// Writing data lines
$i = 1;
$values = [];
$node = $xpath->xpath('//recipientList');
foreach ($node as $n){
$child = $xpath->xpath('//recipientList['. $i .']/recipientName');
foreach($child as $value) {
$values[] = $value;
}
foreach ($emailnodes as $e){
// EMAILS
$child = $xpath->xpath('//recipientList['. $i .']/'. $e.'[1]/*[1]');
if (count($child) > 0) {
foreach($child as $value) {
$values[] = $value;
}
}
else {
$values[] = '';
}
// FLAGS
$child = $xpath->xpath('//recipientList['. $i .']/'. $e.'[1]/*[2]');
if (count($child) > 0) {
foreach($child as $value) {
$values[] = $value;
}
}
else {
$values[] = '';
}
}
$fs = fopen($cd.'/CLOB_PHP.csv', 'a');
fputcsv($fs, $values);
fclose($fs);
$values = [];
$i++;
}
R (使用ROracle):
library(XML)
library(ROracle)
setwd("C:\\Path\\To\\R\\Script")
# OPEN DATABASE AND QUERY
conn <-dbConnect(drv, username = "", password = "", dbname = "")
clobdf <- dbGetQuery(conn, "SELECT CLOBfield FROM OracleTable;")
dbDisconnect(conn)
# READ IN EXTERNAL DATA FILE
doc<-xmlParse(clobdf[[1,1]])
emailnodes <- c('recipientEmailList', 'contactEmailList', 'escalationEmailList')
# EXTRACT NODE VALUES INTO LISTS
recipientNamesList <- xpathSApply(doc, paste0("//recipientList/recipientName"), xmlValue)
for (e in emailnodes){
assign(e, xpathSApply(doc, paste0("//recipientList/", e, "[1]/*[1]"), xmlValue))
}
for (e in emailnodes){
assign(paste0(e, "flg"), xpathSApply(doc, paste0("//recipientList/", e, "[1]/*[2]"), xmlValue))
}
# COMBINE LISTS TO DATA FRAME
xmldf<- data.frame(RECIPENT_NAME = matrix(unlist(recipientNamesList), nrow=2, byrow=T),
RECIPIENT_EMAIL_ADDRESS = matrix(unlist(recipientEmailList), nrow=2, byrow=T),
REC_EMAIL_STATUS_FLG = matrix(unlist(recipientEmailListflg), nrow=2, byrow=T),
CONTACT_EMAIL_ADDRESS = matrix(unlist(contactEmailList), nrow=2, byrow=T),
CONTACT_EMAIL_STATUS_FLG = matrix(unlist(contactEmailListflg), nrow=2, byrow=T),
ESC_EMAIL_ADDRESS = matrix(unlist(escalationEmailList), nrow=2, byrow=T),
ESC_EMAIL_STATUS_FLG = matrix(unlist(escalationEmailListflg), nrow=2, byrow=T))
# OUTPUT TO CSV
write.csv(xmldf, "CLOB_R.csv", na = "", row.names=FALSE)
答案 1 :(得分:0)
此查询返回截图中的数据 -
select
extractvalue(s.column_value, '/*/recipientName') as recipient_name,
extractvalue(s.column_value, '/*/recipientEmailList/emailAddress') as recipient_email_address,
extractvalue(s.column_value, '/*/recipientEmailList/statusFlag') as rec_email_status_flg,
extractvalue(s.column_value, '/*/contactEmailList/emailAddress') as contact_email_address,
extractvalue(s.column_value, '/*/contactEmailList/statusFlag') as contact_email_status_flg,
extractvalue(s.column_value, '/*/escalationEmailList/emailAddress') as esc_email_address,
extractvalue(s.column_value, '/*/escalationEmailList/statusFlag') as esc_email_status_flg
from tmp, table(xmlsequence(EXTRACT(XMLTYPE(tmp.bo_data_area), '/emailInfo/recipientList'))) s
并且此查询会在单独的行中提取每封电子邮件 -
select recipient_name, email_address, status_flag
from
(
select
recipient_name,
extractvalue(x.column_value, '/*/emailAddress') as email_address,
extractvalue(x.column_value, '/*/statusFlag') as status_flag
from
(
select
extractvalue(s.column_value, '/*/recipientName') as recipient_name,
EXTRACT(s.column_value, '/*') recipients
from tmp, table(xmlsequence(EXTRACT(XMLTYPE(tmp.bo_data_area), '/emailInfo/recipientList'))) s
) v, table(xmlsequence(EXTRACT(v.recipients, '/*/*'))) x
)
where (email_address is not null or status_flag is not null)
答案 2 :(得分:0)
您可以尝试xmltable
SELECT *
FROM XMLTable('/emailInfo/recipientList' PASSING XMLTYPE('<emailInfo>
<recipientList>
<recipientName>ATS</recipientName>
<recipientEmailList>
<emailAddress>wp@act.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</recipientEmailList>
<contactEmailList>
<emailAddress>wp@act.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</contactEmailList>
<escalationEmailList>
<emailAddress>pw@wp.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</escalationEmailList>
</recipientList>
<recipientList>
<recipientName>ERG</recipientName>
<recipientEmailList>
<emailAddress>erg@wp.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</recipientEmailList>
<contactEmailList>
<emailAddress>erg@wp.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</contactEmailList>
<escalationEmailList>
<emailAddress>sl@wp.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</escalationEmailList>
</recipientList>
</emailInfo>')
COLUMNS recipient_name VARCHAR2(4000) PATH 'recipientName',
recipient_email_address VARCHAR2(4000) PATH 'recipientEmailList/emailAddress',
rec_email_status_flg VARCHAR2(10) PATH 'recipientEmailList/statusFlag',
contact_email_address VARCHAR2(4000) PATH 'contactEmailList/emailAddress',
contact_email_status_flg VARCHAR2(10) PATH 'contactEmailList/statusFlag',
esc_email_address VARCHAR2(4000) PATH 'escalationEmailList/emailAddress',
esc_email_status_flg VARCHAR2(10) PATH 'escalationEmailList/statusFlag'
) t
与表格相同
SELECT *
FROM tmp,XMLTable('/emailInfo/recipientList' PASSING XMLTYPE(tmp.bo_data_area)
COLUMNS recipient_name VARCHAR2(4000) PATH 'recipientName',
recipient_email_address VARCHAR2(4000) PATH 'recipientEmailList/emailAddress',
rec_email_status_flg VARCHAR2(10) PATH 'recipientEmailList/statusFlag',
contact_email_address VARCHAR2(4000) PATH 'contactEmailList/emailAddress',
contact_email_status_flg VARCHAR2(10) PATH 'contactEmailList/statusFlag',
esc_email_address VARCHAR2(4000) PATH 'escalationEmailList/emailAddress',
esc_email_status_flg VARCHAR2(10) PATH 'escalationEmailList/statusFlag'
) t