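Extracting XML from an Oracle CLOB field into rows

Asked: 2015-09-24 04:24:44

Tags: sql xml oracle extract

I am trying to extract XML into tabular output, one record per row.

The data is stored in a CLOB field in an Oracle database and looks like this: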

<emailInfo>
 <recipientList>
  <recipientName>ATS</recipientName>
  <recipientEmailList>
   <emailAddress>wp@act.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </recipientEmailList>
  <contactEmailList>
   <emailAddress>wp@act.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </contactEmailList>
  <contactEmailList>
   <emailAddress>wp2@act.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </contactEmailList>
  <escalationEmailList>
   <emailAddress>pw@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </escalationEmailList>
 </recipientList>

 <recipientList>
  <recipientName>ERG</recipientName>
  <recipientEmailList>
   <emailAddress>erg@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </recipientEmailList>
  <contactEmailList>
   <emailAddress>erg@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </contactEmailList>
  <escalationEmailList>
   <emailAddress>sl@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </escalationEmailList>
  <escalationEmailList>
   <emailAddress>sl2@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </escalationEmailList>
 </recipientList>
</emailInfo>

EDIT 2: My updated SQL query is as follows:

             SELECT t.*, m.*, p.*, l.*
             FROM cisadm.F1_ext_lookup_val exval,

                  XMLTABLE ('/emailInfo/recipientList'
                     PASSING XMLTYPE (exval.bo_data_area)
                     COLUMNS recipient_name                VARCHAR2 (4000)  PATH 'recipientName',
                             recipient_email_list          XMLTYPE          PATH '/recipientEmailList',
                             contact_email_list            XMLTYPE          PATH '/contactEmailList',
                             escalation_email_list         XMLTYPE          PATH '/escalationEmailList') t,
                  XMLTABLE ('/recipientEmailList'
                     PASSING (t.recipient_email_list)
                     COLUMNS recipient_email_address       VARCHAR2 (4000)  PATH '/emailAddress',
                             rec_email_status_flg          VARCHAR2 (10)    PATH '/statusFlag') m,
                  XMLTABLE ('/contactEmailList'
                     PASSING (t.contact_email_list)
                     COLUMNS contact_email_address         VARCHAR2 (4000)  PATH 'contactEmailList/emailAddress',
                             contact_email_status_flg      VARCHAR2 (10)    PATH 'contactEmailList/statusFlag'
                             ) p,
                  XMLTABLE('/escalationEmailList'
                     PASSING (t.escalation_email_list)
                     COLUMNS     esc_email_address         VARCHAR2(4000)   PATH 'escalationEmailList/emailAddress',
                                 esc_email_status_flg      VARCHAR2(10)     PATH 'escalationEmailList/statusFlag'
                      ) l

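I am struggling to handle the fact that each recipient email list, contact email list, and escalation email list may contain multiple values.

The sample output should be: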

SampleOutput

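Any help would be greatly appreciated!

3 answers:

Answer 0 (score: 1)

For future readers, here are general-purpose solutions in open-source programming languages for migrating XML data from a CLOB field into tabular CSV format.

They follow the OP's data requirements but do not depend on any particular RDBMS, so they can be used with other database connections. They also get around the limitations of SQL, since XPath, arrays, loops, and similar constructs are all available:

Python (using cx_Oracle):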

#!/usr/bin/python
import os
import cx_Oracle
import csv
import lxml.etree as ET

# SET DIRECTORY PATH
cd = os.path.dirname(os.path.abspath(__file__))

# DB CONNECTION AND QUERY
db = cx_Oracle.connect("uid/pwd@database")    
cur = db.cursor()
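cur.execute("SELECT CLOBfield FROM OracleTable")
# fetchone() returns a tuple; read the LOB into a string before the connection is closed
clob = cur.fetchone()[0].read()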

# CLOSE CURSOR AND DATABASE
cur.close()
db.close()

# PARSE XML CONTENT
dom = ET.fromstring(clob)

# DEFINING COLUMNS
columns = ['RECIPIENT_NAME', 'RECIPIENT_EMAIL_ADDRESS', 'REC_EMAIL_STATUS_FLG',
           'CONTACT_EMAIL_ADDRESS', 'CONTACT_EMAIL_STATUS_FLG',
           'ESC_EMAIL_ADDRESS', 'ESC_EMAIL_STATUS_FLG']

emailnodes = ['recipientEmailList', 'contactEmailList', 'escalationEmailList']

# OPEN CSV FILE
with open(os.path.join(cd,'CLOB_Py.csv'), 'w', newline='') as m:
    writer = csv.writer(m)    
    writer.writerow(columns)

    nodexpath = dom.xpath('//recipientList')

    dataline = []    
    for j in range(1,len(nodexpath)+1):

        dataline = []        
        dataline.append(dom.xpath('//recipientList[{0}]/recipientName'.format(j))[0].text)

        for n in emailnodes:   
            # EMAILS
            childxpath = dom.xpath('//recipientList[{0}]/{1}[1]/*[1]'.format(j, n))            

            # APPEND DATA LINES   
            for elem in childxpath:
                dataline.append(elem.text)

            if childxpath == []:
                dataline.append('')

            # FLAGS
            childxpath = dom.xpath('//recipientList[{0}]/{1}[1]/*[2]'.format(j, n))

            # APPEND DATA LINES   
            for elem in childxpath:
                dataline.append(elem.text)

            if childxpath == []:
                dataline.append('')

        writer.writerow(dataline)
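
The script above keeps only the first entry of each list (the `[1]` predicates), so a second contactEmailList or escalationEmailList is dropped. As a minimal sketch of one way to keep every entry, the following emits one row per email entry and records which list it came from. It uses the standard-library xml.etree.ElementTree instead of lxml, and the `email_rows` helper, the `LIST_TAGS` constant, and the trimmed `SAMPLE` string are illustrative names, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample in the same shape as the question's data (assumption:
# the real CLOB has already been read into a Python string).
SAMPLE = """<emailInfo>
 <recipientList>
  <recipientName>ATS</recipientName>
  <recipientEmailList>
   <emailAddress>wp@act.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </recipientEmailList>
  <contactEmailList>
   <emailAddress>wp@act.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </contactEmailList>
  <contactEmailList>
   <emailAddress>wp2@act.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </contactEmailList>
 </recipientList>
</emailInfo>"""

LIST_TAGS = ("recipientEmailList", "contactEmailList", "escalationEmailList")


def email_rows(xml_text):
    """Yield (recipient_name, list_type, email_address, status_flag) tuples."""
    root = ET.fromstring(xml_text)
    for recipient in root.iter("recipientList"):
        name = recipient.findtext("recipientName", default="")
        for tag in LIST_TAGS:
            # findall returns every direct child with this tag, not just the first
            for entry in recipient.findall(tag):
                yield (
                    name,
                    tag,
                    entry.findtext("emailAddress", default=""),
                    entry.findtext("statusFlag", default=""),
                )


rows = list(email_rows(SAMPLE))
for row in rows:
    print(row)
```

Each tuple can be passed straight to `csv.writer.writerow`, so the CSV-writing part of the script above would not need to change much.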

PHP (using PDO Oracle OCI):

// Set Directory Path
$cd = dirname(__FILE__);

// Opening db connection
$db_username = "your_username";
$db_password = "your_password";
$db = "oci:dbname=your_sid";

try {
    $dbh = new PDO($db,$db_username,$db_password);          
    $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $sql = "SELECT CLOBfield FROM OracleTable";    
    $STH = $dbh->query($sql);    
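    $row = $STH->fetch(PDO::FETCH_NUM);
    // fetch() returns a row array; PDO_OCI may hand the CLOB back as a stream
    $clob = is_resource($row[0]) ? stream_get_contents($row[0]) : $row[0];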
}

catch(PDOException $e) {  
    echo $e->getMessage();
    exit;
}

# Closing db connection
$dbh = null;

// Loading XML source
$xpath = simplexml_load_string($clob);

// Writing column headers
$columns = array('RECIPIENT_NAME', 'RECIPIENT_EMAIL_ADDRESS', 'REC_EMAIL_STATUS_FLG',
                 'CONTACT_EMAIL_ADDRESS', 'CONTACT_EMAIL_STATUS_FLG',
                 'ESC_EMAIL_ADDRESS', 'ESC_EMAIL_STATUS_FLG');

$emailnodes = array('recipientEmailList', 'contactEmailList', 'escalationEmailList');

$fs = fopen($cd.'/CLOB_PHP.csv', 'w');
fputcsv($fs, $columns);      
fclose($fs);    

// Writing data lines
$i = 1;
$values = [];
$node = $xpath->xpath('//recipientList');    

foreach ($node as $n){

     $child = $xpath->xpath('//recipientList['. $i .']/recipientName');
     foreach($child as $value) {            
          $values[] = $value;         
     }

     foreach ($emailnodes as $e){

          // EMAILS       
          $child = $xpath->xpath('//recipientList['. $i .']/'. $e.'[1]/*[1]');

          if (count($child) > 0) {
              foreach($child as $value) {           
                 $values[] = $value;         
              }
          }   
          else {
                 $values[] = '';
          }

          // FLAGS
          $child = $xpath->xpath('//recipientList['. $i .']/'. $e.'[1]/*[2]');

          if (count($child) > 0) {
              foreach($child as $value) {           
                 $values[] = $value;         
              }
          }   
          else {
                 $values[] = '';
          }
     }  

     $fs = fopen($cd.'/CLOB_PHP.csv', 'a');
     fputcsv($fs, $values);      
     fclose($fs);  

     $values = [];
     $i++;

}

R (using ROracle):

library(XML)
library(ROracle)

setwd("C:\\Path\\To\\R\\Script")

# OPEN DATABASE AND QUERY
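drv <- dbDriver("Oracle")
conn <- dbConnect(drv, username = "", password = "", dbname = "")
# No trailing semicolon: Oracle rejects it in a statement sent through the driver
clobdf <- dbGetQuery(conn, "SELECT CLOBfield FROM OracleTable")
dbDisconnect(conn)

# PARSE XML CONTENT
doc <- xmlParse(clobdf[[1,1]], asText = TRUE)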

emailnodes <- c('recipientEmailList', 'contactEmailList', 'escalationEmailList')

# EXTRACT NODE VALUES INTO LISTS
recipientNamesList <- xpathSApply(doc, paste0("//recipientList/recipientName"), xmlValue)

for (e in emailnodes){
    assign(e, xpathSApply(doc, paste0("//recipientList/", e, "[1]/*[1]"), xmlValue))
}

for (e in emailnodes){
  assign(paste0(e, "flg"), xpathSApply(doc, paste0("//recipientList/", e, "[1]/*[2]"), xmlValue))
}

# COMBINE LISTS TO DATA FRAME
xmldf<- data.frame(RECIPIENT_NAME =  matrix(unlist(recipientNamesList), nrow=2, byrow=T),
                   RECIPIENT_EMAIL_ADDRESS = matrix(unlist(recipientEmailList), nrow=2, byrow=T),
                   REC_EMAIL_STATUS_FLG  = matrix(unlist(recipientEmailListflg), nrow=2, byrow=T),
                   CONTACT_EMAIL_ADDRESS = matrix(unlist(contactEmailList),   nrow=2, byrow=T),                
                   CONTACT_EMAIL_STATUS_FLG = matrix(unlist(contactEmailListflg),   nrow=2, byrow=T),                
                   ESC_EMAIL_ADDRESS = matrix(unlist(escalationEmailList), nrow=2, byrow=T),
                   ESC_EMAIL_STATUS_FLG = matrix(unlist(escalationEmailListflg), nrow=2, byrow=T))   

# OUTPUT TO CSV
write.csv(xmldf, "CLOB_R.csv", na = "", row.names=FALSE)

Answer 1 (score: 0)

This query returns the data shown in the screenshot:

select 
    extractvalue(s.column_value, '/*/recipientName') as recipient_name,
    extractvalue(s.column_value, '/*/recipientEmailList/emailAddress') as recipient_email_address,
    extractvalue(s.column_value, '/*/recipientEmailList/statusFlag') as rec_email_status_flg,
    extractvalue(s.column_value, '/*/contactEmailList/emailAddress') as contact_email_address,
    extractvalue(s.column_value, '/*/contactEmailList/statusFlag') as contact_email_status_flg,
    extractvalue(s.column_value, '/*/escalationEmailList/emailAddress') as esc_email_address,
    extractvalue(s.column_value, '/*/escalationEmailList/statusFlag') as esc_email_status_flg
from  tmp, table(xmlsequence(EXTRACT(XMLTYPE(tmp.bo_data_area), '/emailInfo/recipientList'))) s

And this query extracts each email into a separate row:

select recipient_name, email_address, status_flag
 from
(
    select 
           recipient_name,
           extractvalue(x.column_value, '/*/emailAddress') as email_address,
           extractvalue(x.column_value, '/*/statusFlag') as status_flag
    from
    (
        select 
            extractvalue(s.column_value, '/*/recipientName') as recipient_name,
            EXTRACT(s.column_value, '/*') recipients
        from  tmp, table(xmlsequence(EXTRACT(XMLTYPE(tmp.bo_data_area), '/emailInfo/recipientList'))) s
    ) v, table(xmlsequence(EXTRACT(v.recipients, '/*/*'))) x
)
where (email_address is not null or status_flag is not null)

Answer 2 (score: 0)

You can try XMLTABLE:

SELECT *
    FROM XMLTable('/emailInfo/recipientList' PASSING XMLTYPE('<emailInfo>
 <recipientList>
  <recipientName>ATS</recipientName>
  <recipientEmailList>
   <emailAddress>wp@act.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </recipientEmailList>
  <contactEmailList>
   <emailAddress>wp@act.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </contactEmailList>
  <escalationEmailList>
   <emailAddress>pw@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </escalationEmailList>
 </recipientList>

 <recipientList>
  <recipientName>ERG</recipientName>
  <recipientEmailList>
   <emailAddress>erg@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </recipientEmailList>
  <contactEmailList>
   <emailAddress>erg@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </contactEmailList>
  <escalationEmailList>
   <emailAddress>sl@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </escalationEmailList>
 </recipientList>
</emailInfo>')
                  COLUMNS recipient_name            VARCHAR2(4000)   PATH 'recipientName',
                          recipient_email_address   VARCHAR2(4000)   PATH 'recipientEmailList/emailAddress',
                          rec_email_status_flg      VARCHAR2(10)     PATH 'recipientEmailList/statusFlag',
                          contact_email_address     VARCHAR2(4000)   PATH 'contactEmailList/emailAddress',
                          contact_email_status_flg  VARCHAR2(10)     PATH 'contactEmailList/statusFlag',
                          esc_email_address         VARCHAR2(4000)   PATH 'escalationEmailList/emailAddress',
                          esc_email_status_flg      VARCHAR2(10)     PATH 'escalationEmailList/statusFlag'
) t

The same, against a table:

SELECT *
    FROM tmp,XMLTable('/emailInfo/recipientList' PASSING XMLTYPE(tmp.bo_data_area)
                  COLUMNS recipient_name            VARCHAR2(4000)   PATH 'recipientName',
                          recipient_email_address   VARCHAR2(4000)   PATH 'recipientEmailList/emailAddress',
                          rec_email_status_flg      VARCHAR2(10)     PATH 'recipientEmailList/statusFlag',
                          contact_email_address     VARCHAR2(4000)   PATH 'contactEmailList/emailAddress',
                          contact_email_status_flg  VARCHAR2(10)     PATH 'contactEmailList/statusFlag',
                          esc_email_address         VARCHAR2(4000)   PATH 'escalationEmailList/emailAddress',
                          esc_email_status_flg      VARCHAR2(10)     PATH 'escalationEmailList/statusFlag'
) t
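
The column paths above appear to assume that each list occurs at most once per recipient, whereas the question asks about lists with several values. As a rough sketch of one possible layout for that case, the Python below keeps the same seven columns, pairs the entries of the three lists by position, and pads the shorter lists with blanks. The pairing rule is an assumption (the SampleOutput image from the question is not available here), and `wide_rows`, `LIST_TAGS`, and `SAMPLE` are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import zip_longest

# Trimmed-down sample: one recipient with two escalation entries.
SAMPLE = """<emailInfo>
 <recipientList>
  <recipientName>ERG</recipientName>
  <recipientEmailList>
   <emailAddress>erg@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </recipientEmailList>
  <contactEmailList>
   <emailAddress>erg@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </contactEmailList>
  <escalationEmailList>
   <emailAddress>sl@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </escalationEmailList>
  <escalationEmailList>
   <emailAddress>sl2@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </escalationEmailList>
 </recipientList>
</emailInfo>"""

LIST_TAGS = ("recipientEmailList", "contactEmailList", "escalationEmailList")


def wide_rows(xml_text):
    """Return 7-column rows; repeated list entries are paired by position."""
    root = ET.fromstring(xml_text)
    rows = []
    for recipient in root.findall("recipientList"):
        name = recipient.findtext("recipientName", default="")
        # One list of (address, flag) pairs per list type
        per_list = [
            [
                (e.findtext("emailAddress", default=""),
                 e.findtext("statusFlag", default=""))
                for e in recipient.findall(tag)
            ]
            for tag in LIST_TAGS
        ]
        if not any(per_list):
            # Recipient without any email entries still gets one row
            rows.append([name] + [""] * 6)
            continue
        # Shorter lists are padded with blank (address, flag) pairs
        for combo in zip_longest(*per_list, fillvalue=("", "")):
            row = [name]
            for address, flag in combo:
                row.extend([address, flag])
            rows.append(row)
    return rows


result = wide_rows(SAMPLE)
for row in result:
    print(row)
```

For this sample the second row carries only the extra escalation address, with the recipient and contact columns left blank. Whether blanks or repeated values are preferable in those columns depends on how the output will be consumed.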