Extracting data from a table using REGEXP_REPLACE

Time: 2014-06-03 20:19:12

Tags: regex oracle

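I have already asked this question, but I'm not sure why nobody answered it. I only got a partial answer, perhaps because I marked it as the accepted answer... not sure, but I'll try my luck again.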

My data looks like this:

reported_name
--------------

HEMA using TM-0497
TEGDMA
Blue HEMA using TM-0510
Norbloc using TM-0545
SIMAA2 using TM-0547
Tensile Strength using
Appearance using TM-0011
Haze using TM-0561
Blue HEMA using CRM-0126
t-Amyl Alcohol
Transmittance TM-0509
DK (edge corrected) TM-0534
Decanoic Acid CRM-0200
Glycol using CRM-0094
% Ketotifen Released using TM-0578_V2_RELEASE
TMPTMA using CRM-0208
% Ketotifen Released using TM-0578_V2_RE
Ca2DTPA Assay using USP_541 (3 day drying)
Water using TM-0449 OOS Analyst 1, Equip 1, set 2
Leachable Polymer using CRM-0225 Sample B
DMA using TM-0500 2333-30e
Decanoic Acid using TM-0622 - Rev # 1
Ketotifen Fumarate Assay using TM-0624_ASSAY_RC - Rev # 2
Refractive Index using TM-0589 - Day 8
Refractive Index using TM-0589 - Rev # 0 - Day 5

I need the output to look like this:

    reported_name                                              analysis_method     revision_number
    --------------                                             --------------      ------------------

    HEMA using TM-0497                                         TM-0497             null
    TEGDMA                                                     null                null
    Blue HEMA using TM-0510                                    TM-0510             null
    Norbloc using TM-0545                                      TM-0545             null
    SIMAA2 using TM-0547                                       TM-0547             null
    Tensile Strength using                                     null                null
    Appearance using TM-0011                                   TM-0011             null
    Haze using TM-0561                                         TM-0561             null
    Blue HEMA using CRM-0126                                   CRM-0126            null
    t-Amyl Alcohol                                             null                null
    Transmittance TM-0509                                      TM-0509             null
    DK (edge corrected) TM-0534                                TM-0534             null
    Decanoic Acid CRM-0200                                     null                null
    Glycol using CRM-0094                                      CRM-0094            null
    % Ketotifen Released using TM-0578_V2_RELEASE              TM-0578_V2_RELEASE  null
    TMPTMA using CRM-0208                                      CRM-0208            null
    % Ketotifen Released using TM-0578_V2_RE                   TM-0578_V2_RE       null
    Ca2DTPA Assay using USP_541 (3 day drying)                 USP_541             null
    Water using TM-0449 OOS Analyst 1, Equip 1, set 2          TM-0449             null
    Leachable Polymer using CRM-0225 Sample B                  CRM-0225            null
    DMA using TM-0500 2333-30e                                 TM-0500             null
    Decanoic Acid using TM-0622 - Rev # 1                      TM-0622             Rev # 1
    Ketotifen Fumarate Assay using TM-0624_ASSAY_RC - Rev # 2  TM-0624_ASSAY_RC    Rev # 2
    Refractive Index using TM-0589 - Day 8                     TM-0589             null
    Refractive Index using TM-0589 - Rev # 0 - Day 5           TM-0589             Rev # 0

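Is this possible? I can't seem to get it to work. I also still need a way to extract analysis_method when the string has a code such as CRM-0200 but no "using", for example 'Decanoic Acid CRM-0200'.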

This is what I have so far:

select  distinct t.reported_name, 
       (case 
           when regexp_like(t.reported_name, '.* using (.*)([ ]?[-]?[ ]?Rev.*)') 
            then regexp_replace(t.reported_name, '.* using (.*)([ ]?- Rev.*)', '\1')
          when regexp_like(t.reported_name, '.* using (.*)') 
            then regexp_replace(t.reported_name, '.* using (.*)', '\1') 
          else '' end)  as analysis_method_regexp,

        (case when regexp_like(t.reported_name, '.*[ ]?[-]?[ ]?(Rev[ ]?#[ ]?[0-9]+).*') 
          then regexp_replace(t.reported_name, '.*[ ]?[-]?[ ]?(Rev[ ]?#[ ]?[0-9]+).*', '\1') 
          else '' end)  as revision_regexp
from test t; 

1 answer:

Answer 0 (score: 0)

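Here is how I approached the problem. First, describe the search criteria for each output column:
   - Column 1 is the whole row.
   - If the row contains "using", column 2 is the string that follows "using".
   - If the row contains the string "Rev #", column 3 is "Rev #" plus the string that follows it.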

Then write those matching rules as regular expressions:

with sel as (
  --  Test data covering each combination of "using" and "Rev #".
  select 'TEGDMA' str from dual
  union
  select 'Blue HEMA using TM-0510' str from dual
  union
  select 'Decanoic Acid using TM-0622 - Rev # 1' str from dual
  union
  select '% Ketotifen Released using TM-0578_V2_RELEASE' str from dual
  union
  select 'Leachable Polymer using CRM-0225 Sample B' str from dual
  union
  select 'Ketotifen Fumarate Assay using TM-0624_ASSAY_RC - Rev # 2' str from dual
)
SELECT str reported_name,
       CASE
         -- if line contains " using " then...
         WHEN regexp_like(str, ' using ') THEN
           -- use groups. 1st group is the beginning of the line, ending with
           --   "...using ".  The second group starts right after the space after
           --   "using" and matches any run of characters that are not a space.
           --   (Inside [...] a "$" is a literal character, not an end-of-line
           --   anchor, so it is left out of the bracket expression.)
           --   The trailing .* swallows the rest of the line.
           --   Return the second group only.
           regexp_replace(str, '^(.* using )([^ ]*).*$', '\2')
       END analysis_method,
       CASE
         -- if line contains " Rev # (a number)" then..
         WHEN regexp_like(str, ' Rev # [0-9]') THEN
           -- [0-9]+ keeps the whole number, so "Rev # 12" is not cut to "Rev # 1".
           regexp_replace(str, '^.* (Rev # [0-9]+).*$', '\1')
       END revision_number
from sel;

Output:

[Image: screenshot of the query output]
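
The query above only finds a method that follows "using", so rows such as 'Transmittance TM-0509' or 'Decanoic Acid CRM-0200' would still come back as null. A possible sketch for those rows is to match the shape of the code itself instead of the word before it. This assumes that every method code is a run of uppercase letters, then "-" or "_", then digits, optionally followed by more uppercase letters, digits or underscores. That assumption is inferred from the sample data only, and the default binary NLS_SORT setting is assumed as well.

```sql
-- Sketch only: the code pattern below is an assumption based on the sample rows.
with sel as (
  select 'Transmittance TM-0509' str from dual
  union all
  select 'Decanoic Acid CRM-0200' str from dual
  union all
  select 'Ca2DTPA Assay using USP_541 (3 day drying)' str from dual
  union all
  select 'Refractive Index using TM-0589 - Rev # 0 - Day 5' str from dual
  union all
  select 't-Amyl Alcohol' str from dual
)
select str reported_name,
       -- First substring shaped like a method code; NULL when none is found.
       -- 'c' forces case-sensitive matching so lowercase words are never
       -- treated as codes.
       regexp_substr(str, '[A-Z]+[-_][0-9]+[A-Z0-9_]*', 1, 1, 'c') analysis_method,
       -- "Rev # " followed by the whole revision number; NULL when absent.
       regexp_substr(str, 'Rev # [0-9]+') revision_number
from sel;
```

REGEXP_SUBSTR returns NULL when nothing matches, so no CASE wrapper is needed here. Note that this version would also return CRM-0200 for 'Decanoic Acid CRM-0200', which the desired output table lists as null, so check which behaviour is actually wanted for that row.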