SQL query to split a pipe-delimited string that contains null values

Time: 2019-02-21 13:12:29

Tags: sql oracle

I have a table called contents:

create table contents(file_name varchar2(4000), file_content clob);

This is the table:

file_name                                  file_content
deID.RESUL_12433287659.txt_234323456.txt   |678976|TEST|TBDKK|7865679809
deID.RESUL_34534563649.txt_345353567.txt   1|678977||TB5KK|7866709
deID.RESUL_44235345636.txt_537967875.txt   |678978|TE2T|TB4KK|78669809
deID.RESUL_35234663456.txt_423452545.txt   4|678979|TE3T|T3DKK|785679809

From this content I need to create another table, named data_contents, with the following structure:

file_name                                  id  number   name  address  phone
deID.RESUL_12433287659.txt_234323456.txt       678976   TEST  TBDKK    7865679809
deID.RESUL_34534563649.txt_345353567.txt    1  678977         TB5KK    7866709
deID.RESUL_44235345636.txt_537967875.txt       678978   TE2T  TB4KK    78669809
deID.RESUL_35234663456.txt_423452545.txt    4  678979   TE3T  T3DKK    785679809

I tried the following query:

with DTE as
(
    select file_name, 
           to_char(file_content) as file_content -- preconvert the clob to a varchar
    from MyTable
)
, CTE as
(
    select file_name, 
           case 
             when substr(file_content,1,1) ='|' -- If the string starts with the delimiter
               then ' '||file_content -- then add a space at the start
             else file_content 
           end as file_content
    from DTE
)

    select file_name,
           regexp_substr (file_content, '[^|]+',1, 1 ) as id,
           regexp_substr (file_content, '[^|]+',1, 2 ) as thenumber, 
           regexp_substr (file_content, '[^|]+',1, 3 ) as thename,
           regexp_substr (file_content, '[^|]+',1, 4 ) as theaddress,
           regexp_substr (file_content, '[^|]+',1, 5) as phone
    from CTE

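The problem appears when a field is blank. In the second row, for example, the name is missing, so my query skips it and every following column value shifts one cell to the left.

Is there a way to still get a NULL when a column has no value?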
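
The shift described above can be reproduced on the second row alone. The following is a minimal sketch that runs the question's pattern against that row's string as a literal; the column aliases are taken from the question's query. Because '[^|]+' only matches runs of one or more non-pipe characters, the empty field between the two adjacent pipes is never counted as an occurrence.

```sql
-- The row '1|678977||TB5KK|7866709' has an empty third field (the name).
-- '[^|]+' sees only four tokens: 1, 678977, TB5KK, 7866709.
SELECT REGEXP_SUBSTR('1|678977||TB5KK|7866709', '[^|]+', 1, 3) AS thename,
       REGEXP_SUBSTR('1|678977||TB5KK|7866709', '[^|]+', 1, 4) AS theaddress,
       REGEXP_SUBSTR('1|678977||TB5KK|7866709', '[^|]+', 1, 5) AS phone
FROM   dual;
-- thename = 'TB5KK', theaddress = '7866709', phone = NULL
```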

3 answers:

Answer 0: (score: 1)

You can do it like this:

with DTE as
(
  SELECT 'deID.RESUL_12433287659.txt_234323456.txt' file_name, '|678976|TEST|TBDKK|7865679809' file_content FROM dual UNION ALL
  SELECT 'deID.RESUL_34534563649.txt_345353567.txt' file_name, '1|678977||TB5KK|7866709' file_content FROM dual UNION ALL
  SELECT 'deID.RESUL_44235345636.txt_537967875.txt' file_name, '|678978|TE2T|TB4KK|78669809' file_content FROM dual UNION ALL
  SELECT 'deID.RESUL_35234663456.txt_423452545.txt' file_name, '4|678979|TE3T|T3DKK|785679809' file_content FROM dual
)
SELECT file_name,
       file_content,
       REGEXP_SUBSTR(file_content, '(.*?)(\||$)', 1, 1, NULL, 1) ID,
       REGEXP_SUBSTR(file_content, '(.*?)(\||$)', 1, 2, NULL, 1) thenumber,
       REGEXP_SUBSTR(file_content, '(.*?)(\||$)', 1, 3, NULL, 1) thename,
       REGEXP_SUBSTR(file_content, '(.*?)(\||$)', 1, 4, NULL, 1) theaddress,
       REGEXP_SUBSTR(file_content, '(.*?)(\||$)', 1, 5, NULL, 1) phone
FROM   dte;

FILE_NAME                                FILE_CONTENT                  ID THENUMBER THENAME THEADDRESS PHONE
---------------------------------------- ----------------------------- -- --------- ------- ---------- ------------
deID.RESUL_12433287659.txt_234323456.txt |678976|TEST|TBDKK|7865679809    678976    TEST    TBDKK      7865679809
deID.RESUL_34534563649.txt_345353567.txt 1|678977||TB5KK|7866709       1  678977            TB5KK      7866709
deID.RESUL_44235345636.txt_537967875.txt |678978|TE2T|TB4KK|78669809      678978    TE2T    TB4KK      78669809
deID.RESUL_35234663456.txt_423452545.txt 4|678979|TE3T|T3DKK|785679809 4  678979    TE3T    T3DKK      785679809

(I have replaced your DTE with a subquery that mimics the data in your table; you would keep using the DTE you already have.)

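This works by matching zero or more characters followed by either the | delimiter (which we have to escape, because it is a special character in regular expressions) or the end of the string.

We then pick the nth match, depending on which column we are after.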
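
To make the "nth match" concrete, here is a small sketch that lists the first five matches of the pattern for the second sample row. It uses CONNECT BY LEVEL on dual as a row generator, which is an addition for illustration and not part of the answer's query. Note that each match is shown as the whole matched text, so it still includes the trailing delimiter.

```sql
-- One output row per occurrence n of the pattern in the sample string.
SELECT LEVEL AS n,
       REGEXP_SUBSTR('1|678977||TB5KK|7866709', '(.*?)(\||$)', 1, LEVEL) AS nth_match
FROM   dual
CONNECT BY LEVEL <= 5;
-- n = 1: '1|'
-- n = 2: '678977|'
-- n = 3: '|'        (the empty name field still counts as a match)
-- n = 4: 'TB5KK|'
-- n = 5: '7866709'
```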

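Finally, we need the last argument (the subexpression number, 1) to restrict the returned value to what the first set of parentheses matched (i.e. the text defined by .*?); otherwise you would get the value with the | still attached, not just the value.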
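
Putting this together for the original goal, the sketch below builds data_contents directly with CREATE TABLE ... AS SELECT. It rests on a few assumptions: it reads from the contents table defined in the question (the question's query selects from MyTable, presumably the same table), it keeps the question's TO_CHAR conversion and therefore assumes every file_content value is short enough to fit in a VARCHAR2, and it keeps the aliases thenumber, thename and theaddress because NUMBER is a reserved word in Oracle.

```sql
-- Sketch only: assumes the source table is "contents" and that each CLOB
-- is short enough for TO_CHAR to convert it to VARCHAR2.
CREATE TABLE data_contents AS
WITH dte AS
(
    SELECT file_name,
           TO_CHAR(file_content) AS file_content
    FROM   contents
)
SELECT file_name,
       REGEXP_SUBSTR(file_content, '(.*?)(\||$)', 1, 1, NULL, 1) AS id,
       REGEXP_SUBSTR(file_content, '(.*?)(\||$)', 1, 2, NULL, 1) AS thenumber,
       REGEXP_SUBSTR(file_content, '(.*?)(\||$)', 1, 3, NULL, 1) AS thename,
       REGEXP_SUBSTR(file_content, '(.*?)(\||$)', 1, 4, NULL, 1) AS theaddress,
       REGEXP_SUBSTR(file_content, '(.*?)(\||$)', 1, 5, NULL, 1) AS phone
FROM   dte;
```

All five extracted columns come out as strings; if numeric columns are wanted, a TO_NUMBER around the relevant expressions would be one option, provided the data is clean.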

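Answer 1: (score: 0)

You can simply replace every '|' with ' |', so that empty values are still counted as fields, and then remove all the extra spaces: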

with DTE as
(
    select file_name, 
           replace(to_char(file_content), '|', ' |') as file_content -- preconvert the clob to a varchar
    from MyTable
)
    select file_name,
           replace(regexp_substr (file_content, '[^|]+',1, 1 ), ' ', '') as id,
           replace(regexp_substr (file_content, '[^|]+',1, 2 ), ' ', '') as thenumber, 
           replace(regexp_substr (file_content, '[^|]+',1, 3 ), ' ', '') as thename,
           replace(regexp_substr (file_content, '[^|]+',1, 4 ), ' ', '') as theaddress,
           replace(regexp_substr (file_content, '[^|]+',1, 5), ' ', '') as phone
    from DTE;
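
One thing to watch with this approach: replace(..., ' ', '') also removes spaces that belong to the value itself. A possible variant, sketched below, trims only the trailing padding with RTRIM. The sample row is hypothetical (the address '12 MAIN ST' does not appear in the question's data), and the sketch assumes the real values have no trailing spaces that need to be preserved.

```sql
WITH dte AS
(
    -- Hypothetical row: the name is empty and the address contains spaces.
    -- After padding: '1 |678977 | |12 MAIN ST |7866709'
    SELECT REPLACE('1|678977||12 MAIN ST|7866709', '|', ' |') AS file_content
    FROM   dual
)
SELECT RTRIM(REGEXP_SUBSTR(file_content, '[^|]+', 1, 1)) AS id,
       RTRIM(REGEXP_SUBSTR(file_content, '[^|]+', 1, 2)) AS thenumber,
       RTRIM(REGEXP_SUBSTR(file_content, '[^|]+', 1, 3)) AS thename,
       RTRIM(REGEXP_SUBSTR(file_content, '[^|]+', 1, 4)) AS theaddress,
       RTRIM(REGEXP_SUBSTR(file_content, '[^|]+', 1, 5)) AS phone
FROM   dte;
-- id = '1', thenumber = '678977', thename = NULL,
-- theaddress = '12 MAIN ST', phone = '7866709'
```

With replace(..., ' ', '') the address in this row would come back as '12MAINST' instead.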

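Answer 2: (score: 0)

I used a slightly different pattern, which matches the pipe together with whatever comes before it (or, for the last value, whatever comes after it):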

SELECT file_name,
       regexp_substr(file_content, '([A-Z0-9]*)(\|)', 1, 1, '', 1) as id,
       regexp_substr(file_content, '([A-Z0-9]*)(\|)', 1, 2, '', 1) as thenumber,
       regexp_substr(file_content, '([A-Z0-9]*)(\|)', 1, 3, '', 1) as thename,
       regexp_substr(file_content, '([A-Z0-9]*)(\|)', 1, 4, '', 1) as theaddress,
       regexp_substr(file_content, '(\|)([A-Z0-9]*)$', 1, 1, '', 2) as phone
FROM CTE