Question

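I have two columns in an Oracle SQL table in which a lot of important transaction information is stored in a way that is not easy to retrieve. The data set has two columns, trxn_a and trxn_b.

trxn_a:

2019-01-25~cash deposit~$5,000~John Doe#2019-01-26~cash deposit~$1,000~John Doe#

trxn_b:

2019-01-25~cash deposit~$3,000~John Doe#2019-01-25~cash deposit~$1,500~John Doe#2019-01-26~cash deposit~$100~John Doe#2019-01-26~cash deposit~$800~John Doe#2019-01-26~cash deposit~$100~John Doe#

As you can see, fields are separated by ~ and records are separated by #. There can be any number of transactions (and therefore any number of # in one cell).

The data listed above is a single data record across the two columns (that is, two cells).

My goal is to transform the data into multiple rows, each of which aggregates sum(trxn_amount) by date. Please see the desired output below: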

 date, trxn_amt_a, trxn_amt_b
 2019-01-25, 5000, 4500
 2019-01-26, 1000, 1000

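I tried the INSTR and SUBSTR functions, but they cannot cope with the variation in this data structure. Also, I am not sure how to:

parse out the date and the transaction amount
sum by date
and then explode the cell into separate rows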

Answer 1

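This is a fairly involved process. Here is a step-by-step explanation of how I would proceed.

The first part consists of splitting each value into rows, using the # separator. For this, we use REGEXP_SUBSTR() and CONNECT BY to generate the recursion.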

select trim(regexp_substr(trxn_a,'[^#]+', 1, level) ) trxn_a, level
from mytable
connect by regexp_substr(trxn_a, '[^#]+', 1, level) is not null
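One caveat worth knowing: when mytable holds more than one row, a CONNECT BY of this form relates every row to every other row at each level, so the output contains duplicates and grows quickly. A minimal sketch of a common workaround follows. It assumes a unique key column named id, which is hypothetical and does not appear in the question; the inline sample rows are likewise invented for illustration.

```sql
-- Sample data; the "id" column and the second row are assumptions for illustration.
with mytable as (
    select 1 id,
           '2019-01-25~cash deposit~$5,000~John Doe#2019-01-26~cash deposit~$1,000~John Doe#' trxn_a
    from dual
    union all
    select 2, '2019-01-27~cash deposit~$200~Jane Roe#' from dual
)
select id,
       trim(regexp_substr(trxn_a, '[^#]+', 1, level)) trxn_a,
       level lvl
from mytable
connect by regexp_substr(trxn_a, '[^#]+', 1, level) is not null
       and prior id = id                 -- only continue within the same source row
       and prior sys_guid() is not null  -- non-deterministic call avoids a false cycle error
order by id, lvl
```

With a single-row table, as in the question, the simpler form is enough; the extra conditions only matter once several rows are split in the same query.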

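Then we need to parse each resulting value into columns. This can be done with a series of REGEXP_SUBSTR() calls. Special care is needed for the column that holds the amount, which contains non-numeric characters ('$5,000'): the invalid characters must be removed so that the value can later be treated as a number.

NB: for your purpose you do not actually need to recover all 4 columns from the value (date and amount are enough); I am showing all columns in case you need to access another one.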

select
    'ta' src,
    regexp_substr(trxn_a,'[^~]+', 1, 1) tdate,
    regexp_substr(trxn_a,'[^~]+', 1, 2) ttype,
    replace(regexp_substr(trxn_a,'[^$~]+', 1, 3), ',', '') tamount,
    regexp_substr(trxn_a,'[^~]+', 1, 4) tuser
from (
    select trim(regexp_substr(trxn_a,'[^#]+', 1, level) ) trxn_a, level
    from mytable
    connect by regexp_substr(trxn_a, '[^#]+', 1, level) is not null
)
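The query above returns every column as a string, so any later arithmetic relies on implicit conversion. If typed values are preferred, the conversion can be made explicit. The following is a minimal sketch on one hard-coded record, assuming dates always follow the YYYY-MM-DD pattern and amounts look like '$5,000'; the alias rec is introduced here for illustration only.

```sql
select to_date(regexp_substr(rec, '[^~]+', 1, 1), 'YYYY-MM-DD')         tdate,   -- field 1 as DATE
       to_number(replace(regexp_substr(rec, '[^$~]+', 1, 3), ',', ''))  tamount  -- field 3 as NUMBER
from (
    -- one record, already split on '#'
    select '2019-01-25~cash deposit~$5,000~John Doe' rec from dual
)
```

Note that the '[^$~]+' pattern counts on no earlier field containing a $ character; if that could happen, the third field would need to be extracted with '[^~]+' first and cleaned afterwards.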

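Each column of the source table (trxn_a, trxn_b) must be processed separately, since each value generates an arbitrary number of records. The results can be combined with UNION ALL, and an outer query then performs the conditional aggregation:

Final query: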

with t as (
    select
        'ta' src,
        regexp_substr(trxn_a,'[^~]+', 1, 1) tdate,
        regexp_substr(trxn_a,'[^~]+', 1, 2) ttype,
        replace(regexp_substr(trxn_a,'[^$~]+', 1, 3), ',', '') tamount,
        regexp_substr(trxn_a,'[^~]+', 1, 4) tuser
    from (
        select trim(regexp_substr(trxn_a,'[^#]+', 1, level) ) trxn_a, level
        from mytable
        connect by regexp_substr(trxn_a, '[^#]+', 1, level) is not null
    )
    union all
    select
        'tb' src,
        regexp_substr(trxn_b,'[^~]+', 1, 1) tdate,
        regexp_substr(trxn_b,'[^~]+', 1, 2) ttype,
        replace(regexp_substr(trxn_b,'[^$~]+', 1, 3), ',', '') tamount,
        regexp_substr(trxn_b,'[^~]+', 1, 4) tuser
    from (
        select trim(regexp_substr(trxn_b,'[^#]+', 1, level) ) trxn_b, level
        from mytable
        connect by regexp_substr(trxn_b, '[^#]+', 1, level) is not null
    )
)
select
    tdate, 
    SUM(DECODE(src, 'ta', tamount, 0)) trxn_amt_a,
    SUM(DECODE(src, 'tb', tamount, 0)) trxn_amt_b
from t
group by tdate;

With your test data, this demo on DB Fiddle yields:

TDATE       TRXN_AMT_A  TRXN_AMT_B
2019-01-25  5000        4500
2019-01-26  1000        1000
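The conditional aggregation in the outer query can also be looked at in isolation. Here is a small sketch on hand-written rows (taken from the 2019-01-25 transactions in the question), using CASE expressions, which should behave the same as the DECODE calls in the final query:

```sql
with t as (
    -- already-parsed rows: source column, date, numeric amount
    select 'ta' src, '2019-01-25' tdate, 5000 tamount from dual
    union all
    select 'tb', '2019-01-25', 3000 from dual
    union all
    select 'tb', '2019-01-25', 1500 from dual
)
select tdate,
       sum(case when src = 'ta' then tamount else 0 end) trxn_amt_a,  -- amounts from trxn_a only
       sum(case when src = 'tb' then tamount else 0 end) trxn_amt_b   -- amounts from trxn_b only
from t
group by tdate
```

Each SUM only counts rows from one source column, which is how the two amounts end up side by side on the same date row.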

Answer 2

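Use REGEXP_SUBSTR to split the records on #. Beyond that, I don't see how you could add another column in the operation as if by magic, since there is none in your input. Simply use Replace(the_below_string,'~cash deposit~',',')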

      -- Split on '#' only: a comma in the character class would also
      -- break amounts such as $5,000 apart.
      SELECT DISTINCT REGEXP_SUBSTR(
               '2019-01-25~cash deposit~$5,000~John Doe#2019-01-26~cash deposit~$1,000~John Doe#',
               '[^#]+', 1, LEVEL) AS "Data"
      FROM   dual
      CONNECT BY REGEXP_SUBSTR(
               '2019-01-25~cash deposit~$5,000~John Doe#2019-01-26~cash deposit~$1,000~John Doe#',
               '[^#]+', 1, LEVEL) IS NOT NULL

Oracle SQL: parsing transaction amounts from a field-delimited, record-delimited string column
