Question

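I have a file that contains some junk values, and I need to remove them when loading the file into a table. Here are a few examples. The file is semicolon-delimited, and the junk values are in the last column: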

2019-02-20;05377378;ABC+xY+++Rohit Anita Chicago
2019-02-20;05201343;ABC+xY++Gustav Russia
2019-02-20;07348738;ABC+xy+++Jain Ram Ambarnath

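Now I need to load the last column without the ABC+xY+++ value. However, some rows have ABC+xY+++ and others have ABC+xY++, meaning the + sign appears either two or three times. Any suggestions for getting rid of it?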
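To pin the junk down precisely: it is "ABC+x", then "Y" (the third sample row uses a lowercase "y"), then two or three plus signs. The following is a minimal Python sketch for checking that description against the sample rows outside Informatica; the names `JUNK_PREFIX`, `last_column` and `strip_junk` are hypothetical, and it assumes the junk always sits at the very start of the last field.

```python
import re

# Hypothetical pattern name. "ABC+x", then "Y" or "y", then 2 or 3 plus signs,
# anchored at the start of the field.
JUNK_PREFIX = re.compile(r"^ABC\+x[yY]\+{2,3}")


def last_column(record: str) -> str:
    """Return the last semicolon-separated field of a record."""
    return record.split(";")[-1]


def strip_junk(field: str) -> str:
    """Remove the junk prefix if present; otherwise return the field unchanged."""
    return JUNK_PREFIX.sub("", field, count=1)


if __name__ == "__main__":
    samples = [
        "2019-02-20;05377378;ABC+xY+++Rohit Anita Chicago",
        "2019-02-20;05201343;ABC+xY++Gustav Russia",
        "2019-02-20;07348738;ABC+xy+++Jain Ram Ambarnath",
    ]
    for record in samples:
        print(strip_junk(last_column(record)))
```

Because `\+{2,3}` is greedy, a row with three plus signs loses all three, and a row with two loses both.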

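I am loading this file with Informatica PowerCenter, so I need to build some SUBSTR/INSTR functions in an Expression transformation. I can also test in Oracle SQL to quickly check whether the values come out correctly.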

My expected output is:

Any suggestions are appreciated.

Thanks, 碧通

Answer 1

I think you are looking for something like this:

WITH dat AS (SELECT '2019-02-20;05373487378;ABC+xY++Rohit Anita Chicago' AS adress FROM dual)
SELECT regexp_REPLACE(adress, '(.*);ABC\+x[yY]\+{2,3}(.*)','\1;\2') FROM dat
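The query works by capturing everything before the junk marker as group 1 and everything after the two or three plus signs as group 2, then gluing them back together with a semicolon. As a rough offline check, here is the same pattern and replacement in Python's `re` module; `clean_record` is a hypothetical name, and this assumes Oracle's greedy matching behaves the same way for this particular pattern.

```python
import re

# Same pattern as the REGEXP_REPLACE call: group 1 = text before ";ABC+x[yY]",
# group 2 = text after the 2-3 plus signs. The junk in between is dropped.
PATTERN = re.compile(r"(.*);ABC\+x[yY]\+{2,3}(.*)")


def clean_record(record: str) -> str:
    """Rebuild the record as '<leading fields>;<clean last column>'."""
    return PATTERN.sub(r"\1;\2", record)


if __name__ == "__main__":
    print(clean_record("2019-02-20;05373487378;ABC+xY++Rohit Anita Chicago"))
```

Rows without the marker do not match the pattern, so they pass through unchanged.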

Answer 2

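I'm not entirely sure I understand your question, but this does what I think you want, and it works both in SQL and in an Informatica expression.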

with myrecs as
(select '2019-02-20;870789789707;ABC+xY++Gustav Russia' as myfield from dual union all
 select '2019-02-20;870789789707;ABC+xY+++Carroll Iowa' as myfield from dual)

 select myfield,

    substr(myfield,1, instr(myfield,';',-1)) ---will select everything up to, and including the final semicolon
    ||--concatenate
    substr(myfield,instr(myfield,'+',-1)+1) as yourfield --will select everything after the final plus sign
 from myrecs;

OUTPUT:
myfield                                         yourfield
2019-02-20;870789789707;ABC+xY++Gustav Russia   2019-02-20;870789789707;Gustav Russia
2019-02-20;870789789707;ABC+xY+++Carroll Iowa   2019-02-20;870789789707;Carroll Iowa
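To see exactly what INSTR with a position of -1 and SUBSTR are doing, here is a small Python emulation of the expression above. The helpers `instr_last` and `substr` are hypothetical stand-ins that only cover the cases used here (1-based positions, start 0 treated as 1, an empty string standing in for NULL); negative SUBSTR start positions are not handled.

```python
from typing import Optional


def instr_last(s: str, ch: str) -> int:
    """Emulate INSTR(s, ch, -1): 1-based position of the last ch, or 0 if absent."""
    return s.rfind(ch) + 1


def substr(s: str, start: int, length: Optional[int] = None) -> str:
    """Emulate SUBSTR for start >= 0 (1-based; 0 is treated as 1).

    A length below 1 yields "" here, standing in for Oracle's NULL.
    """
    begin = max(start, 1) - 1
    if length is None:
        return s[begin:]
    if length < 1:
        return ""
    return s[begin:begin + length]


def yourfield(myfield: str) -> str:
    # Everything up to and including the final semicolon ...
    head = substr(myfield, 1, instr_last(myfield, ";"))
    # ... concatenated with everything after the final plus sign.
    tail = substr(myfield, instr_last(myfield, "+") + 1)
    return head + tail


if __name__ == "__main__":
    print(yourfield("2019-02-20;870789789707;ABC+xY++Gustav Russia"))
    print(yourfield("2019-02-20;870789789707;ABC+xY+++Carroll Iowa"))
```

One edge case worth knowing: if a row has no plus sign at all, INSTR returns 0, the second SUBSTR starts at position 1 and returns the whole record, so the leading fields appear twice. The sketch reproduces that, so you may want to guard against it in the real expression.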

Answer 3

Informatica PowerCenter provides a few regular-expression functions. In this case you need REG_EXTRACT.

A good description of this function is already available; check it out and upvote it :)

Based on that, you will most likely need to define the following port:

your_output_port=REG_EXTRACT(ADDRESS, '([^\+]+)$', 1)

Here's how I tested it.
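The pattern `([^\+]+)$` captures the longest run of non-plus characters that ends at the end of the string, which is exactly the text after the last plus sign. As a rough check outside PowerCenter, the same pattern behaves like this in Python's `re` module. This assumes ADDRESS holds only the last column (the flat-file source has already split on semicolons); `reg_extract_tail` is a hypothetical name, and returning None on no match is this sketch's choice, so check how REG_EXTRACT behaves in your PowerCenter version.

```python
import re
from typing import Optional

# Same pattern as the REG_EXTRACT port: one or more non-"+" characters
# anchored at the end of the string, captured as group 1.
TAIL = re.compile(r"([^\+]+)$")


def reg_extract_tail(address: str) -> Optional[str]:
    """Return group 1 of the pattern, or None when nothing matches."""
    match = TAIL.search(address)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(reg_extract_tail("ABC+xY+++Rohit Anita Chicago"))
```

Unlike the SUBSTR/INSTR approach, a field with no plus signs comes back unchanged here, because the whole field matches.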

Answer 4

Here is the solution.

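substr
    ( 
        Address,
        1,                          -- start at the first character
        instr(Address, ';', -1)     -- length = position of the last semicolon
    )
    ||
substr
    (
        Address,
        instr(Address, '+', -1) + 1 -- start just after the last plus sign
    )

The + 1 in the second SUBSTR skips the last plus sign itself; without it, the output keeps a leading "+" in front of the name. Adjust the start/end positions further if your data differs.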
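Whether the second SUBSTR starts at the INSTR result itself or one position later decides whether the last plus sign ends up in the output. A quick Python check makes the difference visible; `tail_from_last_plus` is a hypothetical helper that approximates the 1-based SUBSTR/INSTR positions and only handles non-negative offsets.

```python
def tail_from_last_plus(address: str, offset: int) -> str:
    """Emulate SUBSTR(address, INSTR(address, '+', -1) + offset).

    INSTR(..., -1) is the 1-based position of the last '+' (0 if absent);
    a SUBSTR start of 0 is treated as 1.
    """
    start = address.rfind("+") + 1 + offset  # 1-based start position
    return address[max(start, 1) - 1:]


if __name__ == "__main__":
    row = "2019-02-20;05201343;ABC+xY++Gustav Russia"
    print(repr(tail_from_last_plus(row, 0)))  # keeps the last plus sign
    print(repr(tail_from_last_plus(row, 1)))  # skips it
```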

Removing characters from a dynamic string
