Extract numbers that match a specific pattern from text

Date: 2018-04-09 14:01:39

Tags: sql-server string pattern-matching sql-like

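OK, I have a bunch of data that all contains codes somewhere in the text, but the codes aren't consistently formatted, for example:

Well at Wallgreens Regular Strength Antacid Liquid (Alumina Magnesia Simethicone Antacid & Anti Gas) Mint a)12 oz bottle (NDC 0363-0073-02) b) 26 oz bottle (NDC 0363-0073-26) Distributed by Walgreens CO 200 Wilmot Rd Deerfield IL 60015

IDPN (Intradialytic Parenteral Nutrition - dialysate solution with added amino acids) a) 490mL bag b) 500mL bag and c) 590mL bag Pentec Health Inc 4 Creek Parkway Suite A Boothwyn PA 19061-3132

Aminosyn-PF (amino acids) 7% Sulfite-Free 500 mL Bags Rx Only Hospira Inc Lake Forest IL 60045 NDC: 0409-4178-03 Barcode (01) 0 030409 417803 5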

I'm only interested in codes with 8 or 9 digits, in one of these formats:

xxxx-xxxx or xxxxx-xxxx

I can already select the rows that contain these codes, but instead of returning the whole product description I want to output only the code that was matched:

0363-0073
19061-3132
0409-4178

3 Answers:

Answer 0 (score: 3)

For a single value per row, you can use PATINDEX:

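SELECT 
    -- start at the first xxxxx-xxxx match and take its 10 characters
    SUBSTRING(ProductDescription
              ,PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%'
                        ,ProductDescription),
             10) AS Code, *
FROM t
-- keep only rows that contain such a code, so PATINDEX never returns 0 above
WHERE 
 [ProductDescription] LIKE '%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%';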

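To see what each function contributes in the query above, here is a sketch on a single literal instead of a table column (the value is an assumed test string taken from the end of the second sample row). PATINDEX returns the 1-based position where the pattern starts (0 if there is no match), and SUBSTRING then cuts the 10 characters of the code from that position.

```sql
-- Hypothetical single value: the tail of the second sample description.
DECLARE @s varchar(100) = 'Boothwyn PA 19061-3132';

SELECT
    -- 1-based start of the xxxxx-xxxx pattern: 13
    PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', @s) AS Pos,
    -- the 10 characters starting there: '19061-3132'
    SUBSTRING(@s, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', @s), 10) AS Code;
```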

Answer 1 (score: 1)

Here's a slightly different approach that doesn't use UNION ALL:

WITH VTE AS (
    SELECT *
    FROM (VALUES ('Well at Wallgreens Regular Strength Antacid Liquid (Alumina Magnesia Simethicone Antacid & Anti Gas) Mint a)12 oz bottle (NDC 0363-0073-02) b) 26 oz bottle (NDC 0363-0073-26) Distributed by Walgreens CO 200 Wilmot Rd Deerfield IL 60015'),
                 ('IDPN (Intradialytic Parenteral Nutrition - dialysate solution with added amino acids) a) 490mL bag b) 500mL bag and c) 590mL bag Pentec Health Inc 4 Creek Parkway Suite A Boothwyn PA 19061-3132'),
                 ('Aminosyn-PF (amino acids) 7% Sulfite-Free 500 mL Bags Rx Only Hospira Inc Lake Forest IL 60045 NDC: 0409-4178-03 Barcode (01) 0 030409 417803 5')) V(S))
SELECT V.S,
       CASE WHEN PI1.C > 0 THEN SUBSTRING(V.S,PI1.C, 10)
            WHEN PI2.C > 0 THEN SUBSTRING(V.S,PI2.C, 9)
            ELSE NULL
       END AS N
FROM VTE V
     CROSS APPLY (VALUES(PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%',V.S))) PI1(C)
     CROSS APPLY (VALUES(PATINDEX('%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%',V.S))) PI2(C);

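The reason for the two PATINDEX calls is that a value such as 12345-6789 also matches the pattern '%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%' (on its last nine characters, 2345-6789). So the 10-character format is checked first, then the 9-character one. The CASE expression also handles rows where no code is found: if both PI1.C and PI2.C return 0 (meaning the pattern wasn't found), it returns NULL instead of passing a start position of 0 to SUBSTRING, which would silently return the first characters of the description.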
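Both edge cases behind the CASE expression can be checked on literal values. This is a small sketch with two hypothetical strings (@zip and @none are illustration-only names): the 4-4 pattern also matches inside a 5-4 code, one character later, and a non-match makes PATINDEX return 0, which SUBSTRING accepts without raising an error.

```sql
-- Hypothetical values chosen only to show the two edge cases.
DECLARE @zip  varchar(100) = 'ZIP 12345-6789';
DECLARE @none varchar(100) = 'no code here';

SELECT
    -- 5: the xxxxx-xxxx pattern starts at '1'
    PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', @zip) AS Pos5,
    -- 6: the xxxx-xxxx pattern also matches, on '2345-6789'
    PATINDEX('%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', @zip) AS Pos4,
    -- '2345-6789': what checking the 9-character format first would return
    SUBSTRING(@zip, PATINDEX('%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', @zip), 9) AS Truncated,
    -- 'no code h': PATINDEX gives 0, and SUBSTRING(..., 0, 10) returns the first 9 characters
    SUBSTRING(@none, PATINDEX('%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', @none), 10) AS NoMatch;
```

This is why the CASE tests each position for > 0 before calling SUBSTRING.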

Answer 2 (score: 0)

You can use this code to get the first instance of either code format (based on lad2025's answer):

declare @t table (v varchar(8000))

insert @t(v) values ('Well at Wallgreens Regular Strength Antacid Liquid (Alumina Magnesia Simethicone Antacid & Anti Gas) Mint a)12 oz bottle (NDC 0363-0073-02) b) 26 oz bottle (NDC 0363-0073-26) Distributed by Walgreens CO 200 Wilmot Rd Deerfield IL 60015'),
('IDPN (Intradialytic Parenteral Nutrition - dialysate solution with added amino acids) a) 490mL bag b) 500mL bag and c) 590mL bag Pentec Health Inc 4 Creek Parkway Suite A Boothwyn PA 19061-3132'),
('Aminosyn-PF (amino acids) 7% Sulfite-Free 500 mL Bags Rx Only Hospira Inc Lake Forest IL 60045 NDC: 0409-4178-03 Barcode (01) 0 030409 417803 5')

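SELECT  *
FROM    @t

-- xxxxx-xxxx codes (10 characters)
select  substring(v, patindex('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', v), 10)
from    @t
where   v like '%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%'
union all
-- xxxx-xxxx codes (9 characters): the leading [^0-9] stops this from matching the
-- tail of a 5-digit code, and + 1 skips that non-digit so only the code is returned
select  substring(v, patindex('%[^0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', v) + 1, 9)
from    @t
where   v like '%[^0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%'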
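The queries above return at most one code per format per row, but the first sample description contains two NDC codes. If every occurrence is needed, one option is a different technique from the answers above: scan each character position with a numbers ("tally") table and test the text around that position. This is a sketch under stated assumptions: SQL Server 2016 or later (for DROP TABLE IF EXISTS), sys.all_columns used only as a convenient row source for the numbers, and #codes a hypothetical scratch table. Each code must be surrounded by non-digits, which the '|' padding guarantees at the start and end of the text.

```sql
DECLARE @t table (v varchar(8000));

INSERT @t (v) VALUES
('Well at Wallgreens Regular Strength Antacid Liquid (Alumina Magnesia Simethicone Antacid & Anti Gas) Mint a)12 oz bottle (NDC 0363-0073-02) b) 26 oz bottle (NDC 0363-0073-26) Distributed by Walgreens CO 200 Wilmot Rd Deerfield IL 60015'),
('IDPN (Intradialytic Parenteral Nutrition - dialysate solution with added amino acids) a) 490mL bag b) 500mL bag and c) 590mL bag Pentec Health Inc 4 Creek Parkway Suite A Boothwyn PA 19061-3132'),
('Aminosyn-PF (amino acids) 7% Sulfite-Free 500 mL Bags Rx Only Hospira Inc Lake Forest IL 60045 NDC: 0409-4178-03 Barcode (01) 0 030409 417803 5');

DROP TABLE IF EXISTS #codes;  -- hypothetical scratch table; needs SQL Server 2016+

WITH Tally AS (
    -- numbers 1..8000, one per possible character position
    SELECT TOP (8000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS pos
    FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b
)
SELECT  t.v,
        x.pos,
        -- prefer the 5-4 format; otherwise the row was kept because of a 4-4 code
        CASE WHEN SUBSTRING(p.s, x.pos, 12) LIKE '[^0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][^0-9]'
             THEN SUBSTRING(p.s, x.pos + 1, 10)
             ELSE SUBSTRING(p.s, x.pos + 1, 9)
        END AS Code
INTO    #codes
FROM    @t AS t
-- pad with '|' so a code at either end of the text still has non-digit neighbours
CROSS APPLY (VALUES (CAST('|' AS varchar(max)) + t.v + '|')) AS p(s)
JOIN    Tally AS x ON x.pos <= LEN(t.v)
-- x.pos is the non-digit just before a code; the trailing [^0-9] rejects longer digit runs
WHERE   SUBSTRING(p.s, x.pos, 12) LIKE '[^0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][^0-9]'
   OR   SUBSTRING(p.s, x.pos, 11) LIKE '[^0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][^0-9]';

SELECT v, Code FROM #codes ORDER BY v, pos;
```

With this sample data the first description yields 0363-0073 twice (once per NDC code); add DISTINCT to the final SELECT if each code should appear only once per row.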