How to clean a string and extract a numeric suffix in T-SQL

Time: 2019-05-23 13:17:40

Tags: sql-server regex tsql

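I have a string that contains a name and, in most cases, carries a suffix of one or two numbers. This numeric suffix should be separated from the name. One of the numbers represents a status and should be extracted. If there are two numbers, it is the second from the right; if there is only one, it is the first from the right. The numbers are delimited by underscores. Underscores may also occur within the name. The result should be columns holding the cleaned name and the extracted status.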

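I tried to solve this with standard string functions (such as SUBSTRING, CHARINDEX, PATINDEX, LEN and so on), but my approach quickly became clunky and hard to maintain. I wonder whether there is an elegant solution using built-in SQL Server functionality (if possible, without installing extra features for regex).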

SELECT _data.myStr
    -- , ... AS clearname  /*String cleaned from number_postfixes*/
    -- , ... AS Status     /*second number from the right*/
FROM (
    SELECT 'tree_leafs_offer_2_1' AS myStr  --clearname: tree_leafs_offer; cut off: _2_1; extracted status: 2
        UNION
    SELECT 'tree_leafs_offer_2_10' AS myStr --clearname: tree_leafs_offer; cut off: _2_10; extracted status: 2
        UNION
    SELECT 'tree_leafs_offer_2_2' AS myStr  --clearname: tree_leafs_offer; cut off: _2_2; extracted status: 2
        UNION
    SELECT 'tree_leafs_offer_1150_1' AS myStr   --clearname: tree_leafs_offer; cut off: _1150_1; extracted status: 1150
        UNION
    SELECT 'tree_leafs_offer_1150_10' AS myStr  --clearname: tree_leafs_offer; cut off: _1150_10; extracted status: 1150
        UNION
    SELECT 'builder_bundle_less_xl_1' AS myStr  --clearname: builder_bundle_less_xl; cut off: _1; extracted status: 1
        UNION
    SELECT 'builder_bundle_less_xl_10' AS myStr --clearname: builder_bundle_less_xl; cut off: _10; extracted status: 10
        UNION
    SELECT 'static_components_wolves_10_4' AS myStr --clearname: static_components_wolves; cut off: _10_4; extracted status: 10
        UNION
    SELECT 'coke_0_boring_components_bundle_grant_1' AS myStr   --clearname: coke_0_boring_components_bundle_grant; cut off: _1; extracted status: 1
        UNION
    SELECT 'coke_0_soccer18_end_1_4h_101' AS myStr  --clearname: coke_0_soccer18_end_1_4h; cut off: _101; extracted status: 101
        UNION
    SELECT 'coke_0_late_downsell_bundle_high_114' AS myStr  --clearname: coke_0_late_downsell_bundle_high; cut off: _114; extracted status: 114
        UNION
    SELECT 'itembundle_mine_bundle_small' AS myStr  --clearname: itembundle_mine_bundle_small; cut off: <nothing>; extracted status: NULL
) AS _data
As-Is Result:
-----------------
myStr:
---------------------------------------
builder_bundle_less_xl_1
builder_bundle_less_xl_10
coke_0_boring_components_bundle_grant_1
coke_0_late_downsell_bundle_high_114
coke_0_soccer18_end_1_4h_101
itembundle_mine_bundle_small
static_components_wolves_10_4
tree_leafs_offer_1150_1
tree_leafs_offer_1150_10
tree_leafs_offer_2_1
tree_leafs_offer_2_10
tree_leafs_offer_2_2

To-Be Result (two new columns):
-------------------
clearname:                              |Status
----------------------------------------------
builder_bundle_less_xl                  |   1
builder_bundle_less_xl                  |  10
coke_0_boring_components_bundle_grant   |   1
coke_0_late_downsell_bundle_high        | 114
coke_0_soccer18_end_1_4h                | 101
itembundle_mine_bundle_small            |NULL
static_components_wolves                |  10
tree_leafs_offer                        |1150
tree_leafs_offer                        |1150
tree_leafs_offer                        |   2
tree_leafs_offer                        |   2
tree_leafs_offer                        |   2

2 Answers:

Answer 0: (score: 3)

To be honest: this format is awful! If this is not a one-off task, you should really try to change it at the source.

But if you have to stick with it, you can try this:

EDIT: Fixed a wrong calculation of the status position...

DECLARE  @tbl TABLE(ID INT IDENTITY,myStr VARCHAR(1000));
INSERT INTO @tbl VALUES
 ('tree_leafs_offer_2_1')
,('tree_leafs_offer_2_10')
,('tree_leafs_offer_2_2')
,('tree_leafs_offer_1150_1')
,('tree_leafs_offer_1150_10')
,('builder_bundle_less_xl_1')
,('builder_bundle_less_xl_10')
,('static_components_wolves_10_4')
,('coke_0_boring_components_bundle_grant_1')
,('coke_0_soccer18_end_1_4h_101')
,('coke_0_late_downsell_bundle_high_114')
,('itembundle_mine_bundle_small');

--The query

WITH cte AS
(
    SELECT t.ID
          ,t.myStr 
            ,CAST(A.[key] AS INT) AS Position -- OPENJSON returns [key] as NVARCHAR; cast so positions compare numerically
            ,A.[value] AS WordFragment
            ,B.CastedToInt
    FROM @tbl t
    CROSS APPLY OPENJSON(N'["' + REPLACE(t.myStr,'_','","') + '"]') A
    CROSS APPLY(SELECT TRY_CAST(A.[value] AS INT)) B(CastedToInt)
) 
SELECT ID
      ,myStr
        ,STUFF(
        (SELECT CONCAT('_',cte2.WordFragment)
        FROM cte cte2
        WHERE cte2.ID=cte.ID
            AND cte2.Position<=A.PositionHighestNonInt
        ORDER BY cte2.Position
        FOR XML PATH('')
        ),1,1,'') AS ClearName
        ,(SELECT cte3.CastedToInt FROM cte cte3 WHERE cte3.ID=cte.ID AND cte3.Position=A.PositionHighestNonInt+1) AS [Status]
FROM cte
CROSS APPLY (
                 SELECT ISNULL(MAX(x.Position),1000) 
                 FROM cte x 
                 WHERE x.ID=cte.ID AND x.CastedToInt IS NULL
             ) A(PositionHighestNonInt)
GROUP BY ID,myStr,PositionHighestNonInt;

Result

+----+---------------------------------------+--------+
| ID | ClearName                             | Status |
+----+---------------------------------------+--------+
| 1  | tree_leafs_offer                      | 2      |
+----+---------------------------------------+--------+
| 2  | tree_leafs_offer                      | 2      |
+----+---------------------------------------+--------+
| 3  | tree_leafs_offer                      | 2      |
+----+---------------------------------------+--------+
| 4  | tree_leafs_offer                      | 1150   |
+----+---------------------------------------+--------+
| 5  | tree_leafs_offer                      | 1150   |
+----+---------------------------------------+--------+
| 6  | builder_bundle_less_xl                | 1      |
+----+---------------------------------------+--------+
| 7  | builder_bundle_less_xl                | 10     |
+----+---------------------------------------+--------+
| 8  | static_components_wolves              | 10     |
+----+---------------------------------------+--------+
| 9  | coke_0_boring_components_bundle_grant | 1      |
+----+---------------------------------------+--------+
| 10 | coke_0_soccer18_end_1_4h              | 101    |
+----+---------------------------------------+--------+
| 11 | coke_0_late_downsell_bundle_high      | 114    |
+----+---------------------------------------+--------+
| 12 | itembundle_mine_bundle_small          | NULL   |
+----+---------------------------------------+--------+

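The idea in short:

  • The data is provided in a mock-up table.
  • A trick with OPENJSON splits the string, so we can find the fragments that can be cast to INT.
  • Find the highest position holding a non-int fragment. The Status is the fragment at the next index.
  • In v2017 you can use STRING_AGG, but in v2016 we have to use an XML-based trick to concatenate all fragments before the [Status].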
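
The last point can be sketched for SQL Server 2017+ as follows. This is only an illustration of the STRING_AGG route, not the answer's own code: the second CTE `marked` and the reduced sample data are assumptions made for the example. It relies on STRING_AGG skipping NULL values, so fragments after the highest non-int position simply drop out of the concatenation.

```sql
-- Sketch for SQL Server 2017+; the CTE name "marked" is hypothetical.
DECLARE @tbl TABLE(ID INT IDENTITY, myStr VARCHAR(1000));
INSERT INTO @tbl VALUES
 ('tree_leafs_offer_2_1')
,('builder_bundle_less_xl_10')
,('coke_0_soccer18_end_1_4h_101')
,('itembundle_mine_bundle_small');

WITH cte AS
(
    SELECT t.ID
          ,t.myStr
          ,CAST(A.[key] AS INT) AS Position            -- array index of the fragment
          ,A.[value] AS WordFragment
          ,TRY_CAST(A.[value] AS INT) AS CastedToInt   -- NULL for non-numeric fragments
    FROM @tbl t
    CROSS APPLY OPENJSON(N'["' + REPLACE(t.myStr,'_','","') + N'"]') A
)
,marked AS
(
    -- Highest position of a non-int fragment per row (1000 as fallback, as in the query above)
    SELECT cte.*
          ,ISNULL(MAX(CASE WHEN CastedToInt IS NULL THEN Position END)
                  OVER(PARTITION BY ID), 1000) AS PositionHighestNonInt
    FROM cte
)
SELECT ID
      ,myStr
      -- STRING_AGG ignores NULLs, so only fragments up to the highest non-int one are joined
      ,STRING_AGG(CASE WHEN Position <= PositionHighestNonInt THEN WordFragment END, '_')
           WITHIN GROUP (ORDER BY Position) AS ClearName
      -- The status is the fragment right after the highest non-int fragment (NULL if there is none)
      ,MAX(CASE WHEN Position = PositionHighestNonInt + 1 THEN CastedToInt END) AS [Status]
FROM marked
GROUP BY ID, myStr
ORDER BY ID;
```

For the four sample rows this should yield the same ClearName and Status values as the XML-based query, with NULL as the status of 'itembundle_mine_bundle_small'.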

Answer 1: (score: 1)

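One possible approach uses string replacement and the JSON capabilities of SQL Server 2016+. Each row is reversed and transformed into a valid JSON array (for example, 'tree_leafs_offer_2_1' becomes '["1","2","reffo","sfael","eert"]'). You can then easily check whether the first and second items are valid numbers using JSON_VALUE(<json_array>, '$[0]'), JSON_VALUE(<json_array>, '$[1]') and TRY_CONVERT(). This works as long as there are at most two numbers on the right.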
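
To make the intermediate values visible, here is a minimal sketch that applies the transformation to a single literal. The variable names `@s` and `@json` and the column aliases are made up for the illustration.

```sql
-- Sketch: inspect the reversed JSON array and its first two items for one string.
DECLARE @s VARCHAR(100) = 'tree_leafs_offer_2_1';
DECLARE @json NVARCHAR(MAX) =
    CONCAT('["', REPLACE(STRING_ESCAPE(REVERSE(@s), 'json'), '_', '","'), '"]');

SELECT @json AS ReversedJsonArray                                   -- ["1","2","reffo","sfael","eert"]
      ,REVERSE(JSON_VALUE(@json, '$[0]')) AS LastFragment           -- rightmost fragment, un-reversed
      ,REVERSE(JSON_VALUE(@json, '$[1]')) AS SecondLastFragment     -- second fragment from the right
      ,TRY_CONVERT(int, REVERSE(JSON_VALUE(@json, '$[0]'))) AS LastAsInt        -- NULL if not numeric
      ,TRY_CONVERT(int, REVERSE(JSON_VALUE(@json, '$[1]'))) AS SecondLastAsInt; -- NULL if not numeric
```

The fragments are reversed again before the conversion, because a multi-digit number such as 10 would otherwise be read as 01.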

Input:

CREATE TABLE #Data (
   myStr varchar(max)
)
INSERT INTO #Data 
   (MyStr)
VALUES   
   ('tree_leafs_offer_2_1'),
   ('tree_leafs_offer_2_10'),
   ('tree_leafs_offer_2_2'),
   ('tree_leafs_offer_1150_1'),
   ('tree_leafs_offer_1150_10'),
   ('builder_bundle_less_xl_1'),
   ('builder_bundle_less_xl_10'),
   ('static_components_wolves_10_4'),
   ('coke_0_boring_components_bundle_grant_1'),
   ('coke_0_soccer18_end_1_4h_101'),
   ('coke_0_late_downsell_bundle_high_114'),
   ('itembundle_mine_bundle_small')

T-SQL:

SELECT 
   LEFT(myStr, LEN(myStr) - CHARINDEX('_', REVERSE(myStr))) as ClearName,
   REVERSE(LEFT(REVERSE(myStr), CHARINDEX('_', REVERSE(myStr)) - 1)) AS Status
FROM (
   SELECT 
      CASE 
         WHEN 
            TRY_CONVERT(int, REVERSE(JSON_VALUE(CONCAT('["', REPLACE(STRING_ESCAPE(REVERSE(MyStr), 'json'), '_', '","'), '"]'), '$[1]'))) IS NULL AND
            TRY_CONVERT(int, REVERSE(JSON_VALUE(CONCAT('["', REPLACE(STRING_ESCAPE(REVERSE(MyStr), 'json'), '_', '","'), '"]'), '$[0]'))) IS NULL
            THEN CONCAT(myStr, '_0') 
         WHEN 
            TRY_CONVERT(int, REVERSE(JSON_VALUE(CONCAT('["', REPLACE(STRING_ESCAPE(REVERSE(MyStr), 'json'), '_', '","'), '"]'), '$[1]'))) IS NULL AND 
            TRY_CONVERT(int, REVERSE(JSON_VALUE(CONCAT('["', REPLACE(STRING_ESCAPE(REVERSE(MyStr), 'json'), '_', '","'), '"]'), '$[0]'))) IS NOT NULL
            THEN MyStr 
         ELSE LEFT(myStr, LEN(myStr) - CHARINDEX('_', REVERSE(myStr)))
      END AS myStr      
   FROM #Data
) fixed
ORDER BY MyStr

Output:

----------------------------------------------
ClearName                               Status
----------------------------------------------
builder_bundle_less_xl                  1
builder_bundle_less_xl                  10
coke_0_boring_components_bundle_grant   1
coke_0_late_downsell_bundle_high        114
coke_0_soccer18_end_1_4h                101
itembundle_mine_bundle_small            0
static_components_wolves                10
tree_leafs_offer                        1150
tree_leafs_offer                        1150
tree_leafs_offer                        2
tree_leafs_offer                        2
tree_leafs_offer                        2
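
Note that this output reports 0 for 'itembundle_mine_bundle_small', whereas the question asked for NULL. A possible variant, sketched below, builds the JSON array only once per row via CROSS APPLY and returns NULL when there is no numeric suffix. It is not part of the original answer: the temp table `#Sample` and the aliases `j`, `f` and `n` are hypothetical, and the sketch assumes that every string starts with at least one non-numeric fragment and has at most two numbers on the right.

```sql
-- Sketch: same reverse-and-JSON idea, evaluated once per row, with NULL instead of 0.
CREATE TABLE #Sample (myStr varchar(max));
INSERT INTO #Sample (myStr) VALUES
   ('tree_leafs_offer_2_1'),
   ('builder_bundle_less_xl_10'),
   ('itembundle_mine_bundle_small');

SELECT d.myStr
      ,CASE
          -- two numbers on the right: cut off both fragments and their two underscores
          WHEN n.SecondLastInt IS NOT NULL AND n.LastInt IS NOT NULL
             THEN LEFT(d.myStr, LEN(d.myStr) - LEN(f.LastFragment) - LEN(f.SecondLastFragment) - 2)
          -- one number on the right: cut off that fragment and its underscore
          WHEN n.LastInt IS NOT NULL
             THEN LEFT(d.myStr, LEN(d.myStr) - LEN(f.LastFragment) - 1)
          ELSE d.myStr
       END AS ClearName
      ,CASE
          WHEN n.SecondLastInt IS NOT NULL AND n.LastInt IS NOT NULL THEN n.SecondLastInt
          ELSE n.LastInt   -- stays NULL when the last fragment is not numeric
       END AS [Status]
FROM #Sample d
CROSS APPLY (SELECT CONCAT('["', REPLACE(STRING_ESCAPE(REVERSE(d.myStr), 'json'), '_', '","'), '"]')) j(ReversedJson)
CROSS APPLY (SELECT REVERSE(JSON_VALUE(j.ReversedJson, '$[0]'))
                   ,REVERSE(JSON_VALUE(j.ReversedJson, '$[1]'))) f(LastFragment, SecondLastFragment)
CROSS APPLY (SELECT TRY_CONVERT(int, f.LastFragment)
                   ,TRY_CONVERT(int, f.SecondLastFragment)) n(LastInt, SecondLastInt)
ORDER BY d.myStr;
```

With the three sample rows, the status column should come back as 10, NULL and 2 respectively, and the clean names should match the To-Be result in the question.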