How to bulk-replace characters in a field in SQL

Time: 2019-01-28 10:08:32

Tags: sql performance sqlite

Context and goal

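I am trying to anonymize some product numbers in a data table. See the sample code below. The product numbers are 10 digits long and may or may not be unique in the table.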

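Because I may need to link to other tables, I want to pseudo-anonymize the data deterministically rather than randomly, so that the same product number always gets the same pseudonym.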

The system is SQLite 3.10.1, but any DBMS that uses SQL is fine.

My constraints are:

  • keep the same length as the original
  • replace each digit with another digit or a letter
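A small helper can check any candidate scheme against these two constraints. This is a hypothetical sketch. The name meets_constraints is made up here, and it assumes both values are plain text and that "letter" means an ASCII letter.

```python
def meets_constraints(original: str, pseudo: str) -> bool:
    """Hypothetical check: the pseudonym has the same length as the original
    and every one of its characters is an ASCII digit or letter."""
    if len(pseudo) != len(original):
        return False
    return all(ch.isascii() and ch.isalnum() for ch in pseudo)


print(meets_constraints("260619275", "A2B2TLA43"))  # True
print(meets_constraints("260619275", "A2B2TLA4"))   # False: one character short
```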

What I have done so far

Quite literally, I go through every possible digit and run one UPDATE per digit, as shown below. This feels very inefficient, though.

UPDATE test
SET pseudo_num = replace(pseudo_num, '0', 'B');
UPDATE test
SET pseudo_num = replace(pseudo_num, '1', 'T');
UPDATE test
SET pseudo_num = replace(pseudo_num, '2', 'A');
UPDATE test
SET pseudo_num = replace(pseudo_num, '3', 'A');
UPDATE test
SET pseudo_num = replace(pseudo_num, '4', 'D');
UPDATE test
SET pseudo_num = replace(pseudo_num, '5', '3');
UPDATE test
SET pseudo_num = replace(pseudo_num, '6', '2');
UPDATE test
SET pseudo_num = replace(pseudo_num, '7', '4');
UPDATE test
SET pseudo_num = replace(pseudo_num, '8', 'X');
UPDATE test
SET pseudo_num = replace(pseudo_num, '9', 'L');
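These ten statements produce the intended result only because of their order. The digits '5', '6' and '7' become '3', '2' and '4' after those target digits have already been replaced, so no character is rewritten twice. The Python sketch below uses plain strings and no database. The names STEPS, sequential and simultaneous are illustrative only. It compares the sequential replace() chain with a one-pass, per-character mapping. It also shows that mapping both '2' and '3' to 'A' lets two different product numbers end up with the same pseudonym, which matters if the pseudonyms are used to link tables.

```python
# Digit -> replacement pairs, in the order the ten UPDATE statements run.
STEPS = [("0", "B"), ("1", "T"), ("2", "A"), ("3", "A"), ("4", "D"),
         ("5", "3"), ("6", "2"), ("7", "4"), ("8", "X"), ("9", "L")]


def sequential(value, steps):
    """Apply each replacement to the whole string in turn, like the UPDATEs."""
    for old, new in steps:
        value = value.replace(old, new)
    return value


def simultaneous(value, steps):
    """Map every character exactly once, independent of the order of steps."""
    return value.translate(str.maketrans(dict(steps)))


print(sequential("260619275", STEPS))        # A2B2TLA43
print(simultaneous("260619275", STEPS))      # A2B2TLA43
print(sequential("260619275", STEPS[::-1]))  # AABATLADA: reversed order rewrites digits twice
print(sequential("360619275", STEPS))        # A2B2TLA43: collides with 260619275
```

With this particular order, the chained replace() calls and the one-pass mapping agree. Any approach that maps each character independently, such as a translate()-style function, avoids the ordering pitfall altogether.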

Questions

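  1. Is there a faster way to do this, for example a bulk replace?
  2. Is there a standard alternative approach to pseudo-anonymization that I could borrow and that still fits the constraints outlined above?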

Sample code to create the data table

CREATE TABLE test (
  prod_num varchar(14),
  owner varchar(255) default NULL,
  prod_date varchar(255)
);

INSERT INTO test (prod_num,owner,prod_date) VALUES ('260619275','Kieran','Feb 10, 2018'),('316556232','Steven','Jan 6, 2020'),('625302534','Oliver','Feb 10, 2018'),('811424845','Jeremy','Apr 12, 2018'),('060961216','Quinlan','Jul 19, 2019'),('713794360','Stuart','Nov 1, 2019'),('553381666','George','Jan 8, 2019'),('978519361','Macon','Nov 26, 2018'),('352718969','Raphael','Jul 21, 2019'),('803299478','Byron','Nov 26, 2019');
INSERT INTO test (prod_num,owner,prod_date) VALUES ('696124452','Dalton','Jul 17, 2018'),('892088485','Keane','Jul 9, 2018'),('817054190','Dillon','Apr 23, 2018'),('500170097','Fitzgerald','Feb 11, 2019'),('663252252','Thomas','Apr 10, 2018'),('061983557','Alan','May 12, 2018'),('492057435','Jarrod','Apr 16, 2018'),('837802495','Shad','Mar 22, 2019'),('725698187','Mark','Jul 22, 2018'),('153352349','Akeem','Feb 19, 2018');

ALTER TABLE test 
ADD pseudo_num NVARCHAR(20);

UPDATE test 
SET pseudo_num = prod_num;

4 answers:

Answer 0 (score: 1)

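You can do the replacement with a join against a mapping. If you don't have a proper table that maps each digit to its replacement, you can build one with a CTE and then join every character of the product number to it. In SQLite, a recursive CTE can walk each product number one character at a time. Because each character is looked up independently, the order of the mapping rows does not matter:

WITH RECURSIVE map(digit, replacement) AS (
    SELECT '0', 'B' UNION ALL
    SELECT '1', 'T' UNION ALL
    SELECT '2', 'A' UNION ALL
    SELECT '3', 'A' UNION ALL
    SELECT '4', 'D' UNION ALL
    SELECT '5', '3' UNION ALL
    SELECT '6', '2' UNION ALL
    SELECT '7', '4' UNION ALL
    SELECT '8', 'X' UNION ALL
    SELECT '9', 'L'
),
-- build each pseudonym left to right: every step looks up the next character in map
cte(prod_num, pos, pseudo) AS (
    SELECT DISTINCT prod_num, 0, '' FROM test
    UNION ALL
    SELECT c.prod_num, c.pos + 1, c.pseudo || m.replacement
    FROM cte c
    INNER JOIN map m
        ON m.digit = substr(c.prod_num, c.pos + 1, 1)
    WHERE c.pos < length(c.prod_num)
)
UPDATE test
SET pseudo_num = (
    SELECT c.pseudo
    FROM cte c
    WHERE c.prod_num = test.prod_num
      AND c.pos = length(test.prod_num)
);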

Answer 1 (score: 1)

You said "any kind of DBMS with SQL is fine", so here is a solution for Postgres:

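In Postgres you can use the translate() function. It maps each character of the second argument to the character at the same position in the third argument, all in a single pass: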

UPDATE test
  SET pseudo_num = translate(pseudo_num, '0123456789', 'BTAAD324XL');

Online example: https://rextester.com/OIMBB72939
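SQLite, the asker's system, has no built-in translate(). If the data is processed from Python, one option is to register a user-defined function with sqlite3's Connection.create_function so that the same UPDATE runs in SQLite. This is a sketch under that assumption. The Python function only mimics Postgres translate() for equal-length character lists: str.maketrans raises ValueError otherwise, whereas Postgres deletes characters that have no counterpart.

```python
import sqlite3


def translate(value, from_chars, to_chars):
    """Stand-in for Postgres translate(); from_chars and to_chars must have equal length."""
    if value is None:
        return None
    return value.translate(str.maketrans(from_chars, to_chars))


conn = sqlite3.connect(":memory:")
# deterministic=True marks the function as pure (needs Python 3.8+ and SQLite 3.8.3+).
conn.create_function("translate", 3, translate, deterministic=True)
conn.execute("CREATE TABLE test (prod_num varchar(14), pseudo_num NVARCHAR(20))")
conn.executemany("INSERT INTO test (prod_num) VALUES (?)",
                 [("260619275",), ("060961216",)])

# Same statement as the Postgres answer, now calling the registered function.
conn.execute("UPDATE test SET pseudo_num = translate(prod_num, '0123456789', 'BTAAD324XL')")
result = dict(conn.execute("SELECT prod_num, pseudo_num FROM test"))
print(result)  # {'260619275': 'A2B2TLA43', '060961216': 'B2BL2TAT2'}
```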

Answer 2 (score: 1)

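You can use a hash (or encryption) function to turn each product number into a string of letters and digits, truncated to the same length as the original. The same product number always gets the same pseudonym: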

Example in T-SQL:

-- preview (old and new prod_num)
SELECT prod_num, RIGHT(CONVERT(VARCHAR(32), HASHBYTES('SHA1', prod_num), 2), LEN(prod_num)) 
FROM test;

-- the UPDATE
UPDATE test SET pseudo_num = RIGHT(CONVERT(VARCHAR(32), HASHBYTES('SHA1', prod_num), 2), LEN(prod_num));
  

demo on dbfiddle.uk

Example in MySQL:

-- preview (old and new prod_num)
SELECT prod_num, UPPER(RIGHT(MD5(prod_num), LENGTH(prod_num))) 
FROM test;

-- the UPDATE
UPDATE test SET pseudo_num = UPPER(RIGHT(MD5(prod_num), LENGTH(prod_num)));
  

demo on dbfiddle.uk

Example in Oracle:

-- preview (old and new prod_num)
SELECT prod_num, SUBSTR(STANDARD_HASH(prod_num, 'MD5'), LENGTH(prod_num) * -1) pseudo_prod_num 
FROM test;

-- the UPDATE
UPDATE test SET pseudo_num = SUBSTR(STANDARD_HASH(prod_num, 'MD5'), LENGTH(prod_num) * -1);
  

demo on dbfiddle.uk

Example in PostgreSQL:

-- preview (old and new prod_num)
SELECT prod_num, UPPER(RIGHT(MD5(prod_num), LENGTH(prod_num))) 
FROM test;

-- the UPDATE
UPDATE test SET pseudo_num = UPPER(RIGHT(MD5(prod_num), LENGTH(prod_num)));
  

demo on dbfiddle.uk
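SQLite has no built-in MD5() or RIGHT() either. From Python, a sketch along the same lines registers hashlib's MD5 as a user-defined function and uses substr(x, -n), which returns the last n characters, in place of RIGHT(). The md5 function exists only because this sketch registers it; SQLite does not provide it. Two caveats apply to any variant. First, truncating the hash to 9 or 10 hex characters can make different product numbers collide, so the sketch lists any collisions. Second, there are at most 10^10 ten-digit product numbers, so anyone who knows the scheme can hash every candidate and reverse an unkeyed hash. A keyed hash, for example HMAC with a secret kept outside the database, is the usual hardening.

```python
import hashlib
import sqlite3


def md5_hex(value):
    """Hex MD5 of a text value, standing in for MySQL/Postgres MD5()."""
    if value is None:
        return None
    return hashlib.md5(value.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.create_function("md5", 1, md5_hex, deterministic=True)
conn.execute("CREATE TABLE test (prod_num varchar(14), pseudo_num NVARCHAR(20))")
conn.executemany("INSERT INTO test (prod_num) VALUES (?)",
                 [("260619275",), ("316556232",), ("260619275",)])  # duplicate on purpose

# substr(x, -n) takes the last n characters, like RIGHT(x, n) elsewhere.
conn.execute("UPDATE test SET pseudo_num = UPPER(substr(md5(prod_num), -LENGTH(prod_num)))")

# Pseudonyms shared by different product numbers indicate a truncation collision.
collisions = conn.execute("""
    SELECT pseudo_num FROM test
    GROUP BY pseudo_num
    HAVING COUNT(DISTINCT prod_num) > 1
""").fetchall()
rows = conn.execute("SELECT prod_num, pseudo_num FROM test").fetchall()
print(rows, collisions)
```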

Answer 3 (score: 0)

On MariaDB, a single UPDATE with nested replace() calls does all ten replacements in one pass over the table. The innermost replace() runs first, so the order matches the ten UPDATEs in the question. The same statement also works in SQLite:

UPDATE test
SET pseudo_num =
    replace(
        replace(
            replace(
                replace(
                    replace(
                        replace(
                            replace(
                                replace(
                                    replace(
                                        replace(prod_num,'0','B')
                                    ,'1','T')
                                ,'2','A')
                            ,'3','A')
                        ,'4','D')
                    ,'5','3')
                ,'6','2')
            ,'7','4')
        ,'8','X')
    ,'9','L');
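Writing ten levels of nesting by hand is error-prone. A hypothetical Python helper, nested_replace below (the name is made up for this sketch), can build the same nested replace() expression from a list of pairs and bind the characters as query parameters. The sketch runs the result against SQLite. The innermost call is built first, so the pairs are applied in list order, just like the ten separate UPDATEs.

```python
import sqlite3

# Digit -> replacement pairs; the first pair becomes the innermost replace().
MAPPING = [("0", "B"), ("1", "T"), ("2", "A"), ("3", "A"), ("4", "D"),
           ("5", "3"), ("6", "2"), ("7", "4"), ("8", "X"), ("9", "L")]


def nested_replace(column, mapping):
    """Return (sql_expression, params) for replace(replace(col, ?, ?), ?, ?)...

    The column name is interpolated, so it must be a trusted identifier;
    the characters themselves are passed as bound parameters."""
    expr, params = column, []
    for old, new in mapping:
        expr = f"replace({expr}, ?, ?)"
        params += [old, new]
    return expr, params


expr, params = nested_replace("prod_num", MAPPING)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (prod_num varchar(14), pseudo_num NVARCHAR(20))")
conn.executemany("INSERT INTO test (prod_num) VALUES (?)",
                 [("260619275",), ("060961216",)])
conn.execute(f"UPDATE test SET pseudo_num = {expr}", params)
result = dict(conn.execute("SELECT prod_num, pseudo_num FROM test"))
print(result)  # {'260619275': 'A2B2TLA43', '060961216': 'B2BL2TAT2'}
```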