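Context and goal
I am trying to anonymize some product numbers in a data table. See the sample code below. The product numbers are 10 digits and may or may not be unique in the table.
Because I may need to link to other tables, I want to pseudonymize the data in a non-random (deterministic) way.
The system is SQLite 3.10.1. However, any kind of DBMS with SQL will do.
My constraints are:
What I have done
I literally go through every possible digit and update as follows. However, this feels very inefficient.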
UPDATE test
SET pseudo_num = replace(pseudo_num, '0', 'B');
UPDATE test
SET pseudo_num = replace(pseudo_num, '1', 'T');
UPDATE test
SET pseudo_num = replace(pseudo_num, '2', 'A');
UPDATE test
SET pseudo_num = replace(pseudo_num, '3', 'A');
UPDATE test
SET pseudo_num = replace(pseudo_num, '4', 'D');
UPDATE test
SET pseudo_num = replace(pseudo_num, '5', '3');
UPDATE test
SET pseudo_num = replace(pseudo_num, '6', '2');
UPDATE test
SET pseudo_num = replace(pseudo_num, '7', '4');
UPDATE test
SET pseudo_num = replace(pseudo_num, '8', 'X');
UPDATE test
SET pseudo_num = replace(pseudo_num, '9', 'L');
Question
Sample code to create the data table
CREATE TABLE test (
prod_num varchar(14),
owner varchar(255) default NULL,
prod_date varchar(255)
);
INSERT INTO test (prod_num,owner,prod_date) VALUES ("260619275","Kieran","Feb 10, 2018"),("316556232","Steven","Jan 6, 2020"),("625302534","Oliver","Feb 10, 2018"),("811424845","Jeremy","Apr 12, 2018"),("060961216","Quinlan","Jul 19, 2019"),("713794360","Stuart","Nov 1, 2019"),("553381666","George","Jan 8, 2019"),("978519361","Macon","Nov 26, 2018"),("352718969","Raphael","Jul 21, 2019"),("803299478","Byron","Nov 26, 2019");
INSERT INTO test (prod_num,owner,prod_date) VALUES ("696124452","Dalton","Jul 17, 2018"),("892088485","Keane","Jul 9, 2018"),("817054190","Dillon","Apr 23, 2018"),("500170097","Fitzgerald","Feb 11, 2019"),("663252252","Thomas","Apr 10, 2018"),("061983557","Alan","May 12, 2018"),("492057435","Jarrod","Apr 16, 2018"),("837802495","Shad","Mar 22, 2019"),("725698187","Mark","Jul 22, 2018"),("153352349","Akeem","Feb 19, 2018");
ALTER TABLE test
ADD pseudo_num NVARCHAR(20);
UPDATE test
SET pseudo_num = prod_num;
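Answer 0 (score: 1)
You can try doing the replacement with a join here. If you do not have a formal table containing the mapping from old to new pseudo_num
values, then we can try using a CTE.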
WITH map AS (
SELECT '0' AS pseudo_num, 'B' AS output UNION ALL
SELECT '1', 'T' UNION ALL
SELECT '2', 'A' UNION ALL
SELECT '3', 'A' UNION ALL
SELECT '4', 'D' UNION ALL
SELECT '5', '3' UNION ALL
SELECT '6', '2' UNION ALL
SELECT '7', '4' UNION ALL
SELECT '8', 'X' UNION ALL
SELECT '9', 'L'
),
cte AS (
SELECT t.pseudo_num, m.output
FROM test t
INNER JOIN map m
ON t.pseudo_num = m.pseudo_num
)
UPDATE cte
SET pseudo_num = output;
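The statement above updates through a CTE, which as far as I know SQL Server accepts but SQLite does not, and it compares the whole value against single-digit keys. Below is a minimal sketch of how the same mapping-by-join idea could be applied digit by digit in SQLite, run through Python's standard `sqlite3` module. The CTE name `walk`, its columns `rid`, `src`, `pos`, `acc`, and the column name `digit` are hypothetical names chosen for illustration; the sketch assumes every value contains only the digits 0-9.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (prod_num VARCHAR(14), pseudo_num NVARCHAR(20));
    INSERT INTO test (prod_num) VALUES ('260619275'), ('060961216');
""")

# map:  the digit-to-character mapping from the question, as an inline CTE.
# walk: a recursive CTE that consumes one character of prod_num per step and
#       appends the mapped character to acc (hypothetical helper names).
# Assumption: values hold only digits; any other character ends the walk
# early and leaves pseudo_num NULL for that row.
conn.execute("""
    WITH RECURSIVE
    map(digit, output) AS (
        VALUES ('0','B'), ('1','T'), ('2','A'), ('3','A'), ('4','D'),
               ('5','3'), ('6','2'), ('7','4'), ('8','X'), ('9','L')
    ),
    walk(rid, src, pos, acc) AS (
        SELECT rowid, prod_num, 1, '' FROM test
        UNION ALL
        SELECT w.rid, w.src, w.pos + 1, w.acc || m.output
        FROM walk AS w
        JOIN map AS m ON m.digit = substr(w.src, w.pos, 1)
        WHERE w.pos <= length(w.src)
    )
    UPDATE test
    SET pseudo_num = (
        SELECT acc FROM walk
        WHERE walk.rid = test.rowid
          AND walk.pos = length(walk.src) + 1
    )
""")

result = dict(conn.execute("SELECT prod_num, pseudo_num FROM test"))
print(result)
```

Because each character is looked up exactly once, the order of the mapping rows does not matter here, unlike with chained replace() calls. For large tables this is likely slower than a nested replace(), so treat it as an illustration of the join idea rather than a tuned solution.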
Answer 1 (score: 1)
You said "any kind of DBMS with SQL will do", so here is one for Postgres:
In Postgres, you can use the translate() function:
UPDATE test
SET pseudo_num = translate(pseudo_num, '0123456789', 'BTAAD324XL');
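As far as I know, SQLite does not ship a translate() function, but one can be registered as a user-defined function from the host application. The following is a minimal sketch using Python's standard `sqlite3` module; the Python function `translate` is a hypothetical helper written for this example, and unlike the Postgres function it assumes the two character lists have equal length.

```python
import sqlite3


def translate(value, src, dst):
    """Character-by-character substitution, loosely modelled on Postgres translate().

    Assumption: src and dst have the same length (str.maketrans requires it).
    """
    if value is None:
        return None
    return value.translate(str.maketrans(src, dst))


conn = sqlite3.connect(":memory:")
# Expose the Python helper to SQL under the name translate, taking 3 arguments.
conn.create_function("translate", 3, translate)

conn.executescript("""
    CREATE TABLE test (prod_num VARCHAR(14), pseudo_num NVARCHAR(20));
    INSERT INTO test (prod_num) VALUES ('260619275'), ('060961216');
""")

conn.execute(
    "UPDATE test SET pseudo_num = translate(prod_num, '0123456789', 'BTAAD324XL')"
)

rows = conn.execute("SELECT prod_num, pseudo_num FROM test ORDER BY rowid").fetchall()
print(rows)  # [('260619275', 'A2B2TLA43'), ('060961216', 'B2BL2TAT2')]
```

The function only exists on the connection that registered it, so the UPDATE has to run from the same Python process.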
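Answer 2 (score: 1)
You can use a hash (or encryption) function to convert the product number into a string of letters and digits of the same length. Identical product numbers also get the same hash value:
Example on T-SQL: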
-- preview (old and new prod_num)
SELECT prod_num, RIGHT(CONVERT(VARCHAR(32), HASHBYTES('SHA1', prod_num), 2), LEN(prod_num))
FROM test;
-- the UPDATE
UPDATE test SET pseudo_num = RIGHT(CONVERT(VARCHAR(32), HASHBYTES('SHA1', prod_num), 2), LEN(prod_num));
Example on MySQL:
-- preview (old and new prod_num)
SELECT prod_num, UPPER(RIGHT(MD5(prod_num), LENGTH(prod_num)))
FROM test;
-- the UPDATE
UPDATE test SET pseudo_num = UPPER(RIGHT(MD5(prod_num), LENGTH(prod_num)));
Example on Oracle:
-- preview (old and new prod_num)
SELECT prod_num, SUBSTR(STANDARD_HASH(prod_num, 'MD5'), LENGTH(prod_num) * -1) pseudo_prod_num
FROM test;
-- the UPDATE
UPDATE test SET pseudo_num = SUBSTR(STANDARD_HASH(prod_num, 'MD5'), LENGTH(prod_num) * -1);
Example on PostgreSQL:
-- preview (old and new prod_num)
SELECT prod_num, UPPER(RIGHT(MD5(prod_num), LENGTH(prod_num)))
FROM test;
-- the UPDATE
UPDATE test SET pseudo_num = UPPER(RIGHT(MD5(prod_num), LENGTH(prod_num)));
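For SQLite, which to my knowledge has no built-in MD5() or comparable hash function, the same idea could be sketched by registering a hash helper from Python's standard library. The function name `pseudo_hash` is hypothetical, and the sketch swaps in SHA-256 for MD5; it keeps the truncation to the original length, which means two different product numbers can in principle end up with the same pseudonym.

```python
import hashlib
import sqlite3


def pseudo_hash(value):
    """Return the last len(value) hex characters of SHA-256(value), upper-cased."""
    if value is None:
        return None
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest().upper()
    # Slice from an explicit start index so that an empty input yields ''.
    start = max(0, len(digest) - len(value))
    return digest[start:]


conn = sqlite3.connect(":memory:")
# Make the helper callable from SQL as pseudo_hash(x).
conn.create_function("pseudo_hash", 1, pseudo_hash)

conn.executescript("""
    CREATE TABLE test (prod_num VARCHAR(14), pseudo_num NVARCHAR(20));
    INSERT INTO test (prod_num) VALUES ('260619275'), ('260619275'), ('060961216');
""")

conn.execute("UPDATE test SET pseudo_num = pseudo_hash(prod_num)")

rows = conn.execute("SELECT prod_num, pseudo_num FROM test ORDER BY rowid").fetchall()
for prod_num, pseudo_num in rows:
    print(prod_num, pseudo_num)
```

Since the function is deterministic, applying it to the matching column of another table gives the same pseudonyms, so joins between the tables keep working.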
Answer 3 (score: 0)
On MariaDB:
alter table test add primary key (prod_num);
replace into test(prod_num, owner, prod_date, pseudo_num)
select
prod_num,
owner,
prod_date,
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(prod_num,'0','B')
,'1','T')
,'2','A')
,'3','A')
,'4','D')
,'5','3')
,'6','2')
,'7','4')
,'8','X')
,'9','L') as pseudo_num
from test;
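The nested replace() idea also seems to carry over to SQLite as a single plain UPDATE, without adding a primary key or using REPLACE INTO. Here is a minimal sketch that builds the nested expression from a list, using Python's standard `sqlite3` module; the names `MAPPING` and `expr` are hypothetical and only serve the illustration.

```python
import sqlite3

# Digit-to-character mapping from the question, in application order.
# Order matters: '5' -> '3', '6' -> '2' and '7' -> '4' produce digits, so they
# must run after the original '3', '2' and '4' have already been replaced.
MAPPING = [
    ("0", "B"), ("1", "T"), ("2", "A"), ("3", "A"), ("4", "D"),
    ("5", "3"), ("6", "2"), ("7", "4"), ("8", "X"), ("9", "L"),
]

# Build replace(replace(...replace(prod_num,'0','B')...,'8','X'),'9','L').
# The mapping is a trusted constant, so plain string formatting is acceptable here.
expr = "prod_num"
for old, new in MAPPING:
    expr = f"replace({expr}, '{old}', '{new}')"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (prod_num VARCHAR(14), pseudo_num NVARCHAR(20));
    INSERT INTO test (prod_num) VALUES ('260619275'), ('060961216');
""")

# One pass over the table instead of ten separate UPDATE statements.
conn.execute(f"UPDATE test SET pseudo_num = {expr}")

rows = conn.execute("SELECT prod_num, pseudo_num FROM test ORDER BY rowid").fetchall()
print(rows)  # [('260619275', 'A2B2TLA43'), ('060961216', 'B2BL2TAT2')]
```

Note that the mapping sends both '2' and '3' to 'A', so two different product numbers can receive the same pseudonym.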