Question

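I have an NVARCHAR column in my SQL table. I want to replace/mask certain words in it with *** by comparing it against a predefined set of words.

That is, if my column contains any word from the predefined word list, I want to mask that word. The predefined list has about 400 words.

Sample DDL and DML: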

DECLARE @Comments TABLE (id INT, comment VARCHAR(50))
INSERT INTO @Comments VALUES (1, 'Bob is a sales person'), (2, 'Shane is a nice guy')
DECLARE @People TABLE (personname VARCHAR(50))
INSERT INTO @People VALUES ('Bob'), ('Mark'), ('Shane')

Sample data:

+----+-----------------------+
| id |        comment        |
+----+-----------------------+
|  1 | Bob is a sales person |
|  2 | Shane is a nice guy   |
+----+-----------------------+

Predefined words:

+------------+
| personname |
+------------+
| Bob        |
| Mark       |
| Shane      |
+------------+

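Note:

Please see the attached image for sample data.
If the comment column contains a person's name, I want to replace/mask that predefined word with ***.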

Answer 1

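What if you have both Bob and Bobby? You can't replace Bob first, or you will end up with ***by. Here is a way to replace the strings in descending order of length, so that longer names are masked before the shorter names they contain. It's a start.

It is not complete, though: REPLACE matches substrings rather than whole words, so under a case-insensitive collation "Bob is a bobsled" would end up as "*** is a ***sled".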

DECLARE @Comments TABLE (id INT, comment VARCHAR(50))
INSERT INTO @Comments VALUES (1, 'Bob is a sales person'), (2, 'Shane is a nice guy')
DECLARE @People TABLE (personname VARCHAR(50))
INSERT INTO @People VALUES ('Bob'), ('Mark'), ('Shane')

DECLARE @curname VARCHAR(50)
DECLARE cur CURSOR FOR   
SELECT personname FROM @People ORDER BY LEN(personname) DESC, personname DESC

OPEN cur  

FETCH NEXT FROM cur   
INTO @curname

WHILE @@FETCH_STATUS = 0  
BEGIN       
    UPDATE @Comments SET comment = REPLACE(comment, @curname, REPLICATE('*', LEN(@curname)))

    FETCH NEXT FROM cur   
    INTO @curname
END   
CLOSE cur;  
DEALLOCATE cur;

Output:

SELECT * FROM @Comments

id  comment
1   *** is a sales person
2   ***** is a nice guy
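
To see why the cursor orders the names by length, here is a minimal sketch that applies two names in both orders. The name Bobby and the sample sentence are hypothetical and are not part of the question's data.

```sql
-- 'Bobby' and this sentence are made up for illustration only.
DECLARE @text VARCHAR(50) = 'Bobby and Bob are friends';

SELECT
    -- Shorter name first: the 'Bob' inside 'Bobby' is masked, leaving 'by' behind.
    REPLACE(REPLACE(@text, 'Bob', '***'), 'Bobby', '*****') AS short_first,  -- ***by and *** are friends
    -- Longer name first: 'Bobby' is masked as a whole before 'Bob' is handled.
    REPLACE(REPLACE(@text, 'Bobby', '*****'), 'Bob', '***') AS long_first;   -- ***** and *** are friends
```

The ORDER BY LEN(personname) DESC in the cursor gives the second behaviour for the whole name list.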

A couple of quick modifications:

1) Make the replacement case-sensitive:

UPDATE @Comments SET comment = REPLACE(comment COLLATE Latin1_General_CS_AS, @curname, REPLICATE('*', LEN(@curname)))

2) Replace only the first instance:

UPDATE @Comments SET comment = CASE WHEN CHARINDEX(@curname, comment) > 0   
    THEN STUFF(comment, CHARINDEX(@curname, comment), LEN(@curname), REPLICATE('*', LEN(@curname)))
    ELSE comment
    END
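
The substring problem (a name found inside a longer word) is not addressed by either modification. One possible workaround, which is not part of the answer above, is to match the name only when it has a space on both sides. The sketch below assumes words are separated by single spaces; a name next to punctuation, or two names directly in a row, would still be missed.

```sql
-- Sketch only: whole-word masking for space-separated text.
DECLARE @text VARCHAR(50) = 'Bob is a bobsled';
DECLARE @name VARCHAR(50) = 'Bob';

-- Pad with spaces so that words at the start and end also have a space on both sides,
-- replace only ' Bob ' (including its surrounding spaces), then trim the padding off.
SELECT LTRIM(RTRIM(
           REPLACE(' ' + @text + ' ',
                   ' ' + @name + ' ',
                   ' ' + REPLICATE('*', LEN(@name)) + ' ')
       )) AS masked;  -- *** is a bobsled
```

Inside the cursor loop, the same expression could take the place of the plain REPLACE call, with comment and @curname substituted for @text and @name.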

Answer 2

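Starting with SQL Server 2017, you can combine STRING_AGG with STRING_SPLIT to break each sentence into words, replace the names with ***, and then aggregate the words back into a sentence: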

select co.id, 
      string_agg(case when p.personname is null then c.value else '***' end,' ') as comment
from @Comments co
cross apply string_split(comment,' ') c
left join @People p on p.personname = c.value
group by co.id

Result:

id  comment
1   *** is a sales person
2   *** is a nice guy
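
One caveat: STRING_SPLIT does not document the order of its output rows, and STRING_AGG without WITHIN GROUP does not promise an order either. If the word order must be guaranteed, a variant along these lines may help. It assumes SQL Server 2022 or Azure SQL Database, where STRING_SPLIT accepts a third enable_ordinal argument and returns an ordinal column.

```sql
-- Sketch assuming SQL Server 2022 / Azure SQL Database (enable_ordinal support).
DECLARE @Comments TABLE (id INT, comment VARCHAR(50));
INSERT INTO @Comments VALUES (1, 'Bob is a sales person'), (2, 'Shane is a nice guy');
DECLARE @People TABLE (personname VARCHAR(50));
INSERT INTO @People VALUES ('Bob'), ('Mark'), ('Shane');

SELECT co.id,
       STRING_AGG(CASE WHEN p.personname IS NULL THEN c.value ELSE '***' END, ' ')
           WITHIN GROUP (ORDER BY c.ordinal) AS comment  -- rebuild in the original word order
FROM @Comments co
CROSS APPLY STRING_SPLIT(co.comment, ' ', 1) c           -- third argument 1 adds the ordinal column
LEFT JOIN @People p ON p.personname = c.value
GROUP BY co.id;
```

On SQL Server 2017 and 2019 the third argument is not available, so the original query has to rely on the order the rows happen to come back in.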

If the words are separated by characters other than spaces, you can replace those characters with spaces before splitting:

DECLARE @Comments TABLE (id INT, comment VARCHAR(50))
INSERT INTO @Comments VALUES
  (1, 'Bob (is a sales person)')
, (2, '(Shane:is-a-nice-guy)')
, (3, 'This.is(Mark),he.is-a-nice-guy,too')
DECLARE @People TABLE (personname VARCHAR(50))
INSERT INTO @People VALUES ('Bob'), ('Mark'), ('Shane')

select co.id, 
      trim(string_agg(case when p.personname is null then c.value else '***' end,' ')) as comment
from
(
select id, replace(replace(replace(replace(replace(replace(comment,':',' '),')',' '),'(',' '),',',' '),'.',' '),'-',' ') as comment 
from @Comments
) co
cross apply string_split(comment,' ') c
left join @People p on p.personname = c.value
group by co.id

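With the input above, the names Bob, Shane and Mark are each replaced with *** in the output. Note that the punctuation comes back as spaces, because it was replaced before the split.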
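
The chain of nested REPLACE calls can become hard to read as the list of separators grows. As an alternative sketch (not part of the answer above), TRANSLATE, which is also available from SQL Server 2017, maps a whole set of characters in one call:

```sql
-- TRANSLATE maps each character in the second argument to the character at the
-- same position in the third argument, so both strings must have the same length.
DECLARE @text VARCHAR(50) = 'This.is(Mark),he.is-a-nice-guy,too';

SELECT TRANSLATE(@text, ':)(,.-', '      ') AS spaced;  -- six separators mapped to six spaces
```

In the query above, this expression could stand in for the six nested REPLACE calls in the derived table, with comment substituted for @text.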

Replace keywords in text by comparing against a set of characters
