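I need to suggest matches between the data in two database tables. The basic requirement is: a "match" should be suggested for the pair with the greatest number of matching words (regardless of order) between the two columns in question.
For example, given the data: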
Table A                              Table B
1,'What other text in here'          5,'Other text in here'
2,'What am I doing here'             6,'I am doing what here'
3,'I need to find another job'       7,'Purple unicorns'
4,'Other text in here'               8,'What are you doing in here'
Ideally, my desired matches would look as follows:
1 -> 8 (3 words matched)
2 -> 6 (5 words matched)
3 -> Nothing
4 -> 5 (4 words matched)
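I found word count functions that look promising, but I can't work out how to use them in a SQL statement that gives me the matches I want. Also, the linked function isn't quite what I need, because it uses charindex, which I believe finds words inside other words (i.e. 'in' would match 'bin').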
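To make the charindex concern concrete, here is a minimal sketch. The literal strings are made up for illustration; padding both the word and the sentence with spaces is one common workaround, not necessarily what the linked function does.

```sql
-- CHARINDEX is a plain substring search, so 'in' is found inside 'bin'.
SELECT CHARINDEX('in', 'the bin') AS substring_hit;                    -- 6

-- Padding the word and the sentence with spaces restricts hits to whole words.
SELECT CHARINDEX(' in ', ' ' + 'the bin' + ' ') AS whole_word_miss;    -- 0 (not found)
SELECT CHARINDEX(' in ', ' ' + 'text in here' + ' ') AS whole_word_hit; -- 6
```

The padding trick only treats spaces as word boundaries, so punctuation next to a word would still need handling.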
Can anyone help me with this?
Thanks.
Answer 0 (score: 5)
I've used sys.dm_fts_parser below to split the sentences into words. If you are not on SQL Server 2008, or find it unsuitable for some reason, there are plenty of TSQL split functions around.
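As one possible stand-in for the parser, here is a sketch that assumes SQL Server 2016 or later (compatibility level 130+), where the built-in STRING_SPLIT is available. That is an assumption beyond the original answer, which targets SQL Server 2008. STRING_SPLIT only splits on a single-character separator and does no punctuation handling, so it is a rougher tokenizer than the full-text parser.

```sql
-- Assumption: SQL Server 2016+; STRING_SPLIT does not exist on SQL Server 2008.
;WITH A(Id, sentence) AS
(
    SELECT 1, 'What other text in here' UNION ALL
    SELECT 4, 'Other text in here'
)
SELECT Id             AS A_Id,
       LOWER(s.value) AS display_term,            -- normalise case so the later join is not collation dependent
       COUNT(*) OVER (PARTITION BY Id) AS A_Cnt   -- number of words in the sentence
FROM   A
       CROSS APPLY STRING_SPLIT(sentence, ' ') AS s
WHERE  s.value <> '';                             -- skip empty tokens produced by repeated spaces
```

The column names mirror the A_Split step so the result could be joined on display_term in the same way.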
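The requirement that each A.id can only be paired with a previously unused B.id, and vice versa, is not something I can think of an efficient set-based solution for.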
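To illustrate why a plain per-row ranking does not satisfy that requirement, here is a small sketch. The Joined rows are match counts worked out by hand from the sample sentences for A ids 1 and 4 only; picking the best B per A with ROW_NUMBER is set-based, but nothing stops two A rows from choosing the same B.

```sql
-- Hand-computed word-overlap counts for A ids 1 and 4 from the sample data.
;WITH Joined(A_Id, B_Id, Cnt) AS
(
    SELECT 1, 5, 4 UNION ALL
    SELECT 1, 6, 2 UNION ALL
    SELECT 1, 8, 3 UNION ALL
    SELECT 4, 5, 4 UNION ALL
    SELECT 4, 6, 1 UNION ALL
    SELECT 4, 8, 2
),
Ranked AS
(
    SELECT A_Id, B_Id, Cnt,
           ROW_NUMBER() OVER (PARTITION BY A_Id ORDER BY Cnt DESC) AS rn
    FROM   Joined
)
SELECT A_Id, B_Id, Cnt
FROM   Ranked
WHERE  rn = 1;
-- Both A_Id 1 and A_Id 4 come back paired with B_Id 5, which breaks the one-to-one rule.
```

Resolving that conflict means each choice depends on the choices made before it, which is why a row-by-row loop is a natural fit.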
-- Sample data
;WITH A(Id, sentence) AS
(
    SELECT 1,'What other text in here' UNION ALL
    SELECT 2,'What am I doing here' UNION ALL
    SELECT 3,'I need to find another job' UNION ALL
    SELECT 4,'Other text in here'
),
B(Id, sentence) AS
(
    SELECT 5,'Other text in here' UNION ALL
    SELECT 6,'I am doing what here' UNION ALL
    SELECT 7,'Purple unicorns' UNION ALL
    SELECT 8,'What are you doing in here'
),
-- One row per term per sentence; A_Cnt is the number of terms returned for that sentence
A_Split AS
(
    SELECT Id AS A_Id,
           display_term,
           COUNT(*) OVER (PARTITION BY Id) AS A_Cnt
    FROM   A
           CROSS APPLY sys.dm_fts_parser('"' + REPLACE(sentence, '"', '""') + '"', 1033, 0, 0)
),
B_Split AS
(
    SELECT Id AS B_Id,
           display_term,
           COUNT(*) OVER (PARTITION BY Id) AS B_Cnt
    FROM   B
           CROSS APPLY sys.dm_fts_parser('"' + REPLACE(sentence, '"', '""') + '"', 1033, 0, 0)
),
-- Count shared terms for every (A, B) pair that has at least one term in common
Joined AS
(
    SELECT A_Id,
           B_Id,
           B_Cnt,
           Cnt = COUNT(*),
           CAST(COUNT(*) AS FLOAT) / B_Cnt AS PctMatchBToA,
           CAST(COUNT(*) AS FLOAT) / A_Cnt AS PctMatchAToB
    FROM   A_Split A
           JOIN B_Split B
             ON A.display_term = B.display_term
    GROUP  BY A_Id,
              B_Id,
              B_Cnt,
              A_Cnt
)
-- Store the candidate pairs, numbered from best to worst match
SELECT IDENTITY(int, 1, 1) AS id, *
INTO   #IntermediateResults
FROM   Joined
ORDER  BY PctMatchBToA DESC,
          PctMatchAToB DESC

DECLARE @A_Id INT,
        @B_Id INT,
        @Cnt  INT

DECLARE @Results TABLE (
    A_Id INT,
    B_Id INT,
    Cnt  INT)

-- Take the best remaining candidate pair
SELECT TOP(1) @A_Id = A_Id,
              @B_Id = B_Id,
              @Cnt = Cnt
FROM   #IntermediateResults
ORDER  BY id

-- Keep going while the previous SELECT found a row
WHILE ( @@ROWCOUNT > 0 )
BEGIN
    INSERT INTO @Results
    SELECT @A_Id,
           @B_Id,
           @Cnt

    -- Remove every candidate that reuses either side of the pair just accepted
    DELETE FROM #IntermediateResults
    WHERE  A_Id = @A_Id
            OR B_Id = @B_Id

    SELECT TOP(1) @A_Id = A_Id,
                  @B_Id = B_Id,
                  @Cnt = Cnt
    FROM   #IntermediateResults
    ORDER  BY id
END

DROP TABLE #IntermediateResults

SELECT *
FROM   @Results
ORDER  BY A_Id
Returns
A_Id        B_Id        Cnt
----------- ----------- -----------
1           8           3
2           6           5
4           5           4