First matching keyword in a string (SQL)

Time: 2013-10-21 11:59:14

Tags: tsql sql-server-2005

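I have a bunch of product description data in an old legacy sales system, and we are trying to do some sales analysis by making a best guess at the model number contained in the free-text description field.

So my sales lines look like this: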

LineitemID | Description
----
1 | Sony Headphones for a Sony DHJ232
2 | Sony DHJ232 in blue
3 | SANYO KI8767 with carry case

I then have a separate table containing all the potential product ranges.

ProductRange
----
Sony DHJ232
SANYO KI8767
Sony Headphones

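I want to write a query that returns every line item together with a best guess at the ProductRange it relates to. That part is straightforward with a simple JOIN and a LIKE condition. The complication is that, as line item 1 shows, a description can mention two different product ranges, which produces multiple matches, one of which is incorrect.

When multiple matches are found, I want to assume that the first match in the string is the most correct one, i.e. Sony Headphones rather than Sony DHJ232.

Can anyone suggest the best way to approach this?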

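3 answers:

Answer 0: (score: 1)

Something like this. Order the matches by the position of the substring within the Description field (using CHARINDEX()) and keep the first (lowest) one.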

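SELECT LineitemId, Description, ProductRange
FROM
(
    SELECT T.LineitemId, T.Description, PR.ProductRange AS ProductRange,
           -- number the matches of each line item by where they start in the description
           ROW_NUMBER() OVER (PARTITION BY T.LineitemId
                              ORDER BY CHARINDEX(PR.ProductRange, T.Description)
                             ) AS RowN
    FROM T
    JOIN PR ON (T.Description LIKE '%' + PR.ProductRange + '%')
) AS T1
WHERE RowN = 1  -- keep only the earliest match (the alias is RowN, not RN)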
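
To see why ordering by position resolves line item 1, the same rule can be traced outside the database. What follows is only a sketch in Python, not T-SQL: `first_match` is a hypothetical helper, `str.find` stands in for CHARINDEX(), and the lower-casing assumes a case-insensitive collation on the SQL Server side.

```python
def first_match(description, product_ranges):
    """Return the product range that starts earliest in description, or None."""
    text = description.lower()  # assumption: case-insensitive matching, as with a CI collation
    best_name = None
    best_pos = None
    for name in product_ranges:
        pos = text.find(name.lower())  # -1 when the range does not occur at all
        if pos == -1:
            continue
        if best_pos is None or pos < best_pos:
            best_name, best_pos = name, pos
    return best_name


ranges = ["Sony DHJ232", "SANYO KI8767", "Sony Headphones"]
lines = {
    1: "Sony Headphones for a Sony DHJ232",
    2: "Sony DHJ232 in blue",
    3: "SANYO KI8767 with carry case",
}

# Line item 1 contains two ranges; "Sony Headphones" starts at position 0, so it wins.
guesses = {line_id: first_match(desc, ranges) for line_id, desc in lines.items()}
print(guesses)
```

A line item whose description mentions no known range yields `None` here, whereas the inner JOIN in the query simply drops that row.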

Answer 1: (score: 0)

;WITH MATCH_START AS
(
    SELECT LI.POS, LI.LINEITEMID, PRODUCT.PRODUCTRANGE, LI.DESCRIPTION 
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY LINEITEMID) POS, LINEITEMID, DESCRIPTION FROM LINEITEM) LI 
        JOIN PRODUCT ON LI.DESCRIPTION LIKE PRODUCT.PRODUCTRANGE+'%'
),
MATCH_CONTAINS AS 
(
    SELECT LI.POS, LI.LINEITEMID, PRODUCT.PRODUCTRANGE, LI.DESCRIPTION 
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY LINEITEMID) POS, LINEITEMID, DESCRIPTION FROM LINEITEM) LI 
        JOIN PRODUCT ON LI.DESCRIPTION LIKE '%'+PRODUCT.PRODUCTRANGE+'%'
),
MIN_START_POS AS (
    SELECT MIN(POS) AS MIN_POS, PRODUCTRANGE FROM MATCH_START
    GROUP BY PRODUCTRANGE
),
MIN_CONTAIN_POS AS (
    SELECT MIN(POS) AS MIN_POS, PRODUCTRANGE FROM MATCH_CONTAINS
    GROUP BY PRODUCTRANGE
)

SELECT MS.PRODUCTRANGE,MS.DESCRIPTION, MS.LINEITEMID FROM MATCH_START MS
JOIN MIN_START_POS MSP ON MS.POS = MSP.MIN_POS AND MSP.PRODUCTRANGE = MS.PRODUCTRANGE

UNION 

SELECT MC.PRODUCTRANGE, MC.DESCRIPTION, MC.LINEITEMID FROM MATCH_CONTAINS MC
JOIN MIN_CONTAIN_POS MCP ON MC.POS = MCP.MIN_POS AND MCP.PRODUCTRANGE = MC.PRODUCTRANGE
AND MC.PRODUCTRANGE NOT IN (SELECT PRODUCTRANGE FROM MATCH_START)

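This first matches each ProductRange that a description starts with, and then each ProductRange that a description merely contains.

For example, with this data:

SELECT * FROM LINEITEM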

LineItemId  Description
----------- --------------------------------------
1           Sony Headphones for a Sony DHJ232
2           Sony DHJ232 in blue
3           SANYO KI8767 with carry case
4           SANYO KI8767 with carry case 2
5           Sony Headphones for a Sony DHJ232 B

SELECT * FROM PRODUCT

ProductRange
----------------------
SANYO KI8767
Sony DHJ232
Sony Headphones

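The result is shown below. Note that the query returns one row per product range (the matching line item with the lowest LineItemId), so line items 4 and 5 do not appear: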

PRODUCTRANGE      DESCRIPTION                          LINEITEMID
---------------   -------------------------------------  -----------
SANYO KI8767      SANYO KI8767 with carry case            3
Sony DHJ232       Sony DHJ232 in blue                     2
Sony Headphones   Sony Headphones for a Sony DHJ232       1

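Answer 2: (score: 0)

Personally, I would want to be able to prioritise which range is chosen rather than rely on its ordinal position in the string, so I implemented something like this: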

create table dbo.Sales (
    LineitemID int identity (1,1) not null primary key,
    [Description] varchar(50)
)
insert into dbo.Sales ([Description]) values ('Sony Headphones for a Sony DHJ232')
insert into dbo.Sales ([Description]) values ('Sony DHJ232 in blue')
insert into dbo.Sales ([Description]) values ('SANYO KI8767 with carry case')
insert into dbo.Sales ([Description]) values ('Sony Headphones for a Sony PS3')

create table dbo.ProductRange (
    ProductRangeId int identity (1,1) not null primary key,
    RangeName varchar(50),
    Significance int
)
insert into dbo.ProductRange (RangeName, Significance) values ('Sony DHJ232', 1)
insert into dbo.ProductRange (RangeName, Significance) values ('SANYO KI8767', 1)
insert into dbo.ProductRange (RangeName, Significance) values ('Sony Headphones', 2)
go
CREATE FUNCTION [dbo].GetRange
(
    @description varchar(50)
)
RETURNS INT
AS
BEGIN

    declare @ProductRangeId int

    select top 1 @ProductRangeId=pr.ProductRangeId
    from dbo.ProductRange pr
    where @description like '%'+pr.RangeName+'%'
    order by pr.Significance

    RETURN @ProductRangeId
END
go
select s.*, dbo.GetRange(s.Description) as RangeId
from dbo.Sales s

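This lets the [Significance] column in dbo.[ProductRange] determine which value is returned when more than one range is a hit. The lowest Significance wins.

The output of this is: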

LineitemID  Description                                        RangeId
----------- -------------------------------------------------- -----------
1           Sony Headphones for a Sony DHJ232                  1
2           Sony DHJ232 in blue                                1
3           SANYO KI8767 with carry case                       2
4           Sony Headphones for a Sony PS3                     3

The result can easily be joined back to dbo.[ProductRange] on RangeId.
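
The priority rule inside dbo.GetRange can also be checked by hand. The following is a sketch in Python rather than T-SQL, under the assumption of case-insensitive matching: `get_range` is a hypothetical stand-in for the function, and the tuples mirror the rows inserted into dbo.ProductRange above.

```python
# (ProductRangeId, RangeName, Significance), mirroring the inserts into dbo.ProductRange
PRODUCT_RANGES = [
    (1, "Sony DHJ232", 1),
    (2, "SANYO KI8767", 1),
    (3, "Sony Headphones", 2),
]


def get_range(description):
    """Return the id of the matching range with the lowest Significance, or None."""
    text = description.lower()  # assumption: case-insensitive matching
    hits = [r for r in PRODUCT_RANGES if r[1].lower() in text]
    if not hits:
        return None
    # min() keeps the first of several equally significant hits; the T-SQL
    # TOP 1 ... ORDER BY Significance gives no such guarantee on ties.
    return min(hits, key=lambda r: r[2])[0]


sales = [
    "Sony Headphones for a Sony DHJ232",
    "Sony DHJ232 in blue",
    "SANYO KI8767 with carry case",
    "Sony Headphones for a Sony PS3",
]

range_ids = [get_range(d) for d in sales]
print(range_ids)
```

With these Significance values, line item 1 resolves to Sony DHJ232 (id 1) rather than Sony Headphones, which is the opposite of the first-match rule asked for in the question. Swapping the Significance values would reverse that choice.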