SQL - How to implement scoring and matching of categories?

Date: 2019-05-01 16:39:38

Tags: python sql

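I am trying to implement scoring and matching in MySQL. I have a table that holds categories at 3 levels: LEVEL1 (top-level parent), LEVEL2 (parent) and LEVEL3 (child). For each new data category I receive from an external source, I want to compute a score and use it to assign the category to a specific category ID in this table:
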
+----+------------------------------------+-----------------------------+--------------------+
| ID | LEVEL1                             | LEVEL2                      | LEVEL3             |
+----+------------------------------------+-----------------------------+--------------------+
| 1  |  Arts and Entertainment Businesses |  Casinos                    |  NULL              |
| 1  |  Arts and Entertainment Businesses |  Performing Arts Businesses |  Radio Stations    |
| 2  |  Auto Sales Businesses             |  Motorcycle Dealers         |  Motorcycle Parts  |
| 2  |  Auto Sales Businesses             |  RVs and Motor Home Dealers |  NULL              |
| 2  |  Auto Sales Businesses             |  Car Dealers                |  Used Cars Dealers |
| 3  |  Bars and Lounges                  |  Pubs and Dive Bars         |  Pubs              |
| 3  |  Bars and Lounges                  |  Wine Bars                  |  NULL              |
| 4  |  Restaurants                       |  American Restaurants       |  Barbeque          |
+----+------------------------------------+-----------------------------+--------------------+

Above is my main table containing the categories.

What I am trying to do:

If input = 'Radio',
   Then match to LEVEL3 'Radio Stations' with a score less than 1.0
If LEVEL3 is NULL, move up to LEVEL2
   Then match to LEVEL2
If LEVEL2 is NULL, move up to LEVEL1
   Then match to LEVEL1

Scores: 0.00 (no match) to 1.00 (exact match)
        0.80 to 0.99 (very good match)
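A minimal Python sketch of the fallback above, under stated assumptions: the label compared against each row is its deepest non-NULL level (the same thing `COALESCE(LEVEL3, LEVEL2, LEVEL1)` does in SQL), closeness is measured with the standard library's `difflib.SequenceMatcher.ratio()` (a swapped-in similarity measure, not something the question specifies), and the band between 0 and 0.8, which the question leaves unnamed, is called "weak match" here. The helpers `deepest_level`, `closeness`, `band` and `best_match` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# The question's rows: (ID, LEVEL1, LEVEL2, LEVEL3); None stands for NULL.
ROWS: List[Tuple[int, str, Optional[str], Optional[str]]] = [
    (1, "Arts and Entertainment Businesses", "Casinos", None),
    (1, "Arts and Entertainment Businesses", "Performing Arts Businesses", "Radio Stations"),
    (2, "Auto Sales Businesses", "Motorcycle Dealers", "Motorcycle Parts"),
    (2, "Auto Sales Businesses", "RVs and Motor Home Dealers", None),
    (2, "Auto Sales Businesses", "Car Dealers", "Used Cars Dealers"),
    (3, "Bars and Lounges", "Pubs and Dive Bars", "Pubs"),
    (3, "Bars and Lounges", "Wine Bars", None),
    (4, "Restaurants", "American Restaurants", "Barbeque"),
]


def deepest_level(level1: str, level2: Optional[str], level3: Optional[str]) -> str:
    """LEVEL3 if present, else LEVEL2, else LEVEL1, like COALESCE(LEVEL3, LEVEL2, LEVEL1)."""
    if level3 is not None:
        return level3
    if level2 is not None:
        return level2
    return level1


def closeness(text: str, label: str) -> float:
    """Case-insensitive similarity in [0.0, 1.0]; 1.0 only when the strings are equal ignoring case."""
    return SequenceMatcher(None, text.lower(), label.lower()).ratio()


def band(score: float) -> str:
    """Map a score to the question's bands; 'weak match' is an assumed name for 0 < score < 0.8."""
    if score >= 1.0:
        return "exact match"
    if score >= 0.8:
        return "very good match"
    if score > 0.0:
        return "weak match"
    return "no match"


def best_match(text: str) -> Tuple[int, str, float]:
    """Return (ID, matched label, score) for the highest-scoring row."""
    candidates = []
    for row_id, level1, level2, level3 in ROWS:
        label = deepest_level(level1, level2, level3)
        candidates.append((row_id, label, closeness(text, label)))
    return max(candidates, key=lambda item: item[2])


if __name__ == "__main__":
    for text in ("Radio", "Used Cars"):
        row_id, label, score = best_match(text)
        print(text, row_id, label, round(score, 3), band(score))
```

With these rows, `best_match('Radio')` picks 'Radio Stations' (ID 1) with a ratio of 10/19, about 0.53, which lands in the weak band, and `best_match('Used Cars')` picks 'Used Cars Dealers' (ID 2). The ratio is purely character-based, so unrelated labels can score close: 'Casinos' gets 0.50 for 'Radio'.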

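I am trying to compute a closeness score for every input value and then assign it an ID. When there is no data in LEVEL3 and LEVEL2, each LEVEL1 has a row in which LEVEL2 and LEVEL3 are 'Other'; that row should be the lowest-scoring match.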
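The 'Other' rows can be folded into the same kind of scoring. This sketch assumes each 'Other' row has LEVEL2 = LEVEL3 = 'Other', compares such rows against their LEVEL1 label, and multiplies their score by a hypothetical `OTHER_WEIGHT` of 0.5 so they sit low in the ranking. `MIN_SCORE`, `row_score` and `assign_ids` are also hypothetical, the sample 'Other' rows are invented for illustration, and similarity again comes from `difflib.SequenceMatcher`, not from anything in the question.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical sample: three rows from the question's table plus one 'Other'
# row per LEVEL1, as the question describes. None stands for NULL.
ROWS: List[Tuple[int, str, Optional[str], Optional[str]]] = [
    (1, "Arts and Entertainment Businesses", "Performing Arts Businesses", "Radio Stations"),
    (1, "Arts and Entertainment Businesses", "Other", "Other"),
    (2, "Auto Sales Businesses", "Car Dealers", "Used Cars Dealers"),
    (2, "Auto Sales Businesses", "Other", "Other"),
    (3, "Bars and Lounges", "Wine Bars", None),
    (3, "Bars and Lounges", "Other", "Other"),
]

OTHER_WEIGHT = 0.5  # hypothetical: caps an 'Other' row's score at 0.5
MIN_SCORE = 0.3     # hypothetical: below this the input gets no ID


def row_score(text: str, level1: str, level2: Optional[str], level3: Optional[str]) -> Tuple[str, float]:
    """Return (label compared, score); 'Other' rows compare against LEVEL1 and are down-weighted."""
    is_other = level2 == "Other" and level3 == "Other"
    if is_other:
        label = level1
    else:
        label = level3 if level3 is not None else (level2 if level2 is not None else level1)
    score = SequenceMatcher(None, text.lower(), label.lower()).ratio()
    return label, (score * OTHER_WEIGHT if is_other else score)


def assign_ids(inputs: Iterable[str], min_score: float = MIN_SCORE) -> Dict[str, Optional[Tuple[int, str, float]]]:
    """Map each input to (ID, label, score) of its best row, or None if nothing reaches min_score."""
    result: Dict[str, Optional[Tuple[int, str, float]]] = {}
    for text in inputs:
        scored = [(row_id, *row_score(text, l1, l2, l3)) for row_id, l1, l2, l3 in ROWS]
        best = max(scored, key=lambda item: item[2])
        result[text] = best if best[2] >= min_score else None
    return result


if __name__ == "__main__":
    print(assign_ids(["Used Cars", "Radio", "Zzz"]))
```

Here `assign_ids(["Used Cars", "Radio", "Zzz"])` maps 'Used Cars' to ID 2, 'Radio' to ID 1 and 'Zzz' to None. The weight caps an 'Other' row at 0.5 but does not guarantee it ranks last, since a real match scoring below 0.5 can still lose to it, so the weight and threshold would need tuning against real inputs.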

I would really like to do this in SQL, without using Python ML/AI and over-engineering it. (If it is not possible in SQL, I will use Python.) Any ideas would help.

Expected result:

Input = Used Cars
Output = [ID: 2, LEVEL1: Auto Sales Businesses]

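Please note: I understand this may not be exactly a technical question or code error. I do understand the Stack Overflow rule against posting discussions, and I am focused on getting an answer. Any pointers, SQL code or Python scripts would help me a lot. Thanks.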

1 answer:

Answer 0 (score: 1)

Here is one option in T-SQL (SQL Server syntax, not MySQL) that assigns the score from character lengths:

DECLARE @input varchar(300) = 'Radio';

-- Sample rows copied from the question's table
WITH Data AS (
SELECT  1  AS id,  'Arts and Entertainment Businesses' AS Level1,  'Casinos'                    AS Level2,  NULL                AS Level3 UNION ALL
SELECT  1  AS id,  'Arts and Entertainment Businesses' AS Level1,  'Performing Arts Businesses' AS Level2,  'Radio Stations'    AS Level3 UNION ALL
SELECT  2  AS id,  'Auto Sales Businesses'             AS Level1,  'Motorcycle Dealers'         AS Level2,  'Motorcycle Parts'  AS Level3 UNION ALL
SELECT  2  AS id,  'Auto Sales Businesses'             AS Level1,  'RVs and Motor Home Dealers' AS Level2,  NULL                AS Level3 UNION ALL
SELECT  2  AS id,  'Auto Sales Businesses'             AS Level1,  'Car Dealers'                AS Level2,  'Used Cars Dealers' AS Level3 UNION ALL
SELECT  3  AS id,  'Bars and Lounges'                  AS Level1,  'Pubs and Dive Bars'         AS Level2,  'Pubs'              AS Level3 UNION ALL
SELECT  3  AS id,  'Bars and Lounges'                  AS Level1,  'Wine Bars'                  AS Level2,  NULL                AS Level3 UNION ALL
SELECT  4  AS id,  'Restaurants'                       AS Level1,  'American Restaurants'       AS Level2,  'Barbeque'          AS Level3
)
-- Score = length of the input / length of the deepest non-NULL level (Level3, else Level2, else Level1).
-- Only rows whose deepest level contains the input are returned, best match first.
SELECT *
    ,CAST(LEN(@input) AS numeric(18,2)) / LEN(COALESCE(Level3, Level2, Level1)) AS Score
FROM Data
WHERE COALESCE(Level3, Level2, Level1) LIKE '%' + @input + '%'
ORDER BY Score DESC;
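The answer's query can be tried without SQL Server by porting it to SQLite through Python's built-in `sqlite3` module. This port assumes a table named `categories` (a hypothetical name) holding the question's rows; `LEN` becomes `LENGTH`, `+` string concatenation becomes `||`, and `CAST(... AS REAL)` stands in for `numeric(18,2)`. SQLite's `LIKE` is case-insensitive for ASCII letters by default, which is in line with SQL Server's common case-insensitive collations.

```python
import sqlite3
from typing import List, Tuple

# The question's rows: (id, Level1, Level2, Level3); None becomes SQL NULL.
ROWS = [
    (1, "Arts and Entertainment Businesses", "Casinos", None),
    (1, "Arts and Entertainment Businesses", "Performing Arts Businesses", "Radio Stations"),
    (2, "Auto Sales Businesses", "Motorcycle Dealers", "Motorcycle Parts"),
    (2, "Auto Sales Businesses", "RVs and Motor Home Dealers", None),
    (2, "Auto Sales Businesses", "Car Dealers", "Used Cars Dealers"),
    (3, "Bars and Lounges", "Pubs and Dive Bars", "Pubs"),
    (3, "Bars and Lounges", "Wine Bars", None),
    (4, "Restaurants", "American Restaurants", "Barbeque"),
]

# SQLite port of the answer's T-SQL: score = len(input) / len(deepest non-NULL level).
QUERY = """
SELECT id, Level1, Level2, Level3,
       CAST(LENGTH(:input) AS REAL) / LENGTH(COALESCE(Level3, Level2, Level1)) AS Score
FROM categories
WHERE COALESCE(Level3, Level2, Level1) LIKE '%' || :input || '%'
ORDER BY Score DESC
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database holding the hypothetical categories table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE categories (id INTEGER, Level1 TEXT, Level2 TEXT, Level3 TEXT)")
    conn.executemany("INSERT INTO categories VALUES (?, ?, ?, ?)", ROWS)
    return conn


def score_categories(conn: sqlite3.Connection, text: str) -> List[Tuple]:
    """Run the ported query with the input bound as a named parameter."""
    return conn.execute(QUERY, {"input": text}).fetchall()


if __name__ == "__main__":
    db = make_db()
    for text in ("Radio", "Used Cars"):
        print(text, score_categories(db, text))
```

For 'Radio' the only row returned is 'Radio Stations' (ID 1) with score 5/14 ≈ 0.357; for 'Used Cars' it is 'Used Cars Dealers' (ID 2) with 9/17 ≈ 0.529, which matches the question's expected ID. The score only rewards substring length: an input that is not a substring of any deepest-level label returns no rows, and the query never falls back to LEVEL2 or LEVEL1 when the deepest level fails to match.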