MySQL query to get the best-matching rows

Time: 2018-12-20 17:14:17

Tags: mysql

Basic table structure of "domain_keywords":

domain          keyword
golflink.com    golf courses
golflink.com    golf
golflink.com    golf courses near me
...
2ndswing.com    golf clubs
2ndswing.com    used golf clubs
...

What I want to do is take one domain, e.g. "golflink.com", find the other domains that have matching keywords, and return the 10 domains with the most matches.

I tried a query that works on MySQL 8, but our shared web server (read: cannot be upgraded) runs MySQL 5.7, which does not support the WITH clause, so I need another way to write this query.

2 answers:

Answer 0: (score: 1)

The first approach is to use a derived table in the FROM clause, similar to the one you would define in the WITH clause. Try the following:

SELECT
    t1.domain AS 'Domain',
    t2.domain AS 'SimilarDomain',
    count(t2.keyword) AS 'SharedKeywordsNumber'
FROM
    ( SELECT
          keyword, domain
      FROM
          domain_keywords
      WHERE
          domain = 'golflink.com'
      GROUP BY
          domain, keyword ) AS t1
CROSS JOIN
    domain_keywords AS t2
WHERE
    t1.keyword = t2.keyword AND t1.domain != t2.domain
GROUP BY
    t1.domain, t2.domain
ORDER BY
    3 DESC, 2
LIMIT 10

As an improvement, I think you can also replace the CROSS JOIN with an INNER JOIN, like this (though I'm not 100% sure):

SELECT
    t1.domain AS 'Domain',
    t2.domain AS 'SimilarDomain',
    count(t2.keyword) AS 'SharedKeywordsNumber'
FROM
    ( SELECT
          keyword, domain
      FROM
          domain_keywords
      WHERE
          domain = 'golflink.com'
      GROUP BY
          domain, keyword ) AS t1
INNER JOIN
    domain_keywords AS t2 ON t1.keyword = t2.keyword AND t1.domain != t2.domain
GROUP BY
    t1.domain, t2.domain
ORDER BY
    3 DESC, 2
LIMIT 10
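
The doubt about swapping the join type can be checked on a small data set. In MySQL, CROSS JOIN and INNER JOIN are syntactic equivalents, and for an inner join a condition selects the same rows whether it sits in WHERE or in ON. The sketch below is only an illustration: the scratch table name domain_keywords_demo, the VARCHAR column types, and the rows marked hypothetical are assumptions, not part of the question.

```sql
-- Scratch table: the name and column types are assumptions for this sketch.
CREATE TABLE domain_keywords_demo (
    domain  VARCHAR(255) NOT NULL,
    keyword VARCHAR(255) NOT NULL
);

INSERT INTO domain_keywords_demo (domain, keyword) VALUES
    ('golflink.com',   'golf courses'),
    ('golflink.com',   'golf'),
    ('golflink.com',   'golf courses near me'),
    ('2ndswing.com',   'golf clubs'),
    ('2ndswing.com',   'used golf clubs'),
    ('2ndswing.com',   'golf'),          -- hypothetical row
    ('2ndswing.com',   'golf courses'),  -- hypothetical row
    ('example-a.test', 'golf');          -- hypothetical row

-- Variant A: CROSS JOIN with the match conditions in WHERE.
SELECT t1.domain AS `Domain`, t2.domain AS `SimilarDomain`,
       COUNT(t2.keyword) AS `SharedKeywordsNumber`
FROM (SELECT keyword, domain
      FROM domain_keywords_demo
      WHERE domain = 'golflink.com'
      GROUP BY domain, keyword) AS t1
CROSS JOIN domain_keywords_demo AS t2
WHERE t1.keyword = t2.keyword AND t1.domain != t2.domain
GROUP BY t1.domain, t2.domain
ORDER BY `SharedKeywordsNumber` DESC, `SimilarDomain`
LIMIT 10;

-- Variant B: INNER JOIN with the same conditions in ON.
SELECT t1.domain AS `Domain`, t2.domain AS `SimilarDomain`,
       COUNT(t2.keyword) AS `SharedKeywordsNumber`
FROM (SELECT keyword, domain
      FROM domain_keywords_demo
      WHERE domain = 'golflink.com'
      GROUP BY domain, keyword) AS t1
INNER JOIN domain_keywords_demo AS t2
        ON t1.keyword = t2.keyword AND t1.domain != t2.domain
GROUP BY t1.domain, t2.domain
ORDER BY `SharedKeywordsNumber` DESC, `SimilarDomain`
LIMIT 10;

-- Both variants return the same two rows on this data:
--   golflink.com | 2ndswing.com   | 2
--   golflink.com | example-a.test | 1

DROP TABLE domain_keywords_demo;
```

A regular table is used rather than a TEMPORARY one because MySQL 5.7 does not allow a temporary table to be referenced twice in the same query, which a self-join requires.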

Answer 1: (score: 1)

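Actually, neither a CTE nor a subquery is needed. Also, implicit (comma) join notation is outdated; explicit joins should be used for everything but the most basic one-off queries. (A note on quoting: ' delimits strings, while ` delimits identifiers. MySQL does accept a single-quoted string as a column alias in the select list, but elsewhere in the statement a single-quoted alias is treated as a string literal, so backticks are the safer choice.)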

SELECT t1.domain AS `Domain`, t2.domain AS `SimilarDomain`, COUNT(*) AS `SharedKeywordsNumber`
FROM domain_keywords AS t1
INNER JOIN domain_keywords AS t2 ON t1.keyword = t2.keyword AND t1.domain != t2.domain
WHERE t1.domain = 'golflink.com'
GROUP BY t1.domain, t2.domain 
ORDER BY `SharedKeywordsNumber` DESC, `SimilarDomain`
LIMIT 10;
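
One thing to check before relying on COUNT(*): it counts every joined row, so if the table can hold the same (domain, keyword) pair more than once, the totals are inflated and the ranking can shift. Whether duplicates exist is an assumption here; a unique key on the two columns would rule them out. A minimal sketch with a hypothetical scratch table, assumed column types, and made-up rows, comparing COUNT(*) with COUNT(DISTINCT ...):

```sql
-- Scratch table: the name, column types, and all rows are assumptions.
CREATE TABLE domain_keywords_demo (
    domain  VARCHAR(255) NOT NULL,
    keyword VARCHAR(255) NOT NULL
);

INSERT INTO domain_keywords_demo (domain, keyword) VALUES
    ('golflink.com',   'golf'),
    ('golflink.com',   'golf courses'),
    ('2ndswing.com',   'golf'),
    ('2ndswing.com',   'golf'),           -- deliberate duplicate pair
    ('example-a.test', 'golf'),
    ('example-a.test', 'golf courses');

-- JoinedRows counts every matching row; SharedKeywordsNumber counts each
-- shared keyword once per domain.
SELECT t2.domain AS `SimilarDomain`,
       COUNT(*) AS `JoinedRows`,
       COUNT(DISTINCT t2.keyword) AS `SharedKeywordsNumber`
FROM domain_keywords_demo AS t1
INNER JOIN domain_keywords_demo AS t2
        ON t1.keyword = t2.keyword AND t1.domain != t2.domain
WHERE t1.domain = 'golflink.com'
GROUP BY t2.domain
ORDER BY `SharedKeywordsNumber` DESC, `SimilarDomain`
LIMIT 10;

-- Result on this data:
--   example-a.test | 2 | 2
--   2ndswing.com   | 2 | 1
-- Sorting by JoinedRows instead would tie both domains at 2.

DROP TABLE domain_keywords_demo;
```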

One more note: the ORDER BY field_ordinal form has been deprecated for years, and it makes queries needlessly hard to read or modify.
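
The maintenance problem with ordinals can be made concrete. In the sketch below (a hypothetical scratch table, assumed column types, and some made-up rows), the same ORDER BY 2 DESC, 1 changes meaning as soon as a column is added at the front of the select list, while the alias form keeps sorting by the count.

```sql
-- Scratch table: the name and column types are assumptions for this sketch.
CREATE TABLE domain_keywords_demo (
    domain  VARCHAR(255) NOT NULL,
    keyword VARCHAR(255) NOT NULL
);

INSERT INTO domain_keywords_demo (domain, keyword) VALUES
    ('golflink.com',   'golf courses'),
    ('golflink.com',   'golf'),
    ('2ndswing.com',   'golf'),          -- hypothetical row
    ('2ndswing.com',   'golf courses'),  -- hypothetical row
    ('example-a.test', 'golf');          -- hypothetical row

-- Query 1: ordinal 2 is the count, so the best match comes first.
--   2ndswing.com | 2, then example-a.test | 1
SELECT t2.domain AS `SimilarDomain`, COUNT(*) AS `SharedKeywordsNumber`
FROM domain_keywords_demo AS t1
INNER JOIN domain_keywords_demo AS t2
        ON t1.keyword = t2.keyword AND t1.domain != t2.domain
WHERE t1.domain = 'golflink.com'
GROUP BY t2.domain
ORDER BY 2 DESC, 1
LIMIT 10;

-- Query 2: a leading column was added but ORDER BY was left untouched.
-- Ordinal 2 is now SimilarDomain, so rows sort by name descending:
--   golflink.com | example-a.test | 1, then golflink.com | 2ndswing.com | 2
SELECT t1.domain AS `Domain`, t2.domain AS `SimilarDomain`,
       COUNT(*) AS `SharedKeywordsNumber`
FROM domain_keywords_demo AS t1
INNER JOIN domain_keywords_demo AS t2
        ON t1.keyword = t2.keyword AND t1.domain != t2.domain
WHERE t1.domain = 'golflink.com'
GROUP BY t1.domain, t2.domain
ORDER BY 2 DESC, 1
LIMIT 10;

-- Query 3: aliases keep their meaning however the select list changes.
--   golflink.com | 2ndswing.com | 2, then golflink.com | example-a.test | 1
SELECT t1.domain AS `Domain`, t2.domain AS `SimilarDomain`,
       COUNT(*) AS `SharedKeywordsNumber`
FROM domain_keywords_demo AS t1
INNER JOIN domain_keywords_demo AS t2
        ON t1.keyword = t2.keyword AND t1.domain != t2.domain
WHERE t1.domain = 'golflink.com'
GROUP BY t1.domain, t2.domain
ORDER BY `SharedKeywordsNumber` DESC, `SimilarDomain`
LIMIT 10;

DROP TABLE domain_keywords_demo;
```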