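I have a query like this that counts hits from different referring URLs and groups them by hostname. It's very ugly, but it seems fast enough.
How can I rewrite it so that the ugly substring expression (which extracts the domain part of the URL) is written more neatly? I'm generating the query from a list of social media sites, so more sites may be added later.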
SELECT substring(r.name, 8, locate("/",substring(r.name FROM 8))-1) AS referer_domain,
count(USER) AS hits,
r.id
FROM core c,
referer r
WHERE c.site_url = 12
AND r.name LIKE '%/%'
AND c.referer = r.id
AND (substring(r.name, 8, locate("/",substring(r.name FROM 8))-1) = "www.delicious.com"
OR substring(r.name, 8, locate("/",substring(r.name FROM 8))-1) = "www.facebook.com"
OR substring(r.name, 8, locate("/",substring(r.name FROM 8))-1) = "m.facebook.com"
OR substring(r.name, 8, locate("/",substring(r.name FROM 8))-1) = "www.reddit.com"
OR substring(r.name, 8, locate("/",substring(r.name FROM 8))-1) = "twitter.com"
OR substring(r.name, 8, locate("/",substring(r.name FROM 8))-1) = "news.ycombinator.com")
GROUP BY substring(r.name, 8, locate("/",substring(r.name FROM 8))-1)
ORDER BY hits DESC
Answer 0 (score: 2)
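In your case you have already created an output column, referer_domain, which you can refer to in GROUP BY.
To use it in the WHERE clause, though, you need something like a view, because WHERE cannot reference an alias defined in the SELECT list.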
CREATE VIEW ref_domain_view AS SELECT *,substring(name, 8, locate("/",substring(name FROM 8))-1) as referer_domain FROM referer;
SELECT r.referer_domain,
count(USER) AS hits,
r.id
FROM core c,
ref_domain_view r
WHERE c.site_url = 12
AND r.name LIKE '%/%'
AND c.referer = r.id
AND referer_domain IN ("www.delicious.com",
                       "www.facebook.com",
                       "m.facebook.com",
                       "www.reddit.com",
                       "twitter.com",
                       "news.ycombinator.com")
GROUP BY referer_domain
ORDER BY hits DESC
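If creating a view isn't an option, a derived table should give the same effect: the domain is computed once in the inner SELECT, and the outer WHERE can then filter on it by name. This is a sketch against the table and column names from the question; the alias `d` is arbitrary, and `r.id` is left out of the select list because it isn't part of the GROUP BY. Older MySQL versions (before 5.7) always materialize derived tables, so it may be worth checking the plan with EXPLAIN.

```sql
-- Compute the domain once in a derived table, then filter on the alias.
-- Same fixed offset as the question: assumes every name starts with "http://".
SELECT d.referer_domain,
       count(USER) AS hits
FROM core c
JOIN (
    SELECT id,
           substring(name, 8, locate('/', substring(name FROM 8)) - 1) AS referer_domain
    FROM referer
) d ON c.referer = d.id
WHERE c.site_url = 12
  AND d.referer_domain IN ('www.delicious.com', 'www.facebook.com', 'm.facebook.com',
                           'www.reddit.com', 'twitter.com', 'news.ycombinator.com')
GROUP BY d.referer_domain
ORDER BY hits DESC;
```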
Answer 1 (score: 0)
You can use CREATE FUNCTION.
Answer 2 (score: 0)
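One way to do this is to write a user-defined function (UDF) for the string extraction. UDFs are fast, but they are more painful to write. Another way is to write a stored function (a kind of stored routine), which is easier to write and fits more naturally with SQL syntax.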
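A stored function along those lines might look like the following. The name `url_host` is hypothetical, and the body uses `SUBSTRING_INDEX` instead of the question's fixed offset of 8, so it assumes every value has the form `scheme://host/...` (which also covers `https://` and URLs with no trailing path). Creating it needs the CREATE ROUTINE privilege, and with binary logging enabled MySQL may insist on the DETERMINISTIC / NO SQL characteristics declared below.

```sql
-- Hypothetical helper: returns the host part of a URL such as
-- "http://www.reddit.com/r/programming" -> "www.reddit.com".
-- Inner call keeps "scheme://host" (text before the 3rd '/');
-- outer call keeps the text after the last '/' of that, i.e. the host.
-- "http://twitter.com" (no path) still yields "twitter.com".
CREATE FUNCTION url_host(url VARCHAR(2048))
    RETURNS VARCHAR(255)
    DETERMINISTIC NO SQL
    RETURN SUBSTRING_INDEX(SUBSTRING_INDEX(url, '/', 3), '/', -1);

-- The question's query with the repeated expression replaced by the
-- function and the OR chain replaced by IN (...).
SELECT url_host(r.name) AS referer_domain,
       count(USER) AS hits
FROM core c
JOIN referer r ON c.referer = r.id
WHERE c.site_url = 12
  AND url_host(r.name) IN ('www.delicious.com', 'www.facebook.com', 'm.facebook.com',
                           'www.reddit.com', 'twitter.com', 'news.ycombinator.com')
GROUP BY referer_domain
ORDER BY hits DESC;
```

Because the function is evaluated per row in WHERE, it can't use an index on `referer.name`; that is no worse than the original expression, but it is not faster either.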
Answer 3 (score: 0)
While both answers above make good points, I have to ask you something.
Do you really need all those substring calls?
Have you tried something like this:
SELECT substring(r.name, 8, locate("/",substring(r.name FROM 8))-1) AS referer_domain,
count(USER) AS hits,
r.id
FROM core c,
referer r
WHERE c.site_url = 12
AND r.name LIKE '%/%'
AND c.referer = r.id
AND (r.name LIKE "http://www.delicious.com/%"
OR r.name LIKE "http://www.facebook.com/%"
OR r.name LIKE "http://m.facebook.com/%"
OR r.name LIKE "http://www.reddit.com/%"
OR r.name LIKE "http://twitter.com/%"
OR r.name LIKE "http://news.ycombinator.com/%")
GROUP BY substring(r.name, 8, locate("/",substring(r.name FROM 8))-1)
ORDER BY hits DESC
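The only problem with the query above is that if you need to track more domains, you have to change the query. If you put the domains in a table and join against that table, you may be able to get rid of the substring calls entirely, and you won't have to change the query either.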
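A minimal sketch of that idea, assuming a new table named `tracked_domain` (hypothetical) and referer values that start with `http://` as in the question. Each referer is matched against every tracked host with LIKE, so no substring call is needed, and tracking another site is just one more INSERT.

```sql
-- Hypothetical lookup table: one row per host to track.
CREATE TABLE tracked_domain (
    domain VARCHAR(255) PRIMARY KEY
);

INSERT INTO tracked_domain (domain) VALUES
    ('www.delicious.com'), ('www.facebook.com'), ('m.facebook.com'),
    ('www.reddit.com'), ('twitter.com'), ('news.ycombinator.com');

-- The query itself no longer lists any domains.
-- Matches "http://host/..." as well as a bare "http://host".
SELECT t.domain AS referer_domain,
       count(USER) AS hits
FROM core c
JOIN referer r ON c.referer = r.id
JOIN tracked_domain t
  ON r.name LIKE CONCAT('http://', t.domain, '/%')
  OR r.name = CONCAT('http://', t.domain)
WHERE c.site_url = 12
GROUP BY t.domain
ORDER BY hits DESC;
```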