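I have a table of surnames, and I want to count how many surnames fall into each letter range, such as A-D or E-H.
I came up with the query below, which works. I would like to hear what people think of it and whether there is a better way.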
select count(*) FROM people
group by surname REGEXP '^[a-d].*',
surname REGEXP '^[e-h].*',
surname REGEXP '^[i-l].*',
surname REGEXP '^[m-p].*',
surname REGEXP '^[q-t].*',
surname REGEXP '^[u-z].*';
Answer 0: (score: 4)
This is the best way to achieve this (if you are using regular expressions, anyway):
select
sum(surname REGEXP '^[a-dA-D].*') as ad_count,
sum(surname REGEXP '^[e-hE-H].*') as eh_count,
sum(surname REGEXP '^[i-lI-L].*') as il_count,
sum(surname REGEXP '^[m-pM-P].*') as mp_count,
sum(surname REGEXP '^[q-tQ-T].*') as qt_count,
sum(surname REGEXP '^[u-zU-Z].*') as uz_count
from people
Because MySQL treats true as 1 and false as 0, sum(some condition) is an elegantly terse way to count how many rows satisfy the condition.
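As a small illustration of that point, the sketch below first shows the 1/0 values directly and then sums a single condition. The column aliases are made up for the example, and it assumes the table's default case-insensitive collation, under which LIKE 'a%' also matches surnames starting with an uppercase A.

```sql
-- A comparison is an ordinary integer expression in MySQL.
SELECT (1 = 1) AS is_true, (1 = 2) AS is_false;   -- 1 and 0

-- So summing a condition counts the rows where it holds.
-- Rows where the condition is NULL (e.g. surname IS NULL) are skipped by SUM.
SELECT SUM(surname LIKE 'a%') AS starts_with_a,   -- alias is illustrative
       COUNT(*)               AS total_rows       -- alias is illustrative
FROM people;
```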
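You would get better performance by working out each row's group once in an inner select (for example, with a CASE on substr(surname,1,1)) and then summing over tests against that computed value.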
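One possible shape for that inner-select approach is sketched below. It is not the answerer's own query: the derived-table aliases (f, b) and the column names (first_letter, bucket) are invented for illustration, and whether it is actually faster than the regex version would need to be measured on your data.

```sql
-- Sketch only: aliases f, b and columns first_letter, bucket are hypothetical.
SELECT
    SUM(bucket = 'a-d') AS ad_count,
    SUM(bucket = 'e-h') AS eh_count,
    SUM(bucket = 'i-l') AS il_count,
    SUM(bucket = 'm-p') AS mp_count,
    SUM(bucket = 'q-t') AS qt_count,
    SUM(bucket = 'u-z') AS uz_count
FROM (
    -- Classify each row once, using only its first letter.
    SELECT CASE
               WHEN first_letter BETWEEN 'a' AND 'd' THEN 'a-d'
               WHEN first_letter BETWEEN 'e' AND 'h' THEN 'e-h'
               WHEN first_letter BETWEEN 'i' AND 'l' THEN 'i-l'
               WHEN first_letter BETWEEN 'm' AND 'p' THEN 'm-p'
               WHEN first_letter BETWEEN 'q' AND 't' THEN 'q-t'
               WHEN first_letter BETWEEN 'u' AND 'z' THEN 'u-z'
           END AS bucket
    FROM (
        SELECT LOWER(SUBSTR(surname, 1, 1)) AS first_letter
        FROM people
    ) AS f
) AS b;
```

Surnames that do not start with a letter get a NULL bucket, so they are not counted in any column.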
Answer 1: (score: 2)
Regular expressions are overkill here and entirely unnecessary.
Maybe something like this, using basic string algebra:
SELECT
SUM(CASE WHEN SUBSTR(`surname`, 1, 1) BETWEEN 'a' AND 'd' THEN 1 ELSE 0 END) AS `SUM_a-d`,
SUM(CASE WHEN SUBSTR(`surname`, 1, 1) BETWEEN 'e' AND 'h' THEN 1 ELSE 0 END) AS `SUM_e-h`,
SUM(CASE WHEN SUBSTR(`surname`, 1, 1) BETWEEN 'i' AND 'l' THEN 1 ELSE 0 END) AS `SUM_i-l`,
SUM(CASE WHEN SUBSTR(`surname`, 1, 1) BETWEEN 'm' AND 'p' THEN 1 ELSE 0 END) AS `SUM_m-p`,
SUM(CASE WHEN SUBSTR(`surname`, 1, 1) BETWEEN 'q' AND 't' THEN 1 ELSE 0 END) AS `SUM_q-t`,
SUM(CASE WHEN SUBSTR(`surname`, 1, 1) BETWEEN 'u' AND 'z' THEN 1 ELSE 0 END) AS `SUM_u-z`
FROM `people`
Answer 2: (score: 0)
You can make the query more explicit, like this:
SELECT
SUM(CASE WHEN surname REGEXP '^[a-d].*' THEN 1 ELSE 0 END) AS a_d_count
,SUM(CASE WHEN surname REGEXP '^[e-h].*' THEN 1 ELSE 0 END) AS e_h_count
,SUM(CASE WHEN surname REGEXP '^[i-l].*' THEN 1 ELSE 0 END) AS i_l_count
,SUM(CASE WHEN surname REGEXP '^[m-p].*' THEN 1 ELSE 0 END) AS m_p_count
,SUM(CASE WHEN surname REGEXP '^[q-t].*' THEN 1 ELSE 0 END) AS q_t_count
,SUM(CASE WHEN surname REGEXP '^[u-z].*' THEN 1 ELSE 0 END) AS u_z_count
FROM (SELECT surname FROM people ORDER BY surname ASC) p
Answer 3: (score: 0)
To avoid both regular expressions and conditionals, you can do this:
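-- FLOOR (not ROUND) keeps A-D together; the -65 is applied to the ASCII code, not to the string.
SELECT CONCAT(CHAR(65 + 4 * FLOOR((ASCII(UPPER(surname)) - 65) / 4)), '-',
              CHAR(LEAST(68 + 4 * FLOOR((ASCII(UPPER(surname)) - 65) / 4), 90))) AS r,
       COUNT(id)
FROM people
GROUP BY r;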
This gives ranges of four letters each, which means the last range is just 'yz', but you can adjust that with a bit more math.
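As one example of that extra math, the sketch below caps the group index at 5 so that Y and Z fold into the last range, giving the A-D, E-H, I-L, M-P, Q-T, U-Z split from the question. The names g, t, letter_range and surname_count are illustrative, and it assumes surnames begin with an unaccented Latin letter; anything else is filtered out.

```sql
-- Sketch only: g, t, letter_range and surname_count are hypothetical names.
SELECT
    CONCAT(CHAR(65 + 4 * g USING utf8mb4), '-',
           IF(g = 5, 'Z', CHAR(68 + 4 * g USING utf8mb4))) AS letter_range,
    COUNT(*) AS surname_count
FROM (
    -- A..D -> 0, E..H -> 1, ..., U..X -> 5; LEAST caps Y and Z (index 6) at 5.
    SELECT LEAST(FLOOR((ASCII(UPPER(surname)) - 65) / 4), 5) AS g
    FROM people
    WHERE ASCII(UPPER(surname)) BETWEEN 65 AND 90   -- first character is A..Z
) AS t
GROUP BY g
ORDER BY g;
```

Unlike the single-row SUM queries, this returns one row per range, and ranges with no surnames simply do not appear.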