Say I have the following dataset:
Column1 (VarChar(50 or something))
Elias
Sails
Pails
Plane
Games
From this column I would like to produce the following:
LETTER COUNT
E 3
L 4
I 3
A 5
S 5
And So On...
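One solution I thought of was to combine all the strings into a single string and then count the instances of each letter in that string, but that feels sloppy.
This is more a matter of curiosity than anything else, but is there a way to use SQL to count all the distinct letters in a dataset?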
Answer 0 (score: 2)
I would do this by creating a table of letters, similar to:
CREATE TABLE tblLetter
(
letter varchar(1)
);
INSERT INTO tblLetter ([letter])
VALUES
('a'),
('b'),
('c'),
('d'); -- etc
Then you can join the letters table to your table wherever your data is LIKE the letter:
select l.letter, count(n.col) Total
from tblLetter l
inner join names n
on n.col like '%'+l.letter+'%'
group by l.letter;
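See SQL Fiddle with Demo. Note that the LIKE join counts the rows that contain each letter rather than every occurrence, so s comes out as 4 even though it appears 5 times in the data (twice in Sails). This gives the result: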
| LETTER | TOTAL |
|--------|-------|
| a | 5 |
| e | 3 |
| g | 1 |
| i | 3 |
| l | 4 |
| m | 1 |
| p | 2 |
| s | 4 |
Answer 1 (score: 1)
If you create a table of letters, like this:
create table letter (ch char(1));
insert into letter(ch) values ('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H')
,('I'),('J'),('K'),('L'),('M'),('N'),('O'),('P')
,('Q'),('R'),('S'),('T'),('U'),('V'),('W'),('X'),('Y'),('Z');
you can do it with a cross join, like this:
select ch, SUM(len(str) - len(replace(str,ch,'')))
from letter
cross join test -- <<== test is the name of the table with the string
group by ch
having SUM(len(str) - len(replace(str,ch,''))) <> 0
Here is a running demo on sqlfiddle.
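You could avoid defining the table by embedding a list of letters in the query itself, but the idea of cross joining and grouping by letter stays the same.
Note: see this answer for an explanation of the expression inside SUM.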
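As a rough sketch of the inline variant, the letter list can be written as a VALUES derived table instead of a real table. This assumes the test(str) table from this answer, SQL Server 2008 or later, and a case-insensitive collation. The aliases l, t and occurrences are only illustrative, and the list is cut to five letters for brevity. For each row, LEN(str) - LEN(REPLACE(str, ch, '')) is the number of characters that disappear when the letter is removed, which is how often the letter occurs in that string.

```sql
-- Sketch only: same cross join as above, with the letters inlined.
-- Assumes table test(str) and a case-insensitive collation.
SELECT l.ch,
       -- characters removed by REPLACE = occurrences of l.ch in t.str
       SUM(LEN(t.str) - LEN(REPLACE(t.str, l.ch, ''))) AS occurrences
FROM (VALUES ('A'), ('E'), ('I'), ('L'), ('S')) AS l(ch)  -- extend to the full alphabet
CROSS JOIN test AS t
GROUP BY l.ch
HAVING SUM(LEN(t.str) - LEN(REPLACE(t.str, l.ch, ''))) <> 0
ORDER BY l.ch;
```

With the five sample strings from the question, this should return A 5, E 3, I 3, L 4, S 5.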
Answer 2 (score: 1)
To me, this is a problem almost tailor-made for a CTE (thanks to Nicholas Carey for the original; my fiddle: http://sqlfiddle.com/#!3/44f77/8):
WITH cteLetters
AS
(
SELECT
1 AS CharPos,
str,
MAX(LEN(str)) AS MaxLen,
SUBSTRING(str, 1, 1) AS Letter
FROM
test
GROUP BY
str,
SUBSTRING(str, 1, 1)
UNION ALL
SELECT
CharPos + 1,
str,
MaxLen,
SUBSTRING(str, CharPos + 1, 1) AS Letter
FROM
cteLetters
WHERE
CharPos + 1 <= MaxLen
)
SELECT
UPPER(Letter) AS Letter,
COUNT(*) CountOfLetters
FROM
cteLetters
GROUP BY
Letter
ORDER BY
Letter;
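The CTE tracks the character position and breaks each string apart one character at a time. You can then aggregate directly from the CTE. No extra tables or anything else needed.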
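One thing to watch: because the anchor query groups by str, two rows holding the same string collapse into one and are counted only once. A possible variant that keeps duplicates is sketched below. It assumes the same test(str) table; StrLen is a name introduced here for illustration. Also keep in mind that the default recursion limit is 100, so strings longer than that would need OPTION (MAXRECURSION n).

```sql
-- Sketch only: recursive CTE without GROUP BY in the anchor,
-- so duplicate strings in test are each counted.
WITH cteLetters AS
(
    -- anchor: first character of every non-empty row
    SELECT 1 AS CharPos,
           str,
           LEN(str) AS StrLen,
           SUBSTRING(str, 1, 1) AS Letter
    FROM test
    WHERE LEN(str) > 0
    UNION ALL
    -- recursive step: move one position to the right
    SELECT CharPos + 1,
           str,
           StrLen,
           SUBSTRING(str, CharPos + 1, 1)
    FROM cteLetters
    WHERE CharPos + 1 <= StrLen
)
SELECT UPPER(Letter) AS Letter,
       COUNT(*) AS CountOfLetters
FROM cteLetters
GROUP BY UPPER(Letter)
ORDER BY Letter;
```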
Answer 3 (score: 0)
This should work even if you have case sensitivity enabled.
Setup:
CREATE TABLE _test ( Column1 VARCHAR (50) )
INSERT _test (Column1) VALUES ('Elias'),('Sails'),('Pails'),('Plane'),('Games')
The work:
DECLARE @counter AS INT
DECLARE @results TABLE (LETTER VARCHAR(1),[COUNT] INT)
SET @counter=65 --ascii value for 'A'
WHILE ( @counter <=90 ) -- ascii value for 'Z'
BEGIN
INSERT @results (LETTER,[COUNT])
SELECT CHAR(@counter),SUM(LEN(UPPER(Column1)) - LEN(REPLACE(UPPER(Column1), CHAR(@counter),''))) FROM _test
SET @counter=@counter+1
END
SELECT * FROM @results WHERE [Count]>0
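The UPPER() calls are what make this safe under a case-sensitive collation, since REPLACE compares using the collation of its input. A small sketch of the difference, forcing a case-sensitive collation on a literal (Latin1_General_CS_AS is just one example collation; the column aliases are made up for illustration):

```sql
-- Sketch only: effect of UPPER() under a case-sensitive collation.
SELECT
    -- only the capital S is removed: 'ails'
    LEN(REPLACE('Sails' COLLATE Latin1_General_CS_AS, 'S', '')) AS len_without_upper,
    -- both S characters are removed: 'AIL'
    LEN(REPLACE(UPPER('Sails' COLLATE Latin1_General_CS_AS), 'S', '')) AS len_with_upper;
```

The first column should be 4 and the second 3, so only the UPPER() version finds both occurrences of the letter.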
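Answer 4 (score: 0)
It is often useful to have a range or sequence table that gives you a source of a large run of consecutive numbers, for example covering the range -100,000 to +100,000.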
drop table dbo.range
go
create table dbo.range
(
id int not null primary key clustered ,
)
go
set nocount on
go
declare @i int = -100000
while ( @i <= +100000 )
begin
if ( @i > 0 and @i % 1000 = 0 ) print convert(varchar,@i) + ' rows'
insert dbo.range values ( @i )
set @i = @i + 1
end
go
set nocount off
go
Once you have a table like that, you can do this:
select character = substring( t.some_column , r.id , 1 ) ,
frequency = count(*)
from dbo.some_table t
join dbo.range r on r.id between 1 and len( t.some_column )
group by substring( t.some_column , r.id , 1 )
order by 1
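To see what the join against the range table is doing, here is a minimal sketch that expands a single literal into one row per character. It uses a short inline number list in place of dbo.range; the aliases s, word, r, pos and ch are hypothetical names chosen for the example.

```sql
-- Sketch only: one output row per character position of 'Sails'.
SELECT r.id AS pos,
       SUBSTRING(s.word, r.id, 1) AS ch
FROM (VALUES ('Sails')) AS s(word)
JOIN (VALUES (1), (2), (3), (4), (5), (6), (7), (8)) AS r(id)  -- stand-in for dbo.range
  ON r.id BETWEEN 1 AND LEN(s.word)
ORDER BY r.id;
```

This should give five rows (1 S, 2 a, 3 i, 4 l, 5 s). Grouping those rows by the character and counting them is all the query above adds.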
If you want to ensure case-insensitivity, just mix in upper() or lower() as needed:
select character = upper( substring( t.some_column , r.id , 1 ) ) ,
frequency = count(*)
from dbo.some_table t
join dbo.range r on r.id between 1 and len( t.some_column )
group by upper( substring( t.some_column , r.id , 1 ) )
order by 1
Given your sample data:
create table dbo.some_table
(
some_column varchar(50) not null
)
go
insert dbo.some_table values ( 'Elias' )
insert dbo.some_table values ( 'Sails' )
insert dbo.some_table values ( 'Pails' )
insert dbo.some_table values ( 'Plane' )
insert dbo.some_table values ( 'Games' )
go
the query above produces the following results:
character frequency
A 5
E 3
G 1
I 3
L 4
M 1
N 1
P 2
S 5