My table has a field named Description. Here are some sample records:
+-----------+------------+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+------------+---------+---------+-------------------+---------------+-------------+-------------+
| RecordKey | RecordType | Price | Description | RecordNumber | DiscsinSet | Country | Company | DigitalAnalogCode | Genre | UPC | datecreated |
+-----------+------------+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+------------+---------+---------+-------------------+---------------+-------------+-------------+
| 100488 | CD | 5.99 | Korngold, Honegger, Verdi, Wagner, Puccini, Leoncavallo, Giordano: Opera Arias + 'I Know Where I'm Going'. (Ellen Faull, soprano. Taken from the Sylvan Levin Opera Concert Broadcasts of 1951 & 1952. Total time: 65'47') | VAIA 1173 | 1 | AMERICA | VAI | M | Songs & Arias | 89948117322 | 42:38.4 |
| 100503 | CD | 11.98 | Puccini, Madama Butterfly. (Kirsten, Barioni, Nadell et al. New Orleans Opera/ Cellini. Rec.3/60) | VAIA 1054-2 | 2 | AMERICA | VAI | A | Opera | 89948105428 | 42:38.4 |
| 100516 | MV | 8.99 | Brahms, 8 Gypsy Songs. Schumann, 2 Short Gypsy Songs. Liszt, The 3 Gypsies. Verdi, The Gypsy Woman. J.Strauss, 'Gypsy Baron'- Song of Sapphi + Other Gypsy Songs by Balakirev, Varlamov, Tchaikovsky, Verstovskij, Dvorak & Lehar. (Ljuba Kazarnovskaya, soprano w.Mark Morash, piano. Rec.Moscow, 2/19/98) | 69503 | 1 | AMERICA | VAI | S | NULL | 89948695035 | 42:38.4 |
+-----------+------------+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+------------+---------+---------+-------------------+---------------+-------------+-------------+
Apologies if this is hard to read, but the Description field contains a lot of text.
I want to build a frequency distribution of every word in this field.
The output I'm after looks like this:
+-----------+-------+
| word | count |
+-----------+-------+
| Beethoven | 344 |
| Strauss | 34533 |
| Piano | 3 |
| Webber | 34 |
+-----------+-------+
If it makes more sense to do this in SSAS, could you point me to how to achieve it there?
Answer 0 (score: 1)
If you have a separate list of valid words, you can do this:
select w.word, count(*) as [count]
from mytable t join
     words w
     -- wrap both sides in ', ' so a word only matches as a whole comma-delimited item
     on ', ' + t.description + ', ' like '%, ' + w.word + ', %'
group by w.word;
If you don't, search the web for a split() function. Then you can use cross apply to do something like this:
select w.value, count(*)
from mytable t cross apply
(select *
from split(t.description, ', ')
) w
group by w.value;
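On SQL Server 2016 or later (an assumption, since the question doesn't state a version), the built-in STRING_SPLIT function can stand in for a hand-rolled split(). Here is a minimal sketch, assuming the words in description are separated by single spaces and that stripping commas, periods and parentheses is acceptable cleanup. The punctuation handling is my own guess at what you'd want, not something from the question:

```sql
-- Sketch only: string_split needs SQL Server 2016+ (compatibility level 130 or higher).
-- mytable and description come from the question; the punctuation cleanup is an assumption.
select w.word, count(*) as [count]
from mytable t cross apply
     (select replace(replace(replace(replace(s.value, ',', ''), '.', ''), '(', ''), ')', '') as word
      from string_split(t.description, ' ') s   -- the separator must be a single character
     ) w
where w.word <> ''                              -- drop empty tokens left by repeated spaces
group by w.word
order by count(*) desc;
```

Note that this is crude: 'J.Strauss' becomes 'JStrauss', and quotes or slashes are left in place, so you would probably need to extend the cleanup for real data.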
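If you have control over the data structure, then naughty, naughty. SQL has a wonderful data structure for storing lists. It is called a table. It is not a string. You should be using a junction table, if you have control. But one doesn't always have control over these things.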
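For what it's worth, a junction table could look roughly like the sketch below. The table name RecordWord and its columns are hypothetical, chosen only for illustration; RecordKey is the key column shown in the question's sample data.

```sql
-- Hypothetical schema: RecordWord and its column sizes are illustrative, not from the question.
create table RecordWord (
    RecordKey int          not null,  -- refers to RecordKey in the original table
    Word      varchar(100) not null,
    primary key (RecordKey, Word)     -- one row per distinct word per record
);

-- With one row per (record, word), the frequency distribution is a plain aggregate.
select Word, count(*) as [count]
from RecordWord
group by Word
order by count(*) desc;
```

With this key, the count is the number of records that contain each word. If you need to count repeats within a single description, you would have to add something like a position column to the key.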