How to tokenize a SQL Server column for a word-frequency distribution in SSAS

Time: 2014-08-15 13:01:12

Tags: sql sql-server sql-server-2012 statistics ssas

My table has a field named Description. Here are some example records:

+-----------+------------+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+------------+---------+---------+-------------------+---------------+-------------+-------------+
| RecordKey | RecordType | Price |                                                                                                                                                 Description                                                                                                                                                 | RecordNumber | DiscsinSet | Country | Company | DigitalAnalogCode |     Genre     |     UPC     | datecreated |
+-----------+------------+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+------------+---------+---------+-------------------+---------------+-------------+-------------+
|    100488 | CD         | 5.99  | Korngold, Honegger, Verdi, Wagner, Puccini, Leoncavallo, Giordano: Opera Arias + 'I Know Where I'm Going'. (Ellen Faull, soprano. Taken from the Sylvan Levin Opera Concert Broadcasts of 1951 & 1952. Total time: 65'47')                                                                                  | VAIA 1173    |          1 | AMERICA | VAI     | M                 | Songs & Arias | 89948117322 | 42:38.4     |
|    100503 | CD         | 11.98 | Puccini, Madama Butterfly. (Kirsten, Barioni, Nadell et al. New Orleans Opera/ Cellini. Rec.3/60)                                                                                                                                                                                                           | VAIA 1054-2  |          2 | AMERICA | VAI     | A                 | Opera         | 89948105428 | 42:38.4     |
|    100516 | MV         | 8.99  | Brahms, 8 Gypsy Songs. Schumann, 2 Short Gypsy Songs. Liszt, The 3 Gypsies. Verdi, The Gypsy Woman. J.Strauss, 'Gypsy Baron'- Song of Sapphi + Other Gypsy Songs by Balakirev, Varlamov, Tchaikovsky, Verstovskij, Dvorak & Lehar. (Ljuba Kazarnovskaya, soprano w.Mark Morash, piano. Rec.Moscow, 2/19/98) | 69503        |          1 | AMERICA | VAI     | S                 | NULL          | 89948695035 | 42:38.4     |
+-----------+------------+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+------------+---------+---------+-------------------+---------------+-------------+-------------+

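Apologies if this is hard to read, but the Description field contains a lot of text.

I want to build a frequency distribution of every word in this field.

The output I want looks something like this: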

+-----------+-------+
|   word    | count |
+-----------+-------+
| Beethoven |   344 |
| Strauss   | 34533 |
| Piano     |     3 |
| Webber    |    34 |
+-----------+-------+

If it makes more sense, could you point me to how I could do this in SSAS instead?

1 answer:

Answer 0 (score: 1)

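If you have a separate table of valid words, you can join against it. This assumes the words in Description are separated by ', ', and it counts the rows whose Description contains each word, not the total number of occurrences: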

select w.word, count(*) as [count]
from mytable t join
     words w
     on ', ' + t.description + ', ' like '%, ' + w.word + ', %'  -- pattern built from the word, matched against the description
group by w.word;
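A minimal sketch of the word-list join, run against an in-memory SQLite database so it is self-contained. Assumptions: this is SQLite, not T-SQL (SQLite concatenates with `||` instead of `+`, and its `LIKE` is case-insensitive for ASCII, whereas SQL Server's behavior depends on the collation); the table names `mytable` and `words`, the helper `word_counts_from_list`, and the sample rows are hypothetical.

```python
import sqlite3


def word_counts_from_list(descriptions, words):
    """Hypothetical helper: for each known word, count the descriptions containing it.

    Mirrors the word-list join above in SQLite syntax and assumes the words in
    each description are separated by ', '. Words that never match are absent
    from the result because the join is an inner join.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (description TEXT)")
    conn.execute("CREATE TABLE words (word TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?)", [(d,) for d in descriptions])
    conn.executemany("INSERT INTO words VALUES (?)", [(w,) for w in words])
    rows = conn.execute(
        """
        SELECT w.word, COUNT(*)
        FROM mytable t
        JOIN words w
          -- wrap the description in delimiters so the first and last words match too
          ON ', ' || t.description || ', ' LIKE '%, ' || w.word || ', %'
        GROUP BY w.word
        """
    ).fetchall()
    conn.close()
    return dict(rows)
```

For example, `word_counts_from_list(["Brahms, Liszt, Verdi", "Verdi, Puccini", "Wagner"], ["Verdi", "Liszt", "Wagner", "Mozart"])` returns `{"Verdi": 2, "Liszt": 1, "Wagner": 1}`. Note that a `%` or `_` inside a word would act as a `LIKE` wildcard.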

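If you don't, search the web for a split() function (SQL Server 2012 has no built-in one; STRING_SPLIT was only added in SQL Server 2016). Then you can use cross apply to do something like this: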

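select w.value as word, count(*) as [count]
from mytable t cross apply
     (select *
      from split(t.description, ', ')  -- user-defined split function; its output column name (value here) depends on the implementation you use
     ) w
group by w.value;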
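Where no split function is available, one common substitute is a recursive CTE that peels one word off the front of the string per step. The sketch below shows that idea in SQLite (via Python's `sqlite3`) as an assumption-laden illustration, not the answer's method: the helper name `split_and_count` and the sample data are hypothetical, and in SQL Server the same approach would use `CHARINDEX`/`SUBSTRING` and may need `OPTION (MAXRECURSION ...)` because the default limit is 100 levels. Unlike the word-list join, this counts every occurrence.

```python
import sqlite3


def split_and_count(descriptions, sep=", "):
    """Hypothetical helper: split each description on sep and count every piece."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (description TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?)", [(d,) for d in descriptions])
    rows = conn.execute(
        """
        WITH RECURSIVE pieces(rest, word) AS (
            -- Seed: append one trailing separator so every word ends with sep.
            SELECT description || :sep, NULL FROM mytable
            UNION ALL
            -- Step: emit the text before the first sep and keep the remainder.
            SELECT substr(rest, instr(rest, :sep) + length(:sep)),
                   substr(rest, 1, instr(rest, :sep) - 1)
            FROM pieces
            WHERE rest <> ''
        )
        SELECT word, COUNT(*)
        FROM pieces
        WHERE word IS NOT NULL AND word <> ''  -- drop the seed rows and empty pieces
        GROUP BY word
        """,
        {"sep": sep},
    ).fetchall()
    conn.close()
    return dict(rows)
```

For free text like the sample Description values, a single space (`sep=" "`) is closer to what is wanted, although punctuation such as the comma in `Puccini,` would still stick to the word.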

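If you have control over the data structure, then naughty, naughty. SQL has a great data structure for storing lists. It is called a table, not a string. You should be using a junction table if you control the design. But people don't always have control over these things.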
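A minimal sketch of the junction-table design, again in SQLite so it runs standalone. Assumptions: the tables `words` and `record_words`, their columns, and the helper `build_and_count` are hypothetical; `record_key` stands in for a foreign key to the RecordKey column; and the descriptions are assumed to be split into words already. With the composite primary key, a word is stored once per record, so the frequency query becomes a plain `GROUP BY` with no string parsing.

```python
import sqlite3


def build_and_count(records):
    """Hypothetical helper: records is an iterable of (record_key, [word, ...]) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE words (word_id INTEGER PRIMARY KEY, word TEXT UNIQUE NOT NULL);
        -- Junction table: one row per (record, word) pair.
        CREATE TABLE record_words (
            record_key INTEGER NOT NULL,
            word_id INTEGER NOT NULL REFERENCES words(word_id),
            PRIMARY KEY (record_key, word_id)
        );
        """
    )
    for record_key, words in records:
        for word in words:
            conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (word,))
            (word_id,) = conn.execute(
                "SELECT word_id FROM words WHERE word = ?", (word,)
            ).fetchone()
            # A repeated word within the same record is ignored by the primary key.
            conn.execute(
                "INSERT OR IGNORE INTO record_words VALUES (?, ?)", (record_key, word_id)
            )
    rows = conn.execute(
        """
        SELECT w.word, COUNT(*)
        FROM record_words rw JOIN words w ON w.word_id = rw.word_id
        GROUP BY w.word
        """
    ).fetchall()
    conn.close()
    return dict(rows)
```

For example, `build_and_count([(100503, ["Puccini", "Madama", "Butterfly"]), (100488, ["Verdi", "Wagner", "Puccini"])])` gives `Puccini` a count of 2 and every other word a count of 1.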