Teradata - Word frequency in a column

Date: 2014-10-14 15:32:22

Tags: sql teradata word-frequency

Say I have the following column in a Teradata table:

Red ball
Purple ball
Orange ball

I would like my output to be:

Word    Count
Red     1
Ball    3
Purple  1
Orange  1

Thanks.

3 answers:

Answer 0 (score: 4)

TD14 has a STRTOK_SPLIT_TO_TABLE function:

SELECT token, COUNT(*)
FROM TABLE (STRTOK_SPLIT_TO_TABLE(1 -- this is just a dummy, usually the PK column when you need to join
                                 ,table.stringcolumn
                                 ,' ') -- simply add other separating characters
     RETURNS (outkey INTEGER,
              tokennum INTEGER,
              token VARCHAR(100) CHARACTER SET UNICODE
             )
           ) AS d
GROUP BY 1
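
As a minimal sketch of how this might look against a concrete table, assume the strings live in a column word of a table wordcount (hypothetical names, not given in the question). Lowercasing the token and sorting the result are optional additions; the lowercasing puts 'Ball' and 'ball' into one group regardless of the session's case-sensitivity settings:

```sql
-- Assumption: wordcount(word) is an illustrative table/column name.
SELECT LOWER(d.token) AS word,        -- normalise case before grouping
       COUNT(*)       AS word_count
FROM TABLE (STRTOK_SPLIT_TO_TABLE(1               -- dummy key, as in the query above
                                 ,wordcount.word  -- the string column to split
                                 ,' ')            -- delimiter: a single space
     RETURNS (outkey   INTEGER,
              tokennum INTEGER,
              token    VARCHAR(100) CHARACTER SET UNICODE
             )
           ) AS d
GROUP BY 1
ORDER BY word_count DESC;
```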

Answer 1 (score: 1)

Here is how I would approach something like this:

WITH RECURSIVE CTE (POS, NEW_STRING, REAL_STRING) AS
(
    -- Seed: nothing extracted yet, the whole trimmed string remains
    SELECT
        0, CAST('' AS VARCHAR(100)), TRIM(word)
    FROM wordcount
    UNION ALL
    -- Step: cut the next word off the front of the remaining string
    SELECT
        CASE WHEN POSITION(' ' IN REAL_STRING) > 0
             THEN POSITION(' ' IN REAL_STRING)
             ELSE CHARACTER_LENGTH(REAL_STRING)
        END DPOS,
        TRIM(BOTH ' ' FROM SUBSTR(REAL_STRING, 0, DPOS+1)),
        TRIM(SUBSTR(REAL_STRING, DPOS+1))
    FROM CTE
    WHERE DPOS > 0   -- stop once the remaining string is empty
)

SELECT TRIM(NEW_STRING) AS word,
       COUNT(*)
FROM CTE
WHERE pos > 0        -- skip the seed rows, whose NEW_STRING is empty
GROUP BY word;

This returns:

    word    Count(*)
    orange  1
    purple  1
    red     1
    ball    3

There may be a simpler way using the regular expression functions in TD14, but I haven't figured it out yet.
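
For reference, a regex-based variant would presumably be built on REGEXP_SPLIT_TO_TABLE, which is one of the TD14 regular expression functions. The following is an untested sketch under that assumption: it reuses the wordcount table from the query above, passes 1 as a dummy key, and uses the pattern ' +' to split on runs of one or more spaces, with 'c' requesting case-specific matching:

```sql
-- Untested sketch; assumes REGEXP_SPLIT_TO_TABLE is available on the system.
SELECT d.token  AS word,
       COUNT(*) AS word_count
FROM TABLE (REGEXP_SPLIT_TO_TABLE(1               -- dummy key, not used for joining here
                                 ,wordcount.word  -- the string column to split
                                 ,' +'            -- regex: one or more spaces
                                 ,'c')            -- match argument: case-specific
     RETURNS (outkey    INTEGER,
              token_ndx INTEGER,
              token     VARCHAR(100) CHARACTER SET UNICODE
             )
           ) AS d
GROUP BY 1;
```

Compared with the recursive query, this keeps the splitting logic in a single function call, at the cost of depending on the TD14 function set.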

Edit: removed some unneeded columns from the query.

Answer 2 (score: 0)

Use this.

Change your table to:
name  |name2
_______________
red    |  ball
purple |  ball
orange |  ball
_______________

Then run the following query:

select name, count(name) as name1_count from table_test
group by name
union all
select name2, count(name2) as name2_count from table_test
group by name2;
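
Note that this counts each column separately, so a word that occurs in both name and name2 would show up as two rows. If a single count per word is wanted, one possible variation (a sketch, reusing the table_test layout above; the aliases word, word_count and w are illustrative) is to stack both columns first and aggregate once:

```sql
-- Stack both columns into one list of words, then count each word once.
SELECT w.word,
       COUNT(*) AS word_count
FROM (
    SELECT name  AS word FROM table_test
    UNION ALL                      -- keep duplicates so they can be counted
    SELECT name2 AS word FROM table_test
) AS w
GROUP BY w.word;
```

This still only works when the number of words per row is fixed and each word already has its own column.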