For example, I have a table named 'Table1' with a column called 'country'. I want to count the distinct words in a string. Below is the data in my 'country' column:
country:
"japan singapore japan chinese chinese chinese"
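Expected output: in the data above, japan appears twice, singapore once, and chinese three times. I want each distinct word counted once: japan as one, singapore as one, and chinese as one. So the output should be 3. Please help.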
ValueOfWord: 3
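Answer 0 (score: 1)
First, storing multiple values in a single column as a delimited string is bad design. You should consider normalizing the data as the permanent solution.
With the denormalized data, you can do this in a single SQL statement using REGEXP_SUBSTR: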
SELECT COUNT(DISTINCT(regexp_substr(country, '[^ ]+', 1, LEVEL))) as "COUNT"
FROM table_name
CONNECT BY LEVEL <= regexp_count(country, ' ')+1
/
Demo:
SQL> WITH sample_data AS
       ( SELECT 'japan singapore japan chinese chinese chinese' str FROM dual
       )
     -- end of sample_data mocking real table
     SELECT COUNT(DISTINCT(regexp_substr(str, '[^ ]+', 1, LEVEL))) as "COUNT"
     FROM sample_data
     CONNECT BY LEVEL <= regexp_count(str, ' ')+1
     /
COUNT
----------
3
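See Split single comma delimited string into rows in Oracle to understand how the query works.
Update
When the table holds multiple rows of delimited strings, you need to take care of the number of rows generated by the CONNECT BY clause.
See Split comma delimited strings in a table in Oracle for more ways of doing the same task.
Setup
Let's say you have a table with 3 rows like this: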
SQL> CREATE TABLE t(country VARCHAR2(200));
Table created.
SQL> INSERT INTO t VALUES('japan singapore japan chinese chinese chinese');
1 row created.
SQL> INSERT INTO t VALUES('singapore indian malaysia');
1 row created.
SQL> INSERT INTO t VALUES('french french french');
1 row created.
SQL> COMMIT;
Commit complete.
SQL> SELECT * FROM t;
COUNTRY
---------------------------------------------------------------------------
japan singapore japan chinese chinese chinese
singapore indian malaysia
french french french
We expect the output to be 6, since there are 6 unique strings.
SQL> SELECT COUNT(DISTINCT(regexp_substr(t.country, '[^ ]+', 1, lines.column_value))) count
     FROM t,
       TABLE (CAST (MULTISET
         (SELECT LEVEL FROM dual
          CONNECT BY LEVEL <= regexp_count(t.country, ' ')+1
         ) AS sys.odciNumberList ) ) lines
     ORDER BY lines.column_value
     /
COUNT
----------
6
There are many other ways to achieve the desired output. Let's see how:
SQL> SELECT COUNT(DISTINCT(country)) COUNT
     FROM
       (SELECT trim(COLUMN_VALUE) country
        FROM t,
          xmltable(('"'
          || REPLACE(country, ' ', '","')
          || '"'))
       )
     /
COUNT
----------
6
SQL> WITH
     model_param AS
       (
        SELECT country AS orig_str ,
               ' '
               || country
               || ' ' AS mod_str ,
               1 AS start_pos ,
               Length(country) AS end_pos ,
               (LENGTH(country) -
               LENGTH(REPLACE(country, ' '))) + 1 AS element_count ,
               0 AS element_no ,
               ROWNUM AS rn
        FROM t )
     SELECT COUNT(DISTINCT(Substr(mod_str, start_pos, end_pos-start_pos))) count
     FROM (
           SELECT *
           FROM model_param
           MODEL PARTITION BY (rn, orig_str, mod_str)
           DIMENSION BY (element_no)
           MEASURES (start_pos, end_pos, element_count)
           RULES ITERATE (2000)
           UNTIL (ITERATION_NUMBER+1 = element_count[0])
           ( start_pos[ITERATION_NUMBER+1] =
             instr(cv(mod_str), ' ', 1, cv(element_no)) + 1,
             end_pos[ITERATION_NUMBER+1] =
             instr(cv(mod_str), ' ', 1, cv(element_no) + 1) )
          )
     WHERE element_no != 0
     ORDER BY mod_str , element_no
     /
COUNT
----------
6
Answer 1 (score: 0)
Do you store this kind of string in a single entry?
If not, try
SELECT COUNT(*)
FROM (SELECT DISTINCT T.country FROM Table1 T)
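If so, I would write an external program to parse the strings and return the result you want.
For example, using Java:
Create a set of strings.
I would use JDBC to retrieve the records, and use split to break each string into tokens, with ' ' as the delimiter. For each token, if it is not already in the set, add it to the set.
When parsing is finished, get the size of the set, which is the value you want.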
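A minimal sketch of the set-based steps described above, under some assumptions: the class name DistinctWordCounter and the method countDistinctWords are hypothetical, the JDBC retrieval is left out and replaced by a hard-coded list of strings, and tokens are split on runs of whitespace rather than on a single ' ' so that repeated spaces do not produce empty tokens.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class; the names are illustrative and not part of the answer.
class DistinctWordCounter {

    // Counts the distinct whitespace-separated tokens across all given strings.
    static int countDistinctWords(List<String> rows) {
        Set<String> seen = new HashSet<>();
        for (String row : rows) {
            if (row == null) {
                continue; // a NULL column value contributes no words
            }
            // Trim first so leading spaces do not yield an empty first token.
            for (String token : row.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    seen.add(token); // Set.add ignores tokens already present
                }
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Assumption: in a real program these strings would be read from a
        // JDBC ResultSet over the country column; they are hard-coded here.
        List<String> rows = Arrays.asList(
            "japan singapore japan chinese chinese chinese",
            "singapore indian malaysia",
            "french french french");
        System.out.println(countDistinctWords(rows));
    }
}
```

Because a set stores each token only once, the explicit "if it is not in the set" check is not needed; calling add for every token has the same effect.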
Answer 2 (score: 0)
Break the string on the space delimiter:
SELECT COUNT(DISTINCT regexp_substr(col, '[^ ]+', 1, LEVEL))
FROM T
CONNECT BY LEVEL <= regexp_count(col, ' ')+1
To count the DISTINCT words for each value of col:
SELECT col,
COUNT(DISTINCT regexp_substr(col, '[^ ]+', 1, LEVEL))
FROM T
CONNECT BY LEVEL <= regexp_count(col, ' ')+1
GROUP BY col