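I am currently working with medical dosage data. It is a large dataset / Oracle table with a string variable, containing millions of records. The string values look like the sample records below. I need to find the MG (milligram) doses in each string and calculate their sum. For example: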
Drug_Direction
(1 JAN) INJECT 2ML (100MG) IV/IM AM THEN 0.5ML (25MG) 20 MIN LATER, THEN 2.5ML (125MG) PM
(SEP 20, 2018) INJECT 0.3ML (30MG) ON S1, 0.6ML (60MG) ON S2 AND 2ML(200MG) ON S3
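Also, the string does not have a fixed format. It sometimes varies; for example, only two MG doses, or just one, may be present. In that case I only need to get those MG doses. I understand that I may need to count the occurrences of MG, find the numbers, and sum them. I am working on this in parallel.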
The same data is also available in Oracle, so if there is an easier way to do this in Oracle SQL, that is welcome too.
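The plan described in the question can be sketched outside the database in Python: find each number tagged MG, then add the numbers up. This is a minimal illustration and not part of the question or either answer. It assumes a dose is written as digits, optionally with a decimal part, immediately followed by MG (the same assumption Answer 0 makes). total_mg is a hypothetical helper name. Because findall returns however many matches exist, strings with one, two, or three doses need no special handling.

```python
import re
from decimal import Decimal

# Assumption: a dose is digits, optionally with a decimal part, immediately followed by "MG".
MG_PATTERN = re.compile(r"(\d+(?:\.\d+)?)MG")


def total_mg(drug_direction: str) -> Decimal:
    """Sum every MG dose found in one Drug_Direction string (0 if there is none)."""
    # findall returns the captured number of each match, left to right.
    doses = MG_PATTERN.findall(drug_direction)
    # Decimal avoids binary floating-point rounding on values such as 2.5.
    return sum((Decimal(d) for d in doses), Decimal(0))


rows = [
    "(1 JAN) INJECT 2ML (100MG) IV/IM AM THEN 0.5ML (25MG) 20 MIN LATER, THEN 2.5ML (125MG) PM",
    "(SEP 20, 2018) INJECT 0.3ML (30MG) ON S1, 0.6ML (60MG) ON S2 AND 2ML(200MG) ON S3",
]

for row in rows:
    print(total_mg(row))  # 250, then 290
```

The ML volumes (2ML, 0.5ML, ...) are ignored because the pattern requires the letters MG right after the number.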
Answer 0 (score: 2)
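This is fairly easy to do in Oracle. You can:
- use REGEXP_COUNT to count the MG values in each string,
- use CONNECT BY to create a row for each match, and
- use REGEXP_SUBSTR to get each actual match.
Something like this: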
WITH test_vals AS (
SELECT '(1 JAN) INJECT 2ML (100MG) IV/IM AM THEN 0.5ML (25MG) 20 MIN LATER, THEN 2.5ML (125MG) PM' AS drug_direction FROM dual
UNION ALL SELECT '(SEP 20, 2018) INJECT 0.3ML (30MG) ON S1, 0.6ML (60MG) ON S2 AND 2ML(200MG) ON S3' FROM dual
),
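match_rows AS ( /* Get a row for each match */
SELECT DISTINCT
m.drug_direction,
LEVEL AS mg_occurrance_num
FROM test_vals m
CONNECT BY LEVEL <= REGEXP_COUNT(m.drug_direction, '((\d+\.)?\d+)MG') /* Count number of matches in each string */
AND PRIOR m.drug_direction = m.drug_direction /* Only generate child rows from the same string, so levels do not fan out across rows */
AND PRIOR SYS_GUID() IS NOT NULL /* Non-deterministic term that stops Oracle reporting a CONNECT BY loop (ORA-01436) */
)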
SELECT r.drug_direction,
SUM(
TO_NUMBER(
REGEXP_SUBSTR(
r.drug_direction,
'((\d+\.)?\d+)MG',
1,
r.mg_occurrance_num, /* Search for this specific occurrence */
'',
1 /* Get first sub-group (the actual numeric value) */
)
)
) AS total_mg_value
FROM match_rows r
GROUP BY r.drug_direction
ORDER BY r.drug_direction
Note that this assumes all values are in exactly that format (a numeric value immediately followed by the string 'MG').
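To make the three steps of the query easier to follow, here is a rough Python analogue. It assumes that Python's re module treats this particular pattern the same way Oracle does. regexp_count and regexp_substr are hypothetical helpers that only mimic the Oracle functions of the same name for this use; they are not a library API. The last line shows the format assumption in practice: a space between the number and MG means nothing is matched. For such a row this sketch returns 0.0, whereas the Oracle query would most likely return NULL, because CONNECT BY still produces the LEVEL 1 row and REGEXP_SUBSTR finds nothing for it.

```python
import re

# Same pattern as the query; group 1 holds the numeric value.
PATTERN = r"((\d+\.)?\d+)MG"


def regexp_count(s: str, pattern: str) -> int:
    """Mimics REGEXP_COUNT: number of non-overlapping matches."""
    return sum(1 for _ in re.finditer(pattern, s))


def regexp_substr(s: str, pattern: str, occurrence: int, subexpr: int):
    """Mimics REGEXP_SUBSTR(s, pattern, 1, occurrence, NULL, subexpr); None if absent."""
    for n, match in enumerate(re.finditer(pattern, s), start=1):
        if n == occurrence:
            return match.group(subexpr)
    return None


def total_mg(s: str) -> float:
    count = regexp_count(s, PATTERN)  # step 1: REGEXP_COUNT
    levels = range(1, count + 1)      # step 2: one "row" per match, like LEVEL
    # step 3: REGEXP_SUBSTR picks sub-group 1 of the n-th match
    values = [float(regexp_substr(s, PATTERN, lvl, 1)) for lvl in levels]
    return sum(values, 0.0)           # final SUM(...) per string


s1 = "(1 JAN) INJECT 2ML (100MG) IV/IM AM THEN 0.5ML (25MG) 20 MIN LATER, THEN 2.5ML (125MG) PM"
print(regexp_count(s1, PATTERN))         # 3
print(regexp_substr(s1, PATTERN, 2, 1))  # 25
print(total_mg(s1))                      # 250.0
print(total_mg("INJECT 2ML (100 MG)"))   # 0.0 (no match: space before MG)
```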
Answer 1 (score: 1)
Assuming it is a single text string, in Oracle you can use several recursive subquery factoring clauses to split the string into substrings:
Oracle setup:
CREATE TABLE table_name ( id, value ) AS
SELECT 1, '(1 JAN) INJECT 2ML (100MG) IV/IM AM THEN 0.5ML (25MG) 20 MIN LATER, THEN 2.5ML (125MG) PM'
|| '(SEP 20, 2018) INJECT 0.3ML (30MG) ON S1, 0.6ML (60MG) ON S2 AND 2ML(200MG) ON S3' FROM DUAL;
Query:
WITH datelines ( id, value, dt, pos, lvl ) AS (
SELECT id,
value,
REGEXP_SUBSTR(
value,
'\((([0-2]?\d|3[01]) (JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)|(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC) ([0-2]?\d|3[01]), \d{4})\)',
1,
1,
NULL,
1
),
REGEXP_INSTR(
value,
'\((([0-2]?\d|3[01]) (JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)|(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC) ([0-2]?\d|3[01]), \d{4})\)',
1,
1
),
1
FROM table_name
UNION ALL
SELECT id,
value,
REGEXP_SUBSTR(
value,
'\((([0-2]?\d|3[01]) (JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)|(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC) ([0-2]?\d|3[01]), \d{4})\)',
1,
LVL + 1,
NULL,
1
),
REGEXP_INSTR(
value,
'\((([0-2]?\d|3[01]) (JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)|(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC) ([0-2]?\d|3[01]), \d{4})\)',
1,
LVL + 1
),
LVL + 1
FROM datelines
WHERE REGEXP_SUBSTR(
value,
'\((([0-2]?\d|3[01]) (JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)|(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC) ([0-2]?\d|3[01]), \d{4})\)',
1,
LVL + 1,
NULL,
1
) IS NOT NULL
),
actions ( id, dt, lvl, actions ) AS (
SELECT id,
dt,
lvl,
SUBSTR(
value,
pos + LENGTH( dt ) + 2,
LEAD( pos, 1, LENGTH( value ) + 1 ) OVER ( PARTITION BY id ORDER BY lvl ) - pos - LENGTH( dt ) - 2
)
FROM datelines
),
amounts ( id, dt, lvl, actions, amount, num_amounts, amount_lvl ) AS (
SELECT id,
dt,
lvl,
actions,
TO_NUMBER( REGEXP_SUBSTR( actions, '\((\d+)MG\)', 1, 1, NULL, 1 ) ),
REGEXP_COUNT( actions, '\((\d+)MG\)' ),
1
FROM actions
UNION ALL
SELECT id,
dt,
lvl,
actions,
TO_NUMBER( REGEXP_SUBSTR( actions, '\((\d+)MG\)', 1, amount_lvl + 1, NULL, 1 ) ),
num_amounts,
amount_lvl + 1
FROM amounts
WHERE amount_lvl < num_amounts
)
SELECT id,
dt,
SUM( amount ) AS total_amount
FROM amounts
GROUP BY id, dt, lvl;
Output:
ID | DT           | TOTAL_AMOUNT
-: | :----------- | -----------:
 1 | SEP 20, 2018 |          290
 1 | 1 JAN        |          250
db<>fiddle here
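The same split-then-sum logic can be sketched in Python to check the expected totals. This is an illustration only, assuming the whole text is one string as in the setup above. totals_by_date is a hypothetical name, and the date pattern is a condensed form of the one used in the query. Each segment runs from the end of one date header to the start of the next, which corresponds to the LEAD(pos, ...) calculation in the actions subquery.

```python
import re

MONTHS = "JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC"
DAY = r"(?:[0-2]?\d|3[01])"
# "(1 JAN)" or "(SEP 20, 2018)"; group 1 is the date text without parentheses.
DATE_HEADER = re.compile(rf"\(({DAY} (?:{MONTHS})|(?:{MONTHS}) {DAY}, \d{{4}})\)")
# Whole-number MG amount in parentheses, as in the answer's amounts subquery.
AMOUNT = re.compile(r"\((\d+)MG\)")


def totals_by_date(value: str) -> list[tuple[str, int]]:
    headers = list(DATE_HEADER.finditer(value))
    totals = []
    for i, header in enumerate(headers):
        # The segment ends where the next date header starts, or at the end of the text.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(value)
        actions = value[header.end():end]
        totals.append((header.group(1), sum(int(a) for a in AMOUNT.findall(actions))))
    return totals


value = (
    "(1 JAN) INJECT 2ML (100MG) IV/IM AM THEN 0.5ML (25MG) 20 MIN LATER, THEN 2.5ML (125MG) PM"
    "(SEP 20, 2018) INJECT 0.3ML (30MG) ON S1, 0.6ML (60MG) ON S2 AND 2ML(200MG) ON S3"
)
print(totals_by_date(value))  # [('1 JAN', 250), ('SEP 20, 2018', 290)]
```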
Update
If each line is in a separate row of the database table, then:
Oracle setup:
CREATE TABLE table_name ( id, value ) AS
SELECT 1, '(1 JAN) INJECT 2ML (100MG) IV/IM AM THEN 0.5ML (25MG) 20 MIN LATER, THEN 2.5ML (125MG) PM' FROM DUAL UNION ALL
SELECT 2, '(SEP 20, 2018) INJECT 0.3ML (30MG) ON S1, 0.6ML (60MG) ON S2 AND 2ML(200MG) ON S3' FROM DUAL;
Query:
WITH amounts ( id, value, dt, amount, amount_index, num_amounts ) AS (
SELECT id,
value,
REGEXP_SUBSTR( value, '\((.*?)\)', 1, 1, NULL, 1 ),
TO_NUMBER( REGEXP_SUBSTR( value, '\((\d+)MG\)', 1, 1, NULL, 1 ) ),
1,
REGEXP_COUNT( value, '\((\d+)MG\)' )
FROM table_name
UNION ALL
SELECT id,
value,
dt,
TO_NUMBER( REGEXP_SUBSTR( value, '\((\d+)MG\)', 1, amount_index + 1, NULL, 1 ) ),
amount_index + 1,
num_amounts
FROM amounts
WHERE amount_index < num_amounts
)
SELECT id,
MAX( dt ) AS dt,
SUM( amount ) AS total_amount
FROM amounts
GROUP BY id;
Output:
ID | DT           | TOTAL_AMOUNT
-: | :----------- | -----------:
 1 | 1 JAN        |          250
 2 | SEP 20, 2018 |          290
db<>fiddle here
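A rough Python equivalent of the per-row query is below, assuming rows arrive as (id, value) pairs. summarize and DEC_MG are hypothetical names, and the row with id 3 is made-up test input, not data from the question. The sketch also illustrates a limitation worth checking against real data: the pattern \((\d+)MG\) only matches whole numbers inside parentheses. A dose such as (12.5MG) is therefore silently skipped unless a decimal-aware pattern is used.

```python
import re

FIRST_PAREN = re.compile(r"\((.*?)\)")        # dt: text inside the first (...)
INT_MG = re.compile(r"\((\d+)MG\)")            # the update's pattern: whole-number MG in (...)
DEC_MG = re.compile(r"\((\d+(?:\.\d+)?)MG\)")  # hypothetical variant that also accepts decimals


def summarize(rows, amount_pattern=INT_MG):
    """Return (id, dt, total) per row; total is None when nothing matches, like SUM over NULLs."""
    result = []
    for row_id, value in rows:
        first = FIRST_PAREN.search(value)
        dt = first.group(1) if first else None
        amounts = [float(a) for a in amount_pattern.findall(value)]
        result.append((row_id, dt, sum(amounts) if amounts else None))
    return result


table = [
    (1, "(1 JAN) INJECT 2ML (100MG) IV/IM AM THEN 0.5ML (25MG) 20 MIN LATER, THEN 2.5ML (125MG) PM"),
    (2, "(SEP 20, 2018) INJECT 0.3ML (30MG) ON S1, 0.6ML (60MG) ON S2 AND 2ML(200MG) ON S3"),
]
print(summarize(table))  # [(1, '1 JAN', 250.0), (2, 'SEP 20, 2018', 290.0)]

# Hypothetical row with a decimal dose: the whole-number pattern skips (12.5MG).
extra = [(3, "(2 FEB) INJECT 1ML (12.5MG) THEN (50MG)")]
print(summarize(extra))          # [(3, '2 FEB', 50.0)]
print(summarize(extra, DEC_MG))  # [(3, '2 FEB', 62.5)]
```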