My table has a conditions column in which each row contains text like the following:
Inclusion Criteria:
- Female
- > 40 years of age
- Women who have first-degree relative suffered from breast cancer
- Women who have first-degree relative suffered from ovarian cancer
- Family history of male breast cancer
- Family history of breast cancer (not necessarily first degree relatives) diagnosed before age of 40.
- Family history of breast cancer (not necessarily first degree relatives) affecting 2 or more family members
- Personal history of ovarian cancer
- Personal history of premalignant conditions of breast and ovary
Exclusion Criteria:
- Women with mammogram within one year
- adults aged 50-75
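I need to get the counts of inclusion and exclusion criteria in PostgreSQL. For the example above, that is 9 inclusion criteria and 2 exclusion criteria.
Answer 0: (score: 1)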
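You can create a PL/pgSQL function that does the parsing and splitting. Once it exists, you can call it through SELECT on a string or a column value, just as you would any other PostgreSQL function.
If you want to return both values (inclusions and exclusions) in one operation, the simplest way is to create a table that defines their names and types, like this: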
-- Used only for its row type; the counts are numbers, so INTEGER fits better than VARCHAR.
CREATE TABLE condition_counts (
    num_of_inclusions INTEGER,
    num_of_exclusions INTEGER
);
You can then use it as the return type in the function definition, like this:
CREATE OR REPLACE FUNCTION parse_conditions(conditions VARCHAR) RETURNS condition_counts AS $$
DECLARE
    condition_matches TEXT[];
    parsed_conditions condition_counts%ROWTYPE;
BEGIN
    -- Capture the text after each heading. Without the 'g' flag,
    -- regexp_matches yields at most one row, so it can be assigned directly.
    condition_matches := regexp_matches(conditions,
        E'^Inclusion Criteria:\\s*(.*)\\s*Exclusion Criteria:\\s*(.*)$');
    -- Split each captured section on "newline, hyphen" and count the pieces.
    SELECT array_length(regexp_split_to_array(condition_matches[1], E'\\n\\s*-\\s*'), 1),
           array_length(regexp_split_to_array(condition_matches[2], E'\\n\\s*-\\s*'), 1)
    INTO parsed_conditions.num_of_inclusions, parsed_conditions.num_of_exclusions;
    RETURN parsed_conditions;
END;
$$ LANGUAGE plpgsql;
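To see why the array length equals the number of criteria, the splitting step can be tried on its own. This is only a small sketch using a shortened three-item list made up from the sample text; the first element keeps its leading "- " because the pattern only matches a hyphen that follows a newline.

```sql
-- Split a shortened criteria list the same way the function does.
-- The pattern matches: a newline, optional whitespace, a hyphen, optional whitespace.
SELECT parts,
       array_length(parts, 1) AS item_count
FROM (
    SELECT regexp_split_to_array(
               E'- Female\n- > 40 years of age\n- Personal history of ovarian cancer',
               E'\\n\\s*-\\s*'
           ) AS parts
) AS s;
```

The list has two "newline, hyphen" separators, so the split should produce three elements and item_count should be 3.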
You can now call parse_conditions on the sample string you provided, like this:
SELECT * FROM parse_conditions('Inclusion Criteria:
- Female
- > 40 years of age
- Women who have first-degree relative suffered from breast cancer
- Women who have first-degree relative suffered from ovarian cancer
- Family history of male breast cancer
- Family history of breast cancer (not necessarily first degree relatives) diagnosed before age of 40.
- Family history of breast cancer (not necessarily first degree relatives) affecting 2 or more family members
- Personal history of ovarian cancer
- Personal history of premalignant conditions of breast and ovary
Exclusion Criteria:
- Women with mammogram within one year
- adults aged 50-75');
This returns the counts 9 and 2, as expected. As with any PostgreSQL function, you can also run SELECT parse_conditions(columnname) FROM tablename; and various other combinations.
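As a sketch of the column-wise call, the composite result can be expanded into separate columns with a LATERAL join (available from PostgreSQL 9.3). The table name trials and its columns id and criteria are hypothetical, invented here for illustration, and the example assumes parse_conditions has already been created as shown above.

```sql
-- Hypothetical table and column names; adjust them to the real schema.
CREATE TEMP TABLE trials (
    id       INTEGER,
    criteria TEXT
);

INSERT INTO trials (id, criteria) VALUES
    (1, E'Inclusion Criteria:\n- Female\n- > 40 years of age\nExclusion Criteria:\n- Women with mammogram within one year');

-- Call the function once per row and expose its fields as ordinary columns.
SELECT t.id,
       c.num_of_inclusions,
       c.num_of_exclusions
FROM trials AS t
CROSS JOIN LATERAL parse_conditions(t.criteria) AS c;
```

For the single sample row this should report 2 inclusions and 1 exclusion. The LATERAL form is used instead of (parse_conditions(criteria)).* because the latter is generally expanded into one function call per output column.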
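Answer 1: (score: 0)
Are you saying that all of the above text sits in a single column?
If so, you can use regular-expression pattern matching: search from the string 'Inclusion Criteria:' up to the string 'Exclusion Criteria:' and count the lines in between.
Regular expressions are a good fit for this. See the PostgreSQL documentation on pattern matching.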
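One possible way to spell out this idea without a stored function is sketched below. It assumes that 'Exclusion Criteria:' appears exactly once in each value and that every criterion starts on its own line with a hyphen; the CTE name sample and the column name criteria are placeholders, not names from the question.

```sql
-- Placeholder data; in practice, select from the real table instead of the CTE.
WITH sample(criteria) AS (
    VALUES (E'Inclusion Criteria:\n- Female\n- > 40 years of age\nExclusion Criteria:\n- Women with mammogram within one year')
)
SELECT
    -- Everything before the exclusion heading: count lines that begin with a hyphen.
    (SELECT count(*)
       FROM regexp_matches(split_part(criteria, 'Exclusion Criteria:', 1),
                           '^\s*-', 'gn')) AS num_of_inclusions,
    -- Everything after the exclusion heading: same count.
    (SELECT count(*)
       FROM regexp_matches(split_part(criteria, 'Exclusion Criteria:', 2),
                           '^\s*-', 'gn')) AS num_of_exclusions
FROM sample;
```

Here the 'g' flag returns every match and the 'n' flag makes ^ match at the start of each line, so each hyphen-led line is counted once. For the placeholder row the query should return 2 and 1.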