Counting items in a string based on certain conditions in PostgreSQL?

Time: 2017-07-26 14:43:34

Tags: string postgresql

My table has a criteria column in which each row contains text like this:

Inclusion Criteria:

-  Female

-  > 40 years of age

-  Women who have first-degree relative suffered from breast cancer

-  Women who have first-degree relative suffered from ovarian cancer

-  Family history of male breast cancer

-  Family history of breast cancer (not necessarily first degree relatives) diagnosed before age of 40.

-  Family history of breast cancer (not necessarily first degree relatives) affecting 2 or more family members

-  Personal history of ovarian cancer

-  Personal history of premalignant conditions of breast and ovary

Exclusion Criteria:

     - Women with mammogram within one year
     -  adults aged 50-75

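I need to find the number of inclusion criteria and exclusion criteria in PostgreSQL. For the example above, the inclusion count is 9 and the exclusion count is 2.

2 answers:

Answer 0: (score: 1)

You can create a stored function in PL/pgSQL that parses and splits the text. Once you have it, you can call it on a string or a column value through SELECT, just as you would any other PostgreSQL function.

If you want to return both values (inclusions and exclusions) in one operation, the simplest way is to create a table that defines their names and types, like this: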

CREATE TABLE condition_counts (
  num_of_inclusions INTEGER,
  num_of_exclusions INTEGER
);

You can then use its row type as the return type of the function definition, like this:

CREATE OR REPLACE FUNCTION parse_conditions(conditions VARCHAR) RETURNS condition_counts AS $$
DECLARE
    condition_matches TEXT[];  -- [1] = inclusion block, [2] = exclusion block
    parsed_conditions condition_counts%ROWTYPE;
BEGIN
    -- Capture the text that follows each heading ("." also matches newlines by default).
    condition_matches := regexp_matches(conditions,
        E'^Inclusion Criteria:\\s*(.*)\\s*Exclusion Criteria:\\s*(.*)$');
    -- Split each block on the newline-plus-dash between items; the number of pieces is the count.
    SELECT array_length(regexp_split_to_array(condition_matches[1], E'\\n\\s*-\\s*'), 1),
           array_length(regexp_split_to_array(condition_matches[2], E'\\n\\s*-\\s*'), 1)
      INTO parsed_conditions.num_of_inclusions, parsed_conditions.num_of_exclusions;
    RETURN parsed_conditions;
END;
$$ LANGUAGE plpgsql;

You can now call it on the sample string you provided, like this:

SELECT * FROM parse_conditions('Inclusion Criteria:

-  Female

-  > 40 years of age

-  Women who have first-degree relative suffered from breast cancer

-  Women who have first-degree relative suffered from ovarian cancer

-  Family history of male breast cancer

-  Family history of breast cancer (not necessarily first degree relatives) diagnosed before age of 40.

-  Family history of breast cancer (not necessarily first degree relatives) affecting 2 or more family members

-  Personal history of ovarian cancer

-  Personal history of premalignant conditions of breast and ovary

Exclusion Criteria:

     - Women with mammogram within one year
     -  adults aged 50-75');

This returns the counts 9 and 2, as expected. As with any other PostgreSQL function, you can also run SELECT parse_conditions(columnname) FROM tablename; and various other combinations.
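As a rough sketch of the per-row call mentioned above, the function can be joined laterally so that each count comes back as its own column. The `trials` relation, its `id` and `criteria` columns, and the shortened sample text are hypothetical and used only for illustration; `parse_conditions` is the function defined earlier in this answer, and `LATERAL` assumes PostgreSQL 9.3 or later.

```sql
-- Hypothetical data: "trials", "id" and "criteria" are illustrative names only.
WITH trials(id, criteria) AS (
    VALUES
        (1, E'Inclusion Criteria:\n\n-  Female\n\n-  > 40 years of age\n\nExclusion Criteria:\n\n     - Women with mammogram within one year')
)
-- Call parse_conditions() once per row and expand its two fields into columns.
SELECT t.id, c.num_of_inclusions, c.num_of_exclusions
FROM trials AS t
CROSS JOIN LATERAL parse_conditions(t.criteria) AS c;
```

Against a real table, the CTE would be dropped and `trials` and `criteria` replaced with the actual table and column names.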

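Answer 1: (score: 0)

Are you saying that all of the above is stored in a single column?

If so, you can use regular-expression pattern matching: search from the string 'Inclusion Criteria:' up to the string 'Exclusion Criteria:' and count the lines in between.

Regular expressions are a good fit for this. See the PostgreSQL documentation on pattern matching.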
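One way this idea might look in plain SQL, offered as a sketch rather than a definitive solution: split the text at 'Exclusion Criteria:' and count the lines in each half that begin with a dash. The `trials` relation, its columns, and the shortened sample text are hypothetical. The sketch assumes both headings appear exactly as spelled, that every criterion sits on its own line starting with `-`, and that `standard_conforming_strings` is on (the default), so the backslashes in the pattern are passed through to the regex engine.

```sql
-- Hypothetical data: "trials", "id" and "criteria" are illustrative names only.
WITH trials(id, criteria) AS (
    VALUES
        (1, E'Inclusion Criteria:\n\n-  Female\n\n-  > 40 years of age\n\nExclusion Criteria:\n\n     - Women with mammogram within one year\n     -  adults aged 50-75')
)
SELECT
    t.id,
    -- Text before the 'Exclusion Criteria:' heading: count lines starting with "-".
    (SELECT count(*)
       FROM regexp_matches(split_part(t.criteria, 'Exclusion Criteria:', 1),
                           '^\s*-\s', 'gn')) AS inclusion_count,
    -- Text after the heading: same pattern.
    (SELECT count(*)
       FROM regexp_matches(split_part(t.criteria, 'Exclusion Criteria:', 2),
                           '^\s*-\s', 'gn')) AS exclusion_count
FROM trials AS t;
```

The `g` flag returns every match instead of only the first, and `n` turns on newline-sensitive matching so that `^` also matches at the start of each line. Dashes inside a line, as in "first-degree" or "50-75", are not counted because they do not follow a line start.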