PostgreSQL: how to trim a string after the last delimiter, only when there are multiple delimiters

Time: 2019-03-25 11:07:10

Tags: postgresql

I have these entries:

id  |  fooddescription
--------------------
1   |  'Mollusks, oyster, eastern (blue point), wild, raw'
2   |  'Mollusks, oyster, eastern (blue point), wild, boiled or steamed'
3   |  'Vegetable oil, olive'
4   |  'Vegetable oil, almond'
5   |  'Pumpkin, boiled, drained, with salt'
6   |  'Pumpkin leaves, boiled, drained, with salt'

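I want to treat the first two entries as one, because they differ only in preparation method, while the others are genuinely different. The words in each string run from general to specific, and the last part (when the description has several comma-separated parts) is usually a preparation method that does not need to be distinguished.

Desired result: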

id  |  fooddescription
--------------------
1   |  'Mollusks, oyster, eastern (blue point), wild, '
3   |  'Vegetable oil, olive'
4   |  'Vegetable oil, almond'
5   |  'Pumpkin, boiled, drained, '
6   |  'Pumpkin leaves, boiled, drained, '

My first thought was to trim the string by dropping the part after the last comma. So, based on this MySQL answer, I wrote a Postgres query:

SELECT reverse(
            substring(reverse(fooddescription),
                      position(',' in reverse(fooddescription)))) as trimmed, count(*)
FROM food_name
GROUP BY trimmed HAVING COUNT(*)>0 

I get the following result:

'Mollusks, oyster, eastern (blue point), wild,'
'Vegetable oil,'
'Pumpkin, boiled, drained,'
'Pumpkin leaves, boiled, drained,'

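The 'Vegetable oil,' row is not what I want (olive and almond are merged), and I cannot keep the id.

So my questions are:

  1. How can I count the delimiters (,) and trim only the last part when there is more than one?
  2. Is it also possible to keep one id for each group after the GROUP BY?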

2 answers:

Answer 0: (score: 2)

Instead of a position-based substring, you can split the text into an array and count its elements.

Here is a complete example:

WITH food_name (fooddescription) AS (
VALUES
  ('Mollusks, oyster, eastern (blue point), wild, raw'),
  ('Mollusks, oyster, eastern (blue point), wild, boiled or steamed'),
  ('Vegetable oil, olive'),
  ('Vegetable oil, almond'),
  ('Pumpkin, boiled, drained, with salt'), 
  ('Pumpkin leaves, boiled, drained, with salt')
)
SELECT ARRAY_TO_STRING(trimmed.trimmed, ', ') AS trimmed
FROM food_name
, LATERAL (SELECT STRING_TO_ARRAY(fooddescription, ', ') parts) parts
-- keep descriptions with at most two parts intact; otherwise drop the last part
, LATERAL (SELECT CASE WHEN array_length(parts, 1) <= 2 THEN parts ELSE parts[1:array_length(parts, 1)-1] END trimmed) trimmed;

This returns the following result set:

                trimmed
Mollusks, oyster, eastern (blue point), wild
Mollusks, oyster, eastern (blue point), wild
Vegetable oil, olive
Vegetable oil, almond
Pumpkin, boiled, drained
Pumpkin leaves, boiled, drained
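
The query above does not yet cover the second part of the question (keeping one id per group). A minimal sketch of one way to do that is to group on the trimmed array and take an aggregate of the id. It assumes the table has an integer id column as shown in the question. Picking the smallest id with min(id) is my assumption, chosen because the desired result lists ids 1, 3, 4, 5 and 6. The aliases p, t and n are illustrative only.

```sql
WITH food_name (id, fooddescription) AS (
VALUES
  (1, 'Mollusks, oyster, eastern (blue point), wild, raw'),
  (2, 'Mollusks, oyster, eastern (blue point), wild, boiled or steamed'),
  (3, 'Vegetable oil, olive'),
  (4, 'Vegetable oil, almond'),
  (5, 'Pumpkin, boiled, drained, with salt'),
  (6, 'Pumpkin leaves, boiled, drained, with salt')
)
SELECT min(id) AS id,                                   -- assumption: keep the smallest id of each group
       array_to_string(t.trimmed, ', ') AS fooddescription,
       count(*) AS n                                    -- number of original rows merged into the group
FROM food_name
-- split each description into its comma-separated parts
CROSS JOIN LATERAL (SELECT string_to_array(fooddescription, ', ') AS parts) p
-- drop the last part only when there are more than two parts (more than one delimiter)
CROSS JOIN LATERAL (
  SELECT CASE WHEN array_length(p.parts, 1) <= 2 THEN p.parts
              ELSE p.parts[1:array_length(p.parts, 1) - 1]
         END AS trimmed
) t
GROUP BY t.trimmed
ORDER BY min(id);
```

With the sample data this should give five rows, with the two Mollusks entries merged under id 1 and the two Vegetable oil entries left separate. Unlike the desired result in the question, the trailing ", " is not kept.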

Answer 1: (score: 0)

Use regexp_replace to remove everything from the last comma onward, including the comma itself:

select regexp_replace(
  'Mollusks, oyster, eastern (blue point), wild, raw',
  ',[^,]*$', ''
);
select regexp_replace(
  'Mollusks, oyster, eastern (blue point), wild, boiled or steamed',
  ',[^,]*$', ''
);

Output of both:

+----------------------------------------------+
| regexp_replace                               |
|----------------------------------------------|
| Mollusks, oyster, eastern (blue point), wild |
+----------------------------------------------+
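
Note that this pattern removes the last part whenever there is at least one comma, so 'Vegetable oil, olive' would also be cut down to 'Vegetable oil'. A possible variation, offered as a sketch rather than as part of the original answer, is to require a second comma earlier in the string and put back everything before the last comma through a capture group:

```sql
SELECT fooddescription,
       -- ^(.*,.*) must itself contain a comma, so strings with a single comma do not match
       -- and are returned unchanged; \1 restores everything before the last comma
       regexp_replace(fooddescription, '^(.*,.*),[^,]*$', '\1') AS trimmed
FROM (VALUES
  ('Mollusks, oyster, eastern (blue point), wild, raw'),
  ('Pumpkin, boiled, drained, with salt'),
  ('Vegetable oil, olive')
) AS food_name (fooddescription);
```

This relies on the default standard_conforming_strings = on, so that '\1' reaches regexp_replace as a backreference rather than being treated as a string escape.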