I have these entries:
id | fooddescription
--------------------
1 | 'Mollusks, oyster, eastern (blue point), wild, raw'
2 | 'Mollusks, oyster, eastern (blue point), wild, boiled or steamed'
3 | 'Vegetable oil, olive'
4 | 'Vegetable oil, almond'
5 | 'Pumpkin, boiled, drained, with salt'
6 | 'Pumpkin leaves, boiled, drained, with salt'
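I want to treat the first two entries as one, because they differ only in preparation method, while the others are genuinely different foods. The words in each string are ordered from general to specific, and the last part (when there are many descriptors and `,` delimiters) is usually a preparation method that does not need to be distinguished.
Desired result: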
id | fooddescription
--------------------
1 | 'Mollusks, oyster, eastern (blue point), wild, '
3 | 'Vegetable oil, olive'
4 | 'Vegetable oil, almond'
5 | 'Pumpkin, boiled, drained, '
6 | 'Pumpkin leaves, boiled, drained, '
At first I thought I could trim each string to remove the part after the last comma. So, based on this MySQL answer, I wrote a Postgres query:
SELECT reverse(
substring(reverse(fooddescription),
position(',' in reverse(fooddescription)))) as trimmed, count(*)
FROM food_name
GROUP BY trimmed HAVING COUNT(*)>0
I get the following result:
'Mollusks, oyster, eastern (blue point), wild,'
'Vegetable oil,'
'Pumpkin, boiled, drained,'
'Pumpkin leaves, boiled, drained,'
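The 'Vegetable oil,' row is undesirable, because it merges two different oils, and I also cannot keep the `id`.
So my questions are:
How can I take the number of `,` delimiters into account and trim only the last part when there is more than one delimiter?
Can I keep one `id` for each group after the `GROUP BY`?
Answer 0 (score: 2)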
Instead of a position-based substring, you can split the text into an array and count its elements.
Here is a complete example:
WITH food_name (fooddescription) AS (
VALUES
('Mollusks, oyster, eastern (blue point), wild, raw'),
('Mollusks, oyster, eastern (blue point), wild, boiled or steamed'),
('Vegetable oil, olive'),
('Vegetable oil, almond'),
('Pumpkin, boiled, drained, with salt'),
('Pumpkin leaves, boiled, drained, with salt')
)
SELECT ARRAY_TO_STRING(t.trimmed, ', ') AS trimmed
FROM food_name
   , LATERAL (SELECT STRING_TO_ARRAY(fooddescription, ', ') AS parts) p
   -- Keep descriptions with at most two parts whole; otherwise drop the last part.
   , LATERAL (SELECT CASE WHEN array_length(p.parts, 1) <= 2 THEN p.parts
                          ELSE p.parts[1:array_length(p.parts, 1) - 1] END AS trimmed) t;
This returns the following result set:
trimmed
Mollusks, oyster, eastern (blue point), wild
Mollusks, oyster, eastern (blue point), wild
Vegetable oil, olive
Vegetable oil, almond
Pumpkin, boiled, drained
Pumpkin leaves, boiled, drained
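The example above trims the descriptions but does not group them or carry an `id`, which was the second part of the question. One possible way to do both, sketched here under the assumption that the `id` values are the ones shown in the question's table and that keeping the smallest `id` per group is acceptable, is to group on the trimmed array and take `MIN(id)`. The aliases `f`, `p`, `t` and the column `merged_rows` are made up for this illustration.

```sql
-- Sketch: group on the trimmed description and keep the smallest id per group.
-- The id values are assumed from the table in the question.
WITH food_name (id, fooddescription) AS (
    VALUES
        (1, 'Mollusks, oyster, eastern (blue point), wild, raw'),
        (2, 'Mollusks, oyster, eastern (blue point), wild, boiled or steamed'),
        (3, 'Vegetable oil, olive'),
        (4, 'Vegetable oil, almond'),
        (5, 'Pumpkin, boiled, drained, with salt'),
        (6, 'Pumpkin leaves, boiled, drained, with salt')
)
SELECT MIN(f.id) AS id,                                   -- one id per group
       ARRAY_TO_STRING(t.trimmed, ', ') AS fooddescription,
       COUNT(*) AS merged_rows                            -- rows collapsed into the group
FROM food_name AS f
CROSS JOIN LATERAL (SELECT STRING_TO_ARRAY(f.fooddescription, ', ') AS parts) AS p
CROSS JOIN LATERAL (
    -- Drop the last part only when there are more than two parts.
    SELECT CASE WHEN array_length(p.parts, 1) <= 2 THEN p.parts
                ELSE p.parts[1:array_length(p.parts, 1) - 1]
           END AS trimmed
) AS t
GROUP BY t.trimmed
ORDER BY 1;
```

On the sample data, the two oyster rows should collapse into a single group with `id` 1, while the oils and the two pumpkin entries stay separate. If a different representative `id` is wanted, `MAX(id)` or `ARRAY_AGG(id)` could be swapped in for `MIN(id)`.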
Answer 1 (score: 0)
Use regexp_replace to remove everything after the last comma, including the comma itself:
select regexp_replace(
'Mollusks, oyster, eastern (blue point), wild, raw',
',[^,]*$', ''
);
select regexp_replace(
'Mollusks, oyster, eastern (blue point), wild, boiled or steamed',
',[^,]*$', ''
);
Output for both:
+----------------------------------------------+
| regexp_replace |
|----------------------------------------------|
| Mollusks, oyster, eastern (blue point), wild |
+----------------------------------------------+
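Applied as is, the pattern `,[^,]*$` would also shorten 'Vegetable oil, olive' to 'Vegetable oil', which the question wants to avoid. A possible variation, offered as a sketch and not as part of the original answer, is to require at least one earlier comma before the one being removed, so that strings with a single delimiter are left untouched. The sample `id` values are assumed from the question's table, and `merged_rows` is a hypothetical column name.

```sql
-- Sketch: trim the last comma-separated part only when the string
-- contains at least two commas, then group and keep one id per group.
WITH food_name (id, fooddescription) AS (
    VALUES
        (1, 'Mollusks, oyster, eastern (blue point), wild, raw'),
        (2, 'Mollusks, oyster, eastern (blue point), wild, boiled or steamed'),
        (3, 'Vegetable oil, olive'),
        (4, 'Vegetable oil, almond'),
        (5, 'Pumpkin, boiled, drained, with salt'),
        (6, 'Pumpkin leaves, boiled, drained, with salt')
)
SELECT MIN(id) AS id,
       -- Group 1 must itself contain a comma, so single-comma strings do not match
       -- and are returned unchanged. '\1' assumes standard_conforming_strings = on,
       -- which is the PostgreSQL default.
       regexp_replace(fooddescription, '^(.*,.*),[^,]*$', '\1') AS trimmed,
       COUNT(*) AS merged_rows
FROM food_name
GROUP BY trimmed
ORDER BY 1;
```

With the sample rows, this should keep both vegetable oils distinct and merge only the two oyster entries under `id` 1.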