Question

I'm building a kind of dictionary application, and I have a table that stores words like this:

id | surface_form | examples
-----------------------------------------------------------------------
 1 | sounds       | {"It sounds as though you really do believe that",
   |              |  "A different bell begins to sound midnight"}

Here surface_form is of type CHARACTER VARYING, and examples is an array of CHARACTER VARYING.

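Because the examples are generated automatically by another API, they may not contain the exact "surface_form". I want to keep only the sentences in examples that contain the exact surface_form. In the example above, only the first sentence should be kept, because it contains "sounds". The second sentence should be dropped, because it only contains "sound".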

I'm stuck on how to write a query and/or a PL/pgSQL stored procedure that updates the examples column so it contains only the wanted sentences.
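To make the question reproducible, here is a minimal setup for the data described above. The question does not name the table, so this sketch assumes a_table, the name used in Answer 1. The column types follow the question.

```sql
-- Placeholder table mirroring the question (the name a_table is an assumption)
CREATE TEMP TABLE a_table (
    id           integer PRIMARY KEY,
    surface_form character varying,
    examples     character varying[]
);

-- One dictionary entry with the two example sentences from the question
INSERT INTO a_table (id, surface_form, examples) VALUES
  (1, 'sounds', ARRAY['It sounds as though you really do believe that',
                      'A different bell begins to sound midnight']);
```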

Answer 1

This query skips the unwanted array elements:

select id, array_agg(example) new_examples
from a_table, unnest(examples) example
where surface_form = any(string_to_array(example, ' '))
group by id;

 id |                    new_examples                    
----+----------------------------------------------------
  1 | {"It sounds as though you really do believe that"}
(1 row)

Use it in an update:

with corrected as (
    select id, array_agg(example) new_examples
    from a_table, unnest(examples) example
    where surface_form = any(string_to_array(example, ' '))
    group by id
)
update a_table
set examples = new_examples
from corrected
where examples <> new_examples
and a_table.id = corrected.id;

Tested in rextester.
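Splitting on a single space means punctuation stays attached to words. A sentence such as "It sounds, apparently, ..." would not match, because its token is "sounds," with the comma. The sketch below tokenises on runs of non-word characters (\W+) with regexp_split_to_array instead. A VALUES list stands in for a_table, and the first sentence is invented for illustration. Apostrophes also act as separators ("don't" becomes "don" and "t"), which only matters if a surface_form contains one. Note also that an id with no matching sentence does not appear in the result at all, so the UPDATE above leaves its examples unchanged.

```sql
-- A VALUES list stands in for a_table so the query runs on its own
with a_table(id, surface_form, examples) as (
  values (1, 'sounds'::varchar,
          array['It sounds, apparently, as though you believe that',
                'A different bell begins to sound midnight']::varchar[])
)
select id, array_agg(example) new_examples
from a_table, unnest(examples) example
-- split on runs of non-word characters instead of single spaces,
-- so "sounds," yields the token "sounds"
where surface_form = any(regexp_split_to_array(example, '\W+'))
group by id;
```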

Answer 2

You may need to change your table design. This is what the PostgreSQL documentation says about using arrays:

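Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.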

Documentation: https://www.postgresql.org/docs/current/static/arrays.html
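Here is a minimal sketch of the separate-table layout the documentation recommends, with one row per example sentence. The table names words and word_examples are hypothetical. \m and \M are PostgreSQL regular-expression anchors for the start and end of a word, so "sounds" is not matched by "sound". The sketch assumes surface_form contains no regex metacharacters. With this layout, removing unwanted examples becomes an ordinary DELETE.

```sql
-- Hypothetical normalized layout: one row per word, one row per example sentence
CREATE TEMP TABLE words (
    id           integer PRIMARY KEY,
    surface_form character varying NOT NULL
);

CREATE TEMP TABLE word_examples (
    word_id  integer NOT NULL REFERENCES words (id),
    sentence character varying NOT NULL
);

INSERT INTO words VALUES (1, 'sounds');
INSERT INTO word_examples VALUES
  (1, 'It sounds as though you really do believe that'),
  (1, 'A different bell begins to sound midnight');

-- Delete every sentence that does not contain its word's surface form as a whole word
DELETE FROM word_examples e
USING words w
WHERE e.word_id = w.id
  AND e.sentence !~ ('\m' || w.surface_form || '\M');
```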

Answer 3

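The most compact solution (though not necessarily the fastest) is to write a function that takes an array and a regular expression and returns a new array containing only the elements that match the regular expression.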

create function get_matching(p_values text[], p_pattern text)
  returns text[]
as
$$
declare
  l_result text[] := '{}'; -- make sure it's not null
  l_element text;
begin
  foreach l_element in array p_values loop

    -- adjust this condition to whatever you want
    if l_element ~ p_pattern then
      l_result := l_result || l_element;
    end if;

  end loop;
  return l_result;
end;
$$
language plpgsql;

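The if condition is only an example. You need to adapt it to whatever is stored in the surface_form column. You may need a regular expression that checks for word boundaries, or a simple substring check (strpos() in PostgreSQL) may be enough. Your question doesn't make this clear.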

Cleaning up the table then becomes as simple as:

update the_table
   set examples = get_matching(examples, surface_form);
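For comparison, here is a hypothetical SQL-language variant with the same signature. The name get_matching_sql is invented for this sketch. It filters with an ARRAY(...) subquery instead of a loop, and WITH ORDINALITY plus ORDER BY keeps the original element order. Like the PL/pgSQL version, it treats the second argument as a regular expression, so surface_form values containing regex metacharacters would need escaping.

```sql
-- Hypothetical SQL-language counterpart of get_matching()
create or replace function get_matching_sql(p_values text[], p_pattern text)
  returns text[]
  language sql
  immutable
as
$$
  -- keep only the elements that match p_pattern, in their original order;
  -- ARRAY(...) yields '{}' rather than NULL when nothing matches
  select array(
    select u.v
    from unnest(p_values) with ordinality as u(v, pos)
    where u.v ~ p_pattern
    order by u.pos
  );
$$;
```

To require whole-word matches, the pattern can be built from the column at call time, for example: update the_table set examples = get_matching_sql(examples, '\m' || surface_form || '\M');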

But the whole approach seems flawed to me. It would be more efficient to store the examples in a properly normalized data model.

Answer 4

In SQL you have to keep two things in mind:

Tuple elements are immutable, but rows can be changed with an UPDATE.
SQL is declarative, not procedural.

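So you can't conditionally "delete" a value from an array. You have to think about the problem differently: build a new array according to your specification, which can include values conditionally (using a CASE expression), and then overwrite the tuple with the new array.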
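Here is a sketch of the "build a new array with CASE" idea as a single UPDATE. The table demo_words is a hypothetical stand-in for the question's table. CASE maps each unwanted sentence to NULL, and array_remove(..., NULL) then drops those NULLs. This assumes examples never legitimately contain NULL elements. Whole-word matching with \m and \M is also an assumption about what "exact" means here.

```sql
-- Hypothetical demo table with the question's data
CREATE TEMP TABLE demo_words (
    id           integer PRIMARY KEY,
    surface_form character varying,
    examples     character varying[]
);
INSERT INTO demo_words VALUES
  (1, 'sounds', ARRAY['It sounds as though you really do believe that',
                      'A different bell begins to sound midnight']);

-- Rebuild each array: CASE keeps matching sentences and turns the rest into NULL,
-- array_remove() strips the NULLs, and the whole array is overwritten
UPDATE demo_words
SET examples = array_remove(
      ARRAY(SELECT CASE WHEN u.e ~ ('\m' || demo_words.surface_form || '\M')
                        THEN u.e END
            FROM unnest(demo_words.examples) WITH ORDINALITY AS u(e, pos)
            ORDER BY u.pos),
      NULL);
```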

Answer 5

It looks like you can update the array so that it keeps only the valid elements, by selecting them with LIKE or a regular expression.

https://www.postgresql.org/docs/current/static/arrays.html
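LIKE performs a plain substring match, so on its own it cannot tell "sound" from "sounds". The query below assumes a hypothetical entry whose surface_form is 'sound' and compares LIKE with a word-boundary regular expression on the question's two sentences. LIKE '%sound%' is true for both sentences, while ~ '\msound\M' is true only for the second.

```sql
-- Substring match (LIKE) versus whole-word match (\m ... \M) for the word 'sound'
SELECT s                AS sentence,
       s LIKE '%sound%' AS like_match,
       s ~ '\msound\M'  AS word_match
FROM unnest(ARRAY['It sounds as though you really do believe that',
                  'A different bell begins to sound midnight']) AS t(s);
```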

Answer 6

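You can select the data into a temporary table, update the temporary table (using row numbers to identify the elements), and then update the original table with the merged value.

For example:

Suppose you create a temporary table Temp(id int, element character varying). Unnest the array into it, update the Temp table as needed, and finally update the original table with the re-aggregated array.

Here is a query you can try directly in your editor: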

-- Temporary table with one row per array element
CREATE TEMP TABLE IF NOT EXISTS temp_element (
    id bigint,
    element character varying);
TRUNCATE TABLE temp_element;
-- Unnest the array, numbering elements in their original order
insert into temp_element
select n, p
from unnest(ARRAY['It sounds as though you really do believe that',
                  'A different bell begins to sound midnight']) with ordinality as t(p, n);
-- Change individual elements as needed
update temp_element set element = 'It sounds as though you really'
where element = 'It sounds as though you really do believe that';
-- Re-aggregate in the original order; this is the value to write back to the original table
select array_agg(element order by id) from temp_element;
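The query above stops before the final step of writing the result back. Here is a sketch of the full round trip under stated assumptions. dict_words is a hypothetical stand-in for the question's table. The filter assumes whole-word matching (\m and \M), replacing the manual UPDATE above. Words left with no matching sentence get an empty array.

```sql
-- Hypothetical stand-in for the question's table
CREATE TEMP TABLE dict_words (
    id           integer PRIMARY KEY,
    surface_form character varying,
    examples     character varying[]
);
INSERT INTO dict_words VALUES
  (1, 'sounds', ARRAY['It sounds as though you really do believe that',
                      'A different bell begins to sound midnight']);

-- 1. One row per (word, position, sentence)
CREATE TEMP TABLE temp_example AS
SELECT w.id AS word_id, u.pos, u.element
FROM dict_words w,
     unnest(w.examples) WITH ORDINALITY AS u(element, pos);

-- 2. Drop sentences that do not contain the surface form as a whole word
DELETE FROM temp_example te
USING dict_words w
WHERE te.word_id = w.id
  AND te.element !~ ('\m' || w.surface_form || '\M');

-- 3. Write the merged arrays back in the original order; no survivors -> '{}'
UPDATE dict_words w
SET examples = coalesce(
      (SELECT array_agg(te.element ORDER BY te.pos)
       FROM temp_example te
       WHERE te.word_id = w.id),
      '{}');
```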

Answer 7

If you want to keep the array elements that contain surface_form, you have to keep the entries for which substring(..., ...) is not null.

First, you don't need to work on the array itself. Unnest it, keep only the matching items, and then array_agg the remaining items.

Here is a small example you can test without any table:

SELECT
  id,
  surface_form,
  (SELECT array_agg(examples_matching)
   FROM unnest(surfaces.examples) AS examples_matching
   WHERE substring(examples_matching, surfaces.surface_form) IS NOT NULL)
FROM
  (SELECT
     1                                              AS id,
     'example' :: TEXT                              AS surface_form,
     ARRAY ['example form', 'test test','second example form'] :: TEXT [] AS examples
  ) surfaces;

Conditionally removing items from an array field in PostgreSQL
