Question

So I have a table with the following schema:

CREATE TABLE stages (
  id     serial PRIMARY KEY,
  cid    varchar(6)  NOT NULL,
  stage  varchar(30) NOT NULL,
  status varchar(30) NOT NULL
);

With the following test data:

INSERT INTO stages (id, cid, stage, status) VALUES
  ('1', '1', 'first stage', 'accepted'),
  ('2', '1', 'second stage', 'current'),
  ('3', '2', 'first stage', 'accepted'),
  ('4', '3', 'first stage', 'accepted'),
  ('5', '3', 'second stage', 'accepted'),
  ('6', '3', 'third stage', 'current');

Now, the use case is that we want to query this table for every stage. For example, we will query this table for the "first stage" and try to fetch all those cids that do not exist in the subsequent stage, for example the "second stage":

Result set:

cid | status
2   | 'accepted'

While running the query for the "second stage", we will try to fetch all those cids that do not exist in the "third stage", and so on.

Result set:

cid | status
1   | 'current'

Currently, we do this with an EXISTS subquery in the WHERE clause, which does not perform very well.
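The actual query is not shown, so the following is only a sketch of the kind of query described, assuming the stage names from the test data are hard-coded for each step and the check is written as NOT EXISTS:

```sql
-- Assumed shape of the current approach (not the asker's actual query):
-- cids that are in 'first stage' but have no row in 'second stage'.
SELECT s.cid, s.status
FROM stages s
WHERE s.stage = 'first stage'
  AND NOT EXISTS (
        SELECT 1
        FROM stages n
        WHERE n.cid = s.cid
          AND n.stage = 'second stage'
      );
```

With the test data, this shape of query would return cid '2' with status 'accepted'.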

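The question is: is there a better alternative to the approach we are currently using, or should we just focus on optimizing it? Also, what further optimizations could we make so that the existing subquery performs better?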

Thanks!

Answer 1

You can use lead():

select s.*
from (select s.*,
             lead(stage) over (partition by cid order by id) as next_stage
      from stages s
     ) s
where stage = 'first stage' and next_stage is null;
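One caveat: this relies on `id` increasing with stage order within each `cid`, which holds for the sample data but is an assumption about how rows get inserted. Under that assumption, a sketch of the same idea for the "second stage" step could look like this:

```sql
-- Rows in 'second stage' that have no later row for the same cid.
-- Assumes id grows with stage order within a cid (true for the sample data).
SELECT s.cid, s.status
FROM (
    SELECT st.*,
           lead(stage) OVER (PARTITION BY cid ORDER BY id) AS next_stage
    FROM stages st
) s
WHERE s.stage = 'second stage'
  AND s.next_stage IS NULL;
```

With the sample data, only cid '1' (status 'current') qualifies, because cid '3' has a 'third stage' row after its 'second stage' row.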

Answer 2

CREATE TABLE stages (
  id  serial PRIMARY KEY
  , cid VARCHAR(6)  NOT NULL
  , stage varchar(30)  NOT null
  , status varchar(30) not null
   , UNIQUE ( cid, stage)
);


INSERT INTO stages (id, cid, stage, status) VALUES
  (1, '1', 'first stage', 'accepted'),
  (2, '1', 'second stage', 'current'),
  (3, '2', 'first stage', 'accepted'),
  (4, '3', 'first stage', 'accepted'),
  (5, '3', 'second stage', 'accepted'),
  (6, '3', 'third stage', 'current')
  ;
ANALYZE stages;
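The `UNIQUE (cid, stage)` constraint is worth noting: PostgreSQL backs a unique constraint with a unique index on those columns, and the correlated subquery further down can use that index for its `cid`/`stage` lookups. For a table that already exists without the constraint, a sketch of adding it (the constraint name is a made-up example) could be:

```sql
-- Hypothetical constraint name; this fails if duplicate (cid, stage)
-- pairs already exist in the table.
ALTER TABLE stages
  ADD CONSTRAINT stages_cid_stage_uniq UNIQUE (cid, stage);

-- Refresh planner statistics afterwards.
ANALYZE stages;
```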

-- You can fetch all (three) stages with one query
-- Luckily, {'first', 'second', 'third'} are ordered alphabetically ;-)
-- --------------------------------------------------------------
-- EXPLAIN ANALYZE
SELECT * FROM stages q
WHERE NOT EXISTS (
        SELECT * FROM stages x
        WHERE x.cid = q.cid AND x.stage > q.stage
        );
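The query above depends on the stage names happening to sort alphabetically in stage order; a 'fourth stage' would sort before 'second stage' and break it. A minimal sketch that makes the order explicit, assuming a hypothetical inline lookup `stage_order` (its name and the positions are illustrative, not part of the original schema):

```sql
-- Hypothetical stage-to-position mapping; in practice this could be a real table.
WITH stage_order(stage, pos) AS (
    VALUES ('first stage', 1), ('second stage', 2), ('third stage', 3)
)
SELECT q.*
FROM stages q
JOIN stage_order qo ON qo.stage = q.stage
WHERE NOT EXISTS (
    -- any row for the same cid that sits at a later position
    SELECT 1
    FROM stages x
    JOIN stage_order xo ON xo.stage = x.stage
    WHERE x.cid = q.cid
      AND xo.pos > qo.pos
);
```

With the sample data this returns the last stage row of each cid, namely the rows with id 2, 3 and 6.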

-- Some people don't like EXISTS, or think that it is slow.
-- --------------------------------------------------------------
-- EXPLAIN ANALYZE
SELECT q.*
FROM stages q
JOIN (
        SELECT id
        , row_number() OVER (PARTITION BY cid ORDER BY stage DESC) AS rn
         FROM stages x
        )x ON x.id = q.id AND x.rn = 1;
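Since the question is about PostgreSQL specifically, `DISTINCT ON` is another way to write the "last row per cid" query. This is an alternative that the answer itself does not use, sketched under the same alphabetical-ordering assumption as the queries above:

```sql
-- PostgreSQL-specific: keep one row per cid, the one with the greatest stage.
-- The ORDER BY must start with the DISTINCT ON expression (cid).
SELECT DISTINCT ON (cid) *
FROM stages
ORDER BY cid, stage DESC;
```

Which variant is fastest depends on table size and available indexes, so comparing them with EXPLAIN ANALYZE on realistic data is the safer way to decide.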

PostgreSQL: Alternative to subquery to make query more efficient?

2 answers: