Postgres LEFT JOIN is creating more rows than are in the left table

Time: 2012-03-18 20:23:57

Tags: postgresql left-join sql-insert

I am running Postgres 9.1.3 32-bit on Windows 7 x64. (I have to use 32-bit because no Windows build of PostGIS is compatible with 64-bit Postgres.) (Edit: as of PostGIS 2.0, it is compatible with 64-bit Postgres on Windows.)

I have a query that left joins a table (consistent.master) with a temporary table, then inserts the resulting data into a third table (consistent.masternew).

Since this is a left join, the resulting table should have the same number of rows as the left table in the query. However, if I run this:

SELECT count(*)
FROM consistent.master

I get 2085343. But if I run this:

SELECT count(*)
FROM consistent.masternew

I get 2085703.

How can masternew have more rows than master? Shouldn't masternew have the same number of rows as master, the left table in the query?

Below is the query. The master and masternew tables should be identically structured.

--temporary table created here
--I am trying to locate where multiple tickets were written on
--a single traffic stop
WITH stops AS (
    SELECT citation_id,
           rank() OVER (ORDER BY offense_timestamp,
                     defendant_dl,
                     offense_street_number,
                     offense_street_name) AS stop
    FROM   consistent.master
    WHERE  citing_jurisdiction=1
)

--Here's the insert statement. Below you'll see it's
--pulling data from a select query
INSERT INTO consistent.masternew (arrest_id,
  citation_id,
  defendant_dl,
  defendant_dl_state,
  defendant_zip,
  defendant_race,
  defendant_sex,
  defendant_dob,
  vehicle_licenseplate,
  vehicle_licenseplate_state,
  vehicle_registration_expiration_date,
  vehicle_year,
  vehicle_make,
  vehicle_model,
  vehicle_color,
  offense_timestamp,
  offense_street_number,
  offense_street_name,
  offense_crossstreet_number,
  offense_crossstreet_name,
  offense_county,
  officer_id,
  offense_code,
  speed_alleged,
  speed_limit,
  work_zone,
  school_zone,
  offense_location,
  source,
  citing_jurisdiction,
  the_geom)

--Here's the select query that the insert statement is using.    
SELECT stops.stop,
  master.citation_id,
  defendant_dl,
  defendant_dl_state,
  defendant_zip,
  defendant_race,
  defendant_sex,
  defendant_dob,
  vehicle_licenseplate,
  vehicle_licenseplate_state,
  vehicle_registration_expiration_date,
  vehicle_year,
  vehicle_make,
  vehicle_model,
  vehicle_color,
  offense_timestamp,
  offense_street_number,
  offense_street_name,
  offense_crossstreet_number,
  offense_crossstreet_name,
  offense_county,
  officer_id,
  offense_code,
  speed_alleged,
  speed_limit,
  work_zone,
  school_zone,
  offense_location,
  source,
  citing_jurisdiction,
  the_geom
FROM consistent.master LEFT JOIN stops
ON stops.citation_id = master.citation_id

In case it matters, I ran VACUUM FULL ANALYZE and reindexed both tables. (I am not sure of the exact commands; it was done through pgAdmin III.)

2 answers:

Answer 0: (score: 9)

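A left join does not necessarily produce the same number of rows as the left table. Basically, it works like a normal (inner) join, except that rows of the left table that would not appear in the normal join are also included, with NULLs in the right table's columns. So if more than one row in the right table matches a single row in the left table, the result can contain more rows than the left table does.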
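To make the row multiplication concrete, here is a minimal sketch on made-up data. The tables l and r and their values are hypothetical and are not taken from the question; the left table has two rows, and id 1 appears twice in the right table.

```sql
-- Hypothetical toy data: l has 2 rows, r has two rows for id 1 and none for id 2.
WITH l(id) AS (
    VALUES (1), (2)
),
r(id, note) AS (
    VALUES (1, 'a'), (1, 'b')
)
SELECT l.id, r.note
FROM   l
LEFT   JOIN r ON r.id = l.id   -- id 1 matches twice, id 2 matches nothing
ORDER  BY l.id, r.note;
```

This returns three rows from a two-row left table: id 1 appears once per matching row in r, and id 2 appears once with a NULL note.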

To do what you are trying to do, you should use GROUP BY and count(*) to detect the multiples:

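-- stops is the CTE from the question; its WITH clause must precede this select.
-- citation_id is qualified because both stops and master have that column.
select master.citation_id
from stops join master on stops.citation_id = master.citation_id
group by master.citation_id
having count(*) > 1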
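Because stops is built from consistent.master with one row per master row where citing_jurisdiction = 1, a citation_id that repeats in stops must also repeat among those master rows. A minimal sketch of a standalone check that does not need the CTE, assuming only the table and column names shown in the question:

```sql
-- Lists every citation_id that occurs more than once among the rows
-- feeding the stops CTE, along with how many times it occurs.
SELECT citation_id,
       count(*) AS copies
FROM   consistent.master
WHERE  citing_jurisdiction = 1
GROUP  BY citation_id
HAVING count(*) > 1
ORDER  BY copies DESC, citation_id;
```

If this returns any rows, each listed citation_id is matched several times in the left join, which would account for the surplus rows in masternew.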

Answer 1: (score: 0)

Sometimes you know there are multiples but do not care. You just want to take the first or top entry.
If so, you can use SELECT DISTINCT ON:

FROM consistent.master LEFT JOIN (SELECT DISTINCT ON (citation_id) * FROM stops) s
ON s.citation_id = master.citation_id

Here citation_id is the column for which you want to keep only the first (arbitrary) row of each set of matches.

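You may want to make sure the result is deterministic by adding an ORDER BY on some other sortable column (created_at here stands for whichever column you choose). The ORDER BY must start with the DISTINCT ON column: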

SELECT DISTINCT ON (citation_id) * FROM stops ORDER BY citation_id, created_at
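A small sketch of how DISTINCT ON picks one row per key, using made-up values. The stops data below is hypothetical; only the column names citation_id and stop come from the question, and ordering by stop is just one possible tie-breaker.

```sql
-- Hypothetical toy data: citation_id 100 appears twice, 200 once.
WITH stops(citation_id, stop) AS (
    VALUES (100, 7), (100, 3), (200, 5)
)
SELECT DISTINCT ON (citation_id)
       citation_id, stop
FROM   stops
ORDER  BY citation_id, stop;   -- keeps the row with the lowest stop per citation_id
```

This returns two rows, (100, 3) and (200, 5). With only one row per citation_id on the right side, the left join can no longer return more rows than the left table.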