I have a table (named Data_detailed) that looks like this:
sample_year| Cell_ID | Species_ID | a | b | c | d...
2017 | 103.60 | PLALAG | Adult | | Adult |
2017 | 103.60 | PLALAG | | Adult | Adult |
2017 | 103.60 | TRIMON | Adult | Adult | Adult | Seedling
2017 | 103.70 | ANTNST | | Adult | Adult |
2017 | 103.70 | AVESTE | | Adult | Adult |
2017 | 103.70 | AVESTE | Adult | Seedling | | Seedling
2017 | 103.70 | BROSCO | Adult | Adult | |
It has three identifying fields (sample_year, Cell_ID and Species_ID), followed by a number of columns that can each be empty or contain one of two values: "Seedling" and "Adult".
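As you can see, some combinations of my identifying fields occur more than once (for example, "AVESTE" in cell 103.7), and I would like to combine them into a single record using simple rules: for each of the other columns (a, b, etc.), if there is a value, take it.
So I created a query (named Data_detailed_duplicates) that finds all records that are duplicated with respect to my identifying fields: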
SELECT Data_detailed.sample_year, Data_detailed.Cell_ID, Data_detailed.Species_ID,
Count(Data_detailed.sample_year) AS CountOfsample_year
FROM Data_detailed
GROUP BY Data_detailed.sample_year, Data_detailed.Cell_ID, Data_detailed.Species_ID
HAVING (((Data_detailed.sample_year)=get_year())
AND ((Data_detailed.Species_ID)<>"GENSPP"
And (Data_detailed.Species_ID)<>"MEDSPP")
AND ((Count(Data_detailed.sample_year))>1));
Then I created a query that merges these records according to the rules above (I group with Max because "Seedling" is encoded as 0 and "Adult" as -1):
SELECT Data_detailed.sample_year, Data_detailed.Cell_ID, Data_detailed.Species_ID,
Max(Data_detailed.a) AS MaxOfa,
Max(Data_detailed.b) AS MaxOfb,
Max(Data_detailed.c) AS MaxOfc,
Max(Data_detailed.d) AS MaxOfd,
Max(Data_detailed.e) AS MaxOfe,
Max(Data_detailed.f) AS MaxOff,
Max(Data_detailed.g) AS MaxOfg,
Max(Data_detailed.h) AS MaxOfh,
Max(Data_detailed.InnerQ) AS MaxOfInnerQ
FROM Data_detailed INNER JOIN Data_detailed_duplicates
ON (Data_detailed.sample_year = Data_detailed_duplicates.sample_year)
AND (Data_detailed.Species_ID = Data_detailed_duplicates.Species_ID)
AND (Data_detailed.Cell_ID = Data_detailed_duplicates.Cell_ID)
GROUP BY Data_detailed.sample_year, Data_detailed.Cell_ID, Data_detailed.Species_ID
HAVING (((Data_detailed.Species_ID)<>"GENSPP" And (Data_detailed.Species_ID)<>"MEDSPP"));
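To see why Max does the job: Access aggregate functions skip Null values, so an empty cell never overrides a filled one, and when both codes occur in a group Max(-1, 0) returns 0, i.e. "Seedling" (as for column b of AVESTE in the example). A minimal sketch for checking the merged output by eye is to decode the numbers back into labels. It assumes the merge query is saved under a name such as Data_detailed_merged_duplicates, that the value columns are numeric with Null for empty cells, and that the a_label and b_label aliases are made up for illustration. Access SQL does not accept inline comments, so the notes stay here rather than in the code:

```sql
SELECT sample_year, Cell_ID, Species_ID,
       IIf(IsNull(MaxOfa), "", IIf(MaxOfa = 0, "Seedling", "Adult")) AS a_label,
       IIf(IsNull(MaxOfb), "", IIf(MaxOfb = 0, "Seedling", "Adult")) AS b_label
FROM Data_detailed_merged_duplicates;
```

The same IIf expression can be repeated for the remaining columns (c to h and InnerQ) if they use the same encoding.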
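So far, everything works well.
However, rather than just retrieving the merged records in a query result, I want the table itself to be updated, so that every two or more merged records are replaced by a single record holding all the information, and all the other records are deleted from the table. How can I do that?
The result for the example above would be: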
sample_year| Cell_ID | Species_ID | a | b | c | d...
2017 | 103.60 | PLALAG | Adult | Adult | Adult |
2017 | 103.60 | TRIMON | Adult | Adult | Adult | Seedling
2017 | 103.70 | ANTNST | | Adult | Adult |
2017 | 103.70 | AVESTE | Adult | Seedling | Adult | Seedling
2017 | 103.70 | BROSCO | Adult | Adult | |
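Answer 0 (score: 1)
TL;DR: I used a new field in the table to flag all duplicates, appended the merged records to the table, and then deleted the flagged records.
Here is how I eventually solved the problem:
I used the first query (Data_detailed_duplicates) to create a list of all records that need to be merged, and the second query (Data_detailed_merged_duplicates) to create the list of records that should replace the duplicates in the table. Both are exactly as given in the question.
Next, I created a new field in the table (Duplicates) and used the following update query to flag all duplicate records: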
UPDATE DISTINCTROW Data_detailed_duplicates
INNER JOIN Data_detailed ON (Data_detailed_duplicates.sample_year = Data_detailed.sample_year)
AND (Data_detailed_duplicates.Cell_ID = Data_detailed.Cell_ID)
AND (Data_detailed_duplicates.Species_ID = Data_detailed.Species_ID)
SET Data_detailed.Duplicates = True
WHERE (((Data_detailed.Duplicates)=False));
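Before appending anything, it may be worth confirming that the update flagged exactly the rows you expect. A rough sanity check, assuming no rows were already flagged before the update ran: the two queries below (run separately, since Access executes one statement per query) should return the same number. The aliases FlaggedRows and ExpectedFlaggedRows are made up for illustration.

```sql
SELECT Count(*) AS FlaggedRows
FROM Data_detailed
WHERE Duplicates = True;

SELECT Sum(CountOfsample_year) AS ExpectedFlaggedRows
FROM Data_detailed_duplicates;
```

This comparison only holds at this point in the sequence. Once the merged records have been appended, each group contains one extra unflagged row and the second number grows accordingly.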
Now I use another query to append all the merged records to the table:
INSERT INTO Data_detailed ( sample_year, Cell_ID, Species_ID, a, b, c, d, e, f, g, h, InnerQ, Duplicates )
SELECT Data_detailed_merged_duplicates.sample_year,
Data_detailed_merged_duplicates.Cell_ID,
Data_detailed_merged_duplicates.Species_ID,
Data_detailed_merged_duplicates.MaxOfa,
Data_detailed_merged_duplicates.MaxOfb,
Data_detailed_merged_duplicates.MaxOfc,
Data_detailed_merged_duplicates.MaxOfd,
Data_detailed_merged_duplicates.MaxOfe,
Data_detailed_merged_duplicates.MaxOff,
Data_detailed_merged_duplicates.MaxOfg,
Data_detailed_merged_duplicates.MaxOfh,
Data_detailed_merged_duplicates.MaxOfInnerQ,
0 AS Expr1
FROM Data_detailed_merged_duplicates;
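Finally, I delete all the duplicate records flagged earlier with another query (each group now holds at least three records, of which only one is not flagged):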
DELETE Data_detailed.*, Data_detailed.Duplicates
FROM Data_detailed
WHERE (((Data_detailed.Duplicates)=True));
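As a closing check, the duplicate-detection query can be reused: once the flagged rows are gone, every group should be down to a single record, so Data_detailed_duplicates should come back empty. A small sketch, assuming get_year() still returns the same year as when the merge was run (the alias RemainingDuplicateGroups is made up for illustration):

```sql
SELECT Count(*) AS RemainingDuplicateGroups
FROM Data_detailed_duplicates;
```

A result of 0 means no combination of sample_year, Cell_ID and Species_ID occurs more than once for that year. Anything else suggests that one of the three steps was skipped or run out of order.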
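This way all the records get merged without creating a temporary table.
The whole process is wrapped in a macro, so I don't have to look up all these queries and run them one by one: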