使用JPA和Hibernate时,DISTINCT如何工作

时间:2009-08-28 10:31:56

标签: java jpa distinct

DISTINCT在JPA中使用哪一列,是否可以更改它?

以下是使用DISTINCT的示例JPA查询:

select DISTINCT c from Customer c

这没有多大意义 - 什么专栏是基于什么?它是否在Entity上指定为注释,因为我找不到它?

我想指定用于区分的列,例如:

select DISTINCT(c.name) c from Customer c

我正在使用MySQL和Hibernate。

6 个答案:

答案 0 :(得分:55)

你很亲密。

select DISTINCT(c.name) from Customer c

答案 1 :(得分:13)

@Entity
@NamedQuery(name = "Customer.listUniqueNames", 
            query = "SELECT DISTINCT c.name FROM Customer c")
public class Customer {
        ...

        private String name;

        public static List<String> listUniqueNames() {
             return = getEntityManager().createNamedQuery(
                   "Customer.listUniqueNames", String.class)
                   .getResultList();
        }
}

答案 2 :(得分:10)

更新:请参见最高投票的答案。

我自己目前已经过时了。只是出于历史原因留在这里。


在连接中通常需要HQL中的区别,而不是像您自己的简单示例中那样。

另见How do you create a Distinct query in HQL

答案 3 :(得分:10)

我同意 kazanaki 的答案,这对我很有帮助。 我想选择整个实体,所以我用了

 select DISTINCT(c) from Customer c

在我的情况下,我有多对多的关系,我想在一个查询中加载带有集合的实体。

我使用LEFT JOIN FETCH,最后我必须使结果不同。

答案 4 :(得分:7)

正如我在this article中所述,根据底层JPQL或Criteria API查询类型,DISTINCT在JPA中具有两种含义。

标量查询

对于返回标量投影的标量查询,例如以下查询:

List<Integer> publicationYears = entityManager
.createQuery(
    "select distinct year(p.createdOn) " +
    "from Post p " +
    "order by year(p.createdOn)", Integer.class)
.getResultList();

LOGGER.info("Publication years: {}", publicationYears);

DISTINCT关键字应该传递给基础SQL语句,因为我们希望数据库引擎在返回结果集之前过滤重复项:

SELECT DISTINCT
    extract(YEAR FROM p.created_on) AS col_0_0_
FROM
    post p
ORDER BY
    extract(YEAR FROM p.created_on)

-- Publication years: [2016, 2018]

实体查询

对于实体查询,DISTINCT具有不同的含义。

在不使用DISTINCT的情况下,查询如下:

List<Post> posts = entityManager
.createQuery(
    "select p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

将要像这样联接postpost_comment表:

SELECT p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1, 1]

但是对于每个关联的post行,其父post_comment记录在结果集中是重复的。因此,List个实体中的Post将包含重复的Post实体引用。

要消除Post实体引用,我们需要使用DISTINCT

List<Post> posts = entityManager
.createQuery(
    "select distinct p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

但是随后DISTINCT也被传递给SQL查询,这是完全不希望的:

SELECT DISTINCT
       p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1]

通过将DISTINCT传递到SQL查询,EXECUTION PLAN将执行额外的 Sort 阶段,该阶段会增加开销而不会带来任何价值,因为父子组合始终返回唯一由于子PK列而记录:

Unique  (cost=23.71..23.72 rows=1 width=1068) (actual time=0.131..0.132 rows=2 loops=1)
  ->  Sort  (cost=23.71..23.71 rows=1 width=1068) (actual time=0.131..0.131 rows=2 loops=1)
        Sort Key: p.id, pc.id, p.created_on, pc.post_id, pc.review
        Sort Method: quicksort  Memory: 25kB
        ->  Hash Right Join  (cost=11.76..23.70 rows=1 width=1068) (actual time=0.054..0.058 rows=2 loops=1)
              Hash Cond: (pc.post_id = p.id)
              ->  Seq Scan on post_comment pc  (cost=0.00..11.40 rows=140 width=532) (actual time=0.010..0.010 rows=2 loops=1)
              ->  Hash  (cost=11.75..11.75 rows=1 width=528) (actual time=0.027..0.027 rows=1 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                    ->  Seq Scan on post p  (cost=0.00..11.75 rows=1 width=528) (actual time=0.017..0.018 rows=1 loops=1)
                          Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
                          Rows Removed by Filter: 3
Planning time: 0.227 ms
Execution time: 0.179 ms

具有HINT_PASS_DISTINCT_THROUGH的实体查询

要从执行计划中消除“排序”阶段,我们需要使用HINT_PASS_DISTINCT_THROUGH JPA查询提示:

List<Post> posts = entityManager
.createQuery(
    "select distinct p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.setHint(QueryHints.HINT_PASS_DISTINCT_THROUGH, false)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

现在,SQL查询将不包含DISTINCT,但是Post实体引用重复项将被删除:

SELECT
       p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1]

执行计划将确认我们这次不再有额外的排序阶段:

Hash Right Join  (cost=11.76..23.70 rows=1 width=1068) (actual time=0.066..0.069 rows=2 loops=1)
  Hash Cond: (pc.post_id = p.id)
  ->  Seq Scan on post_comment pc  (cost=0.00..11.40 rows=140 width=532) (actual time=0.011..0.011 rows=2 loops=1)
  ->  Hash  (cost=11.75..11.75 rows=1 width=528) (actual time=0.041..0.041 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Seq Scan on post p  (cost=0.00..11.75 rows=1 width=528) (actual time=0.036..0.037 rows=1 loops=1)
              Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
              Rows Removed by Filter: 3
Planning time: 1.184 ms
Execution time: 0.160 ms

答案 5 :(得分:2)

我会使用JPA的构造函数表达式功能。另见以下答案:

JPQL Constructor Expression - org.hibernate.hql.ast.QuerySyntaxException:Table is not mapped

按照问题中的例子,它会是这样的。

SELECT DISTINCT new com.mypackage.MyNameType(c.name) from Customer c