How to make my query using a self-join faster?

Time: 2018-09-18 20:25:03

Tags: sql r postgresql self-join

I originally created two separate queries and merged them in R to get a cumulative time graph, but I'm trying to get the information I want in a single query.

Original code:

users <- dbGetQuery(pool, "select id, name
                    from schema.table
                    where (name like '%t%' and name like '%2018%') or
                    (name like '%t%' and name like '%2017%')")
opts <- dbGetQuery(pool, "select id, name, ts
                    from schema.table
                    where name = 'qr_optin'")

all <- merge(users, opts, by = "id")

all <- all %>% 
  mutate(date = as.Date(ts),
         name.x = gsub("t", "", name.x)) %>% 
  group_by(name.x, date) %>% 
  summarise(n = n()) 
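For reference, here is roughly what the R pipeline computes, which is what a single SQL query has to reproduce. This is a Python sketch with made-up rows standing in for the users and opts query results; the data and the summarise_optins name are hypothetical. It does an inner join on id (what merge(..., by = "id") does by default), strips every "t" from the user name (like gsub), truncates the timestamp to a date (like as.Date, ignoring time zones), and counts rows per (name, date) pair.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows standing in for the two query results (made-up data).
users = [(1, "t2018a"), (2, "t2018a"), (3, "t2017b")]  # (id, name)
opts = [                                                # (id, name, ts)
    (1, "qr_optin", "2018-09-08 10:00:00"),
    (2, "qr_optin", "2018-09-08 15:30:00"),
    (3, "qr_optin", "2018-09-09 09:00:00"),
    (4, "qr_optin", "2018-09-09 11:00:00"),  # no user row for id 4, so the join drops it
]

def summarise_optins(users, opts):
    """Inner-join on id, strip 't' from the user name, truncate ts to a date,
    then count rows per (name, date), mirroring merge + mutate + group_by/summarise."""
    names_by_id = {}
    for uid, name in users:
        names_by_id.setdefault(uid, []).append(name)
    counts = Counter()
    for oid, _opt_name, ts in opts:
        day = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date().isoformat()
        # Every user row with a matching id pairs with this opt-in row (inner join).
        for name in names_by_id.get(oid, []):
            counts[(name.replace("t", ""), day)] += 1
    return sorted((name, day, n) for (name, day), n in counts.items())

print(summarise_optins(users, opts))
```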

Which outputs something like this:

name          date         n 
x          2018-09-09      12
x          2018-09-08      5
y          2018-09-08      4
xy         2018-09-06      8
xy         2018-09-04      9

I'm trying to get the same information by joining the two queries, but this is as far as I've gotten, and it's insanely slow:

select f1.id, f1.name, f2.ts
from schema.table f1
left join schema.table f2 on f2.id = f1.id
where f2.name = ' qr_optin' and
(f1.name like '%t%' and f1.name like '%2018%') or
(f1.name like '%t%' and f1.name like '%2017%')
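One detail worth checking in this attempt: in SQL (Postgres included), AND binds more tightly than OR, so the WHERE clause is read as (f2.name = ' qr_optin' and A) or B. Rows matching the 2017 branch skip the qr_optin filter entirely and come back paired with every row that shares their id, and the leading space in ' qr_optin' means the first branch matches nothing. The sketch below shows this with Python's built-in sqlite3 (SQLite uses the same AND-before-OR precedence) on a hypothetical events table with made-up rows; the "intended" clause is an assumption about what the query was meant to do.

```python
import sqlite3

# The WHERE clause from the question, verbatim (including the leading space in ' qr_optin').
AS_WRITTEN = (
    "f2.name = ' qr_optin' and "
    "(f1.name like '%t%' and f1.name like '%2018%') or "
    "(f1.name like '%t%' and f1.name like '%2017%')"
)
# Presumably intended filter: explicit parentheses around the OR, no stray space.
INTENDED = (
    "f2.name = 'qr_optin' and f1.name like '%t%' and "
    "(f1.name like '%2018%' or f1.name like '%2017%')"
)

def self_join(where_clause):
    """Run the question's self-join shape on a tiny in-memory table (hypothetical rows)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table events (id integer, name text)")
        conn.executemany(
            "insert into events values (?, ?)",
            [(1, "t2018a"), (1, "qr_optin"), (3, "t2017b"), (3, "qr_optin")],
        )
        sql = (
            "select f1.id, f1.name, f2.name from events f1 "
            "left join events f2 on f2.id = f1.id where " + where_clause
        )
        return sorted(conn.execute(sql).fetchall())
    finally:
        conn.close()

print(self_join(AS_WRITTEN))  # 2018 user missing; 2017 user also paired with its own tag row
print(self_join(INTENDED))    # one row per user, paired only with its qr_optin row
```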

1 answer:

Answer 0 (score: 2)

Just run plain SQL in Postgres to do both the merge (i.e., the join) and the summarise (i.e., the grouped aggregation).

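Join-level query (the select inside the CTE), wrapped in the aggregate query (with CTE):

with cte as 
  ( 
    -- join-level query: each user row paired with its qr_optin row(s)
    -- string literals take single quotes; double quotes mean identifiers in Postgres
    select usrs.id, replace(usrs.name, 't', '') as usr_name, opts.ts
    from schema.table as usrs
    inner join schema.table as opts 
            on opts.id = usrs.id and opts.name = 'qr_optin'
    -- qualify name: both sides of the self-join have a name column
    where usrs.name like '%t%' and
          (usrs.name like '%2018%' or usrs.name like '%2017%')
  )

-- aggregate: one row per (name, day), like group_by(name.x, date) + summarise(n = n())
select cte.usr_name as name, cte.ts::date as date, count(*) as n
from cte
group by cte.usr_name, cte.ts::date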
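For a quick local check of the join-plus-aggregate approach, here is a sketch that runs the same CTE shape through Python's built-in sqlite3 instead of Postgres, with the fixes it needs to run: string literals in single quotes, the name column qualified, and grouping on the output columns. Assumptions: the events table and the sample rows are made up, SQLite's date(ts) stands in for Postgres's ts::date, and SQLite's LIKE is case-insensitive for ASCII, unlike Postgres's.

```python
import sqlite3

# Same shape as the answer's query; "events" is a hypothetical stand-in for schema.table.
QUERY = """
with cte as (
    select usrs.id, replace(usrs.name, 't', '') as usr_name, opts.ts
    from events as usrs
    inner join events as opts
            on opts.id = usrs.id and opts.name = 'qr_optin'
    where usrs.name like '%t%'
      and (usrs.name like '%2018%' or usrs.name like '%2017%')
)
select usr_name as name, date(ts) as date, count(*) as n  -- Postgres: ts::date
from cte
group by usr_name, date(ts)
order by name, date
"""

def optins_per_day(rows):
    """Load (id, name, ts) rows into an in-memory table and run the aggregate query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table events (id integer, name text, ts text)")
        conn.executemany("insert into events values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

sample = [
    (1, "t2018a", "2018-01-01 00:00:00"),
    (1, "qr_optin", "2018-09-08 10:00:00"),
    (2, "t2018a", "2018-01-02 00:00:00"),
    (2, "qr_optin", "2018-09-08 15:30:00"),
    (3, "t2017b", "2017-06-01 00:00:00"),
    (3, "qr_optin", "2018-09-09 09:00:00"),
    (4, "qr_optin", "2018-09-09 11:00:00"),  # no cohort row for id 4, so it is dropped
]
print(optins_per_day(sample))
```

Moving the qr_optin condition into the ON clause keeps it attached to the opts side of the self-join, so it cannot be bypassed by the OR in the WHERE clause.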

Then, in R, pass either query (the join-level select on its own, or the full aggregate query) to DBI::dbGetQuery, e.g. all <- dbGetQuery(pool, "...myquery...").
