Question

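I often find myself running queries that return three things: the number of people who meet a particular condition, the total number of people in that population, and the percentage who meet the condition. I've always done it the same way, and I'd like to know how other people solve this kind of problem. Here's how I write the query: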

select m.state_cd
    ,m.injurylevel
    ,COUNT(distinct m.patid) as pplOnRx
    ,x.totalPatientsPerState
    ,round((COUNT(distinct m.patid) /cast(x.totalPatientsPerState as float))*100,2) as percentPrescribedNarcotics
    from members as m
    inner join rx on rx.patid=m.PATID
    inner join DrugTable as dt on dt.drugClass=rx.drugClass
    inner join 
    (
        select m2.state_cd, m2.injurylevel, COUNT(distinct m2.patid) as totalPatientsPerState
            from members as m2
            inner join rx on rx.patid=m2.PATID
            group by m2.STATE_CD,m2.injuryLevel
    ) x on x.state_cd=m.state_cd and m.injuryLevel=x.injurylevel
    where drugText like '%narcotics%'
    group by m.state_cd,m.injurylevel,x.totalPatientsPerState
    order by m.STATE_CD,m.injuryLevel

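In this example, not everyone in the members table appears in the rx table. The derived table makes the total count everyone who is in both members and rx, without applying the drugText like '%narcotics%' condition. From the little I've experimented, it seems an over(partition by ...) clause might work here. I don't know whether it actually does; it just looks that way to me. How do other people approach this?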

Results: [screenshot of the result set]
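To make the question concrete, here is a minimal reproduction. It assumes a tiny invented data set (every column type and row below is hypothetical) and runs on SQLite through Python's built-in sqlite3 module rather than on SQL Server. The query is the one above, with drugText qualified as dt.drugText and identifiers lower-cased. SQLite treats identifiers case-insensitively anyway.

```python
import sqlite3

# Hypothetical toy data: 7 members, one of whom (patid 7) has no rx rows.
SETUP = """
CREATE TABLE members (patid INTEGER PRIMARY KEY, state_cd TEXT, injurylevel INTEGER);
CREATE TABLE DrugTable (drugClass INTEGER PRIMARY KEY, drugText TEXT);
CREATE TABLE rx (patid INTEGER, drugClass INTEGER);
INSERT INTO members VALUES
    (1, 'OH', 1), (2, 'OH', 1), (3, 'OH', 1), (4, 'OH', 2),
    (5, 'TX', 1), (6, 'TX', 1), (7, 'OH', 1);
INSERT INTO DrugTable VALUES (10, 'opioid narcotics'), (20, 'antibiotics'), (30, 'nsaids');
INSERT INTO rx VALUES
    (1, 10), (1, 10), (1, 20), (2, 20), (3, 10),
    (4, 30), (5, 10), (6, 20), (6, 30);
"""

# The question's query: numerator from the filtered join,
# denominator from the unfiltered derived table x.
QUERY = """
select m.state_cd
    ,m.injurylevel
    ,COUNT(distinct m.patid) as pplOnRx
    ,x.totalPatientsPerState
    ,round((COUNT(distinct m.patid) / cast(x.totalPatientsPerState as float)) * 100, 2)
        as percentPrescribedNarcotics
from members as m
inner join rx on rx.patid = m.patid
inner join DrugTable as dt on dt.drugClass = rx.drugClass
inner join (
    select m2.state_cd, m2.injurylevel, COUNT(distinct m2.patid) as totalPatientsPerState
    from members as m2
    inner join rx on rx.patid = m2.patid
    group by m2.state_cd, m2.injurylevel
) x on x.state_cd = m.state_cd and x.injurylevel = m.injurylevel
where dt.drugText like '%narcotics%'
group by m.state_cd, m.injurylevel, x.totalPatientsPerState
order by m.state_cd, m.injurylevel
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On this data, OH / injury level 1 comes out at 2 of 3 patients (66.67) and TX / 1 at 1 of 2 (50.0). Patient 7 has no rx rows, so the derived table leaves them out of the denominator, as intended. OH / injury level 2 does not appear at all. Its only patient has no narcotics prescription, and the inner joins plus the WHERE filter drop any group with zero matches.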

Answer 1

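This is exactly what MDX and SSAS are for. If you insist on doing it in SQL (nothing wrong with that), are you after better performance? If so, that will depend on how the tables are indexed, how fast tempdb is, and whether the tables are partitioned.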

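Also, the COUNT(DISTINCT ...) calls will be one of the bigger performance hits. The like '%narcotics%' predicate starts with a wildcard, so it cannot use an index seek and will force a full scan. Avoid that at all costs if you can. Is there an integer key for this in the data model?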
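A quick way to see the leading-wildcard problem is to compare query plans. The sketch below uses SQLite's EXPLAIN QUERY PLAN. SQL Server's execution plans look different, but there too a pattern that starts with % cannot seek on an index. drugCategoryId is a hypothetical integer column standing in for whatever coded key the real data model might offer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# drugCategoryId is a hypothetical integer key; both columns are indexed
# so the only difference between the two queries is the predicate.
conn.executescript("""
CREATE TABLE DrugTable (
    drugClass INTEGER PRIMARY KEY,
    drugText TEXT,
    drugCategoryId INTEGER
);
CREATE INDEX ix_drug_text ON DrugTable (drugText);
CREATE INDEX ix_drug_category ON DrugTable (drugCategoryId);
""")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

like_plan = plan("SELECT drugClass FROM DrugTable WHERE drugText LIKE '%narcotics%'")
key_plan = plan("SELECT drugClass FROM DrugTable WHERE drugCategoryId = 3")
print(like_plan)  # a SCAN: the leading % rules out an index seek
print(key_plan)   # a SEARCH: an index seek on ix_drug_category
```

Even with an index on drugText, the LIKE '%narcotics%' plan is a SCAN. The equality test on the indexed integer column becomes a SEARCH.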

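To answer your question, I'm not sure windowing (over (partition by ...)) would perform any better. I would test it and see, but there is nothing "wrong" with your query.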

You could rewrite the count distinct into a virtual (derived) table or a temp table using group by, or a combination of the two.
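Here is a minimal sketch of that idea. It uses hypothetical toy rows and runs on SQLite via Python's sqlite3 rather than SQL Server. The derived table p does the de-duplication with GROUP BY, producing one row per patient per group, so the outer query can use plain COUNT(*) and SUM instead of COUNT(DISTINCT ...). The onNarcotics flag built with MAX(CASE ...) (conditional aggregation) is an assumption layered on top of the suggestion. It lets one pass produce both the numerator and the denominator. In SQL Server, p could just as well be materialized into a #temp table.

```python
import sqlite3

# Hypothetical toy data standing in for members / rx / DrugTable.
SETUP = """
CREATE TABLE members (patid INTEGER PRIMARY KEY, state_cd TEXT, injurylevel INTEGER);
CREATE TABLE DrugTable (drugClass INTEGER PRIMARY KEY, drugText TEXT);
CREATE TABLE rx (patid INTEGER, drugClass INTEGER);
INSERT INTO members VALUES
    (1, 'OH', 1), (2, 'OH', 1), (3, 'OH', 1), (4, 'OH', 2),
    (5, 'TX', 1), (6, 'TX', 1), (7, 'OH', 1);
INSERT INTO DrugTable VALUES (10, 'opioid narcotics'), (20, 'antibiotics'), (30, 'nsaids');
INSERT INTO rx VALUES
    (1, 10), (1, 10), (1, 20), (2, 20), (3, 10),
    (4, 30), (5, 10), (6, 20), (6, 30);
"""

# p holds one row per patient per group, so no COUNT(DISTINCT ...) is needed.
QUERY = """
SELECT state_cd, injurylevel,
       SUM(onNarcotics) AS pplOnRx,
       COUNT(*) AS totalPatientsPerState,
       ROUND(100.0 * SUM(onNarcotics) / COUNT(*), 2) AS percentPrescribedNarcotics
FROM (
    SELECT m.state_cd, m.injurylevel, m.patid,
           -- 1 if any of this patient's prescriptions is a narcotic
           MAX(CASE WHEN dt.drugText LIKE '%narcotics%' THEN 1 ELSE 0 END) AS onNarcotics
    FROM members AS m
    INNER JOIN rx ON rx.patid = m.patid
    LEFT JOIN DrugTable AS dt ON dt.drugClass = rx.drugClass
    GROUP BY m.state_cd, m.injurylevel, m.patid
) AS p
GROUP BY state_cd, injurylevel
HAVING SUM(onNarcotics) > 0  -- mirror the original: drop groups with no matches
ORDER BY state_cd, injurylevel
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN to DrugTable keeps prescriptions whose drugClass is missing from DrugTable in the denominator, as the original derived table does. HAVING SUM(onNarcotics) > 0 reproduces the original behaviour of dropping groups with no matches. Remove it if you want 0% rows.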

To illustrate, here is a windowing stub that you could extend into the same query:

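select a.state_cd, a.injurylevel, a.totalpatid,
       -- a is already grouped by state_cd, injurylevel, so as written this is always 1;
       -- group a more finely (or partition by fewer columns) to get a useful total
       count(*) over (partition by a.state_cd, a.injurylevel) as rowsPerGroup
from
(
    select state_cd, injurylevel,
           count(*) as totalpatid,          -- counts rows
           count(distinct patid) as patid   -- counts distinct patients
    from #members
    group by state_cd, injurylevel
) a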
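One way to turn the stub into the full query is sketched below, again on hypothetical toy data in SQLite (window functions need SQLite 3.25 or later). COUNT(DISTINCT ...) is not allowed with OVER in SQL Server (or in SQLite), so the patients are de-duplicated first. The aggregated table is then grouped one level finer than the stub's, adding the narcotics flag. That way SUM(n) OVER (PARTITION BY state_cd, injurylevel) yields the group total that the original query got from its separate derived table.

```python
import sqlite3

# Hypothetical toy data standing in for members / rx / DrugTable.
SETUP = """
CREATE TABLE members (patid INTEGER PRIMARY KEY, state_cd TEXT, injurylevel INTEGER);
CREATE TABLE DrugTable (drugClass INTEGER PRIMARY KEY, drugText TEXT);
CREATE TABLE rx (patid INTEGER, drugClass INTEGER);
INSERT INTO members VALUES
    (1, 'OH', 1), (2, 'OH', 1), (3, 'OH', 1), (4, 'OH', 2),
    (5, 'TX', 1), (6, 'TX', 1), (7, 'OH', 1);
INSERT INTO DrugTable VALUES (10, 'opioid narcotics'), (20, 'antibiotics'), (30, 'nsaids');
INSERT INTO rx VALUES
    (1, 10), (1, 10), (1, 20), (2, 20), (3, 10),
    (4, 30), (5, 10), (6, 20), (6, 30);
"""

QUERY = """
WITH per_patient AS (
    -- one row per patient per group, flagged 1 if any prescription is a narcotic
    SELECT m.state_cd, m.injurylevel, m.patid,
           MAX(CASE WHEN dt.drugText LIKE '%narcotics%' THEN 1 ELSE 0 END) AS onNarcotics
    FROM members AS m
    INNER JOIN rx ON rx.patid = m.patid
    LEFT JOIN DrugTable AS dt ON dt.drugClass = rx.drugClass
    GROUP BY m.state_cd, m.injurylevel, m.patid
), counts AS (
    -- grouped one level finer than the partition below
    SELECT state_cd, injurylevel, onNarcotics, COUNT(*) AS n
    FROM per_patient
    GROUP BY state_cd, injurylevel, onNarcotics
)
SELECT state_cd, injurylevel, pplOnRx, totalPatientsPerState,
       ROUND(100.0 * pplOnRx / totalPatientsPerState, 2) AS percentPrescribedNarcotics
FROM (
    SELECT state_cd, injurylevel, onNarcotics, n AS pplOnRx,
           -- the window adds flagged and unflagged counts into the group total
           SUM(n) OVER (PARTITION BY state_cd, injurylevel) AS totalPatientsPerState
    FROM counts
) AS w
WHERE onNarcotics = 1
ORDER BY state_cd, injurylevel
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```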

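See what I mean? Does that help? Sometimes rewriting a query slightly can improve performance by getting the optimizer to choose a better execution plan. But rather than taking stabs in the dark, I would first find the bottleneck in the query you already have, since you've already spent the time writing it.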

Derived tables for aggregate statistics
