Complex select query question for hardcore SQL designers

Time: 2010-06-14 15:22:04

Tags: sql sql-server sql-server-2005

This is a very complex query. I have been trying to build it for a few days without real success.

I'm using SQL Server 2005 Standard.

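What I need is: 5 CampaignVariants from Campaigns, where 2 have the highest PPU values and 3 are random.

Next condition: the campaign's spend must be below the CampaignDailyBudget and CampaignTotalBudget set in Campaign (spend is calculated from the number of clicks in the Visitors table, which is connected to Campaigns through the CampaignVariants the user clicked on).

Next condition: CampaignLanguage, CampaignCategory, CampaignRegion and CampaignCountry must match the values I send to this select (languageID, categoryID, regionID and countryID).

Next condition: the IP address I send to this select statement must not be in the current Campaign's IP list (I delete IPs that have been inactive for 24 hours).

In other words, it gets 5 CampaignVariants for a user who entered the site, based on the PublisherRegionUID, IP, language, country and region I take from that user.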


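More details

I get countryID, regionID, ipID, PublisherRegionUID and languageID from the Visitor. These are the filter parameters. First, though, I need to work out what the Publisher wants to show on its site, by its categories, languages and so on... Then I filter all remaining Campaigns by the Visitor's parameters, using every parameter except PublisherRegionUID.

So there are two actual filters: what the Publisher wants to publish, and what the Visitor is allowed to see...

campaignDailyBudget and campaignTotalBudget are values set by the user who created the Campaign. Both are compared against (number of clicks per Campaign) * (campaignPPU). For campaignDailyBudget the clicks are filtered by date, from 12:00 AM to 11:59 PM today. For obvious reasons, campaignTotalBudget is not filtered by date.

Stored procedure demo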

ALTER PROCEDURE dbo.CampaignsGetCampaignVariants4Visitor
    @publisherSiteRegionUID uniqueidentifier,
    @visitorIP varchar(15),
    @browserID tinyint,
    @countryID tinyint,
    @osID tinyint,
    @languageID tinyint,
    @acceptsCookies bit
AS
BEGIN
    SET NOCOUNT ON;

    -- check if such @publisherSiteRegionUID exists
    if exists(select publisherSiteRegionID from PublisherSiteRegions where publisherSiteRegionUID=@publisherSiteRegionUID)
        begin

            declare @publisherSiteRegionID int
            select @publisherSiteRegionID = publisherSiteRegionID from PublisherSiteRegions where publisherSiteRegionUID=@publisherSiteRegionUID

            -- get CampaignVariants 
            -- ** choose 2 highest PPU and 3 random CampaignVariants from Campaigns list 
            -- where regionID,countryID,categoryID,languageID meets Publisher and Visitor requirements
            -- and (sum of Clicks in Visitors per this Campaign)*Campaign.PPU during this day < Campaign.campaignDailyBudget
            -- and (sum of Clicks in Visitors per this Campaign)*Campaign.PPU < Campaign.campaignTotalBudget
            -- and @visitorIP does not appear in Campaigns2IPs with this Campaign

            -- insert visitor
            insert into Visitors (ipAddress,browserID,countryID,languageID,OSID,acceptsCookies)
            values (@visitorIP,@browserID,@countryID,@languageID,@OSID,@acceptsCookies)

            declare @visitorID int
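            -- SCOPE_IDENTITY() returns the identity generated by the insert above in this scope;
            -- IDENT_CURRENT('Visitors') could return a row inserted by another session
            select @visitorID = SCOPE_IDENTITY()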

            -- add IP to pool Campaigns ** adding ip to all Campaigns whose CampaignVariants were chosen

            -- add PublisherRegion2Visitor relationship
            insert into PublisherSiteRegions2Visitors values (@visitorID,@publisherSiteRegionID)

            -- add CampaignVariant2Visitor relationship


        end



END
GO

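2 answers:

Answer 0: (score: 3)

I have made several assumptions about your requirements. I'll spell them out as I go along, together with an explanation of the code. Note that I have no reasonable way to test this code for typos or minor logic errors.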

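It would be possible to write this as a single giant query, but that would be awkward, ugly, and prone to performance problems, since the SQL optimizer can have trouble building plans for overly large queries. One option is to write it as a series of queries that populate temp tables for use in subsequent queries (which makes debugging much easier). I chose to write it as one large common table expression statement with a series of CTEs, mostly because it "flows" better, and because it will probably perform better than the multi-temp-table version.

First assumption: there are several circular references in the schema. Campaign has links to both Country and Region, so both of those parameter values must be checked, even though, based on the table link between Country and Region, this filter could possibly be reduced to a check on Country alone (assuming the country parameter value is always "in" the region parameter value). The same applies to Language and Category, and perhaps to IP and Visitor. This looks like sloppy design. If it can be cleaned up, or if assumptions can be made about the validity of the data, the query can be simplified.

Second assumption: the parameters are passed in as variables of the form @RegionId, @CountryId, etc. Also, only one IP address is passed in. If not, you will need to pass in multiple values, set up a temp table containing those values, and use it as the filter where I use the @IPId parameter.

So, step 1 is a first pass at identifying the "eligible" campaigns, by pulling out all those that share the desired country, region, language and category, and that do not have the given IP address associated with them: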

WITH cteEligibleCampaigns (CampaignId)
 as (select CampaignId
      from Campaigns2Regions
      where RegionId = @RegionId
     intersect select CampaignId
      from Campaign2Countries
      where CountryId = @CountryId
     intersect select CampaignId
      from Campaign2Languages
      where LanguageId = @LanguageId
     intersect select CampaignId
      from Campaign2Categories
      where CategoryId = @CategoryId
     except select CampaignId
      from Campaigns2IPs
      where IPID = @IPId)
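
If more than one IP address had to be excluded, the temp-table variant mentioned in the second assumption might look like the following. This is only a sketch: #VisitorIPs, its IPID column type and the placeholder ids are assumptions, not part of the original schema. The separate insert statements are used because SQL Server 2005 does not accept multi-row VALUES lists.

```sql
-- Sketch only: #VisitorIPs and the ids below are hypothetical.
create table #VisitorIPs (IPID int not null primary key)

insert into #VisitorIPs (IPID) values (101)  -- placeholder id
insert into #VisitorIPs (IPID) values (102)  -- placeholder id

-- Campaigns tied to at least one of the listed IPs; this select would
-- take the place of the "where IPID = @IPId" branch after EXCEPT.
select CampaignId
 from Campaigns2IPs
 where IPID in (select IPID from #VisitorIPs)

drop table #VisitorIPs
```

A temp table created in the same session is visible inside a CTE, so the rest of the statement would not need to change.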

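Next up, filter out from these the items where "CampaignDailyBudget and CampaignTotalBudget are below what is set in Campaign (the calculation is the number of clicks in the Visitors table connected to Campaigns through the CampaignVariants the user clicked on)". This requirement is not entirely clear to me. I chose to interpret it as "only include those campaigns where, if you count the visitors for those campaigns' CampaignVariants, the total count is less than both CampaignDailyBudget and CampaignTotalBudget". Note that here I introduce a random value, used later on to select random rows.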

,cteTargetCampaigns (CampaignId, RandomNumber)
  as (select ec.CampaignId, checksum(newid()) RandomNumber
       from cteEligibleCampaigns ec
        inner join Campaigns ca
         on ca.CampaignId = ec.CampaignId
        inner join CampaignVariants cv
         on cv.CampaignId = ec.CampaignId
        inner join CampaignVariants2Visitors cvv
         on cvv.CampaignVariantId = cv.CampaignVariantId
       -- the budget columns must be grouped to be usable in HAVING;
       -- note that the inner joins drop campaigns that have no visitors yet
       group by ec.CampaignId, ca.CampaignDailyBudget, ca.CampaignTotalBudget
       having count(*) < ca.CampaignDailyBudget
        and count(*) < ca.CampaignTotalBudget)
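
If the budgets are instead meant as money spent, that is (clicks) * (PPU) with the daily figure limited to today's clicks as the question's details describe, the filter might look something like the sketch below. It is written as a standalone query against Campaigns, not as part of the CTE chain. It assumes one row in CampaignVariants2Visitors per click, and ClickDate is a hypothetical column: the question does not say where the click time is stored.

```sql
-- Sketch only: ClickDate is a hypothetical datetime column.
declare @todayStart datetime, @tomorrowStart datetime
set @todayStart = dateadd(day, datediff(day, 0, getdate()), 0)  -- today, 12:00 AM
set @tomorrowStart = dateadd(day, 1, @todayStart)

select ca.CampaignId
 from Campaigns ca
  -- left joins keep campaigns that have no clicks yet (spend = 0)
  left join CampaignVariants cv
   on cv.CampaignId = ca.CampaignId
  left join CampaignVariants2Visitors cvv
   on cvv.CampaignVariantId = cv.CampaignVariantId
 group by ca.CampaignId, ca.CampaignPPU,
          ca.CampaignDailyBudget, ca.CampaignTotalBudget
 -- total spend: all clicks, no date filter
 having count(cvv.CampaignVariantId) * ca.CampaignPPU < ca.CampaignTotalBudget
  -- daily spend: only clicks from today
  and sum(case when cvv.ClickDate >= @todayStart
                and cvv.ClickDate < @tomorrowStart
               then 1 else 0 end) * ca.CampaignPPU < ca.CampaignDailyBudget
```

The half-open range (>= today, < tomorrow) avoids missing clicks recorded in the last second of the day, and the variables are assigned with SET because SQL Server 2005 does not allow initializing a variable in DECLARE.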

Next up, identify the two "best" items.

,cteTopTwo (CampaignId, Ranking)
  as (select tc.CampaignId, row_number() over (order by ca.CampaignPPU desc)
       from cteTargetCampaigns tc
        inner join Campaigns ca
         on ca.CampaignId = tc.CampaignId)

Next, order all the other campaigns by the randomly assigned number:

,cteRandom (CampaignId, Ranking)
  as (select CampaignId, row_number() over (order by RandomNumber)
       from cteTargetCampaigns
       where CampaignId not in (select CampaignId
                                 from cteTopTwo
                                 where Ranking < 3))

Finally, pull the sets together:

 select CampaignId
  from cteTopTwo
  where Ranking <= 2
 union all select CampaignId
  from cteRandom
  where Ranking <= 3

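Piece the above code sections together, debug the typos, invalid assumptions and missed requirements (such as ordering, or a flag to tell your top two items apart from the random ones), and you should be good to go.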
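
As one possible way to handle the ordering and flagging, the final select could carry a marker column and sort on it. The sketch below uses two stub CTEs with made-up campaign ids in place of the real cteTopTwo and cteRandom so that it can be run on its own. IsTopTwo is a hypothetical column name.

```sql
-- Sketch only: the stub CTEs and their ids are placeholders for the
-- real cteTopTwo and cteRandom; IsTopTwo is a hypothetical column name.
with cteTopTwo (CampaignId, Ranking)
  as (select 11, 1 union all select 12, 2 union all select 13, 3)
,cteRandom (CampaignId, Ranking)
  as (select 13, 1 union all select 14, 2
      union all select 15, 3 union all select 16, 4)
 select CampaignId, 1 as IsTopTwo, Ranking
  from cteTopTwo
  where Ranking <= 2
 union all select CampaignId, 0, Ranking
  from cteRandom
  where Ranking <= 3
 -- top-PPU rows first, then the random picks in their random order
 order by IsTopTwo desc, Ranking
```

In the real statement only the final select would change. The ORDER BY applies to the whole union and uses the column names of the first select.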

Answer 1: (score: 2)

I'm not sure I understand this part of your post:

  

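"it gets 5 CampaignVariants for a user who entered the site, when I take from the user PublisherRegionUID, IP, language, country and region"

I assume "it" is the query. Is the user in your second "next condition" identified by the IP? What does "when I take from the user" mean? Does it mean this is information you have at the time you execute the query, or information you want returned from the query? If the latter, there are a number of questions to answer, since many of these columns are part of many-to-many relationships.

Regardless, below is a way to get the 5 campaigns where, per your second "next condition", you have a single IP address that you want to filter out. I'm also assuming you want five campaigns in total, which means the three random ones cannot include the two "highest PPU" ones.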

With 
    ValidCampaigns As
    (
    Select C.campaignId, C.campaignPPU
    From Campaigns As C
        Left Join (Campaigns2IPs As CIP
            Join IPs
                On IPs.ipID = CIP.ipID
                    And IPs.ipAddress = @IPAddress)
            On CIP.campaignId = C.campaignId
    Where CIP.campaignID Is Null
    )
    , CampaignPPURanks As
    (
    Select C.campaignId
        , Row_Number() Over ( Order By C.campaignPPU desc ) As ItemRank
    From ValidCampaigns As C
    )
    , RandomRanks As
    (
    Select C.campaignId
        , Row_Number() Over ( Order By newid() desc ) As ItemRank
    From ValidCampaigns As C
        Left Join CampaignPPURanks As CR
            On CR.campaignId = C.campaignId
                And CR.ItemRank <= 2
    Where CR.campaignId Is Null
    )
Select ...
From CampaignPPURanks As CPR
    Join CampaignVariants As CV
        On CV.campaignId = CPR.campaignId
            And CPR.ItemRank <= 2 
Union All           
Select ...
From RandomRanks As RR
    Join CampaignVariants As CV
        On CV.campaignId = RR.campaignId
            And RR.ItemRank <= 3