How to improve the performance of this query

Date: 2018-08-19 17:57:05

Tags: sql sql-server query-performance

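I've been struggling to get a data task of my own done. Since I'm proficient in C#, I first tried it there, but it took far too long!!! So I decided to do it in SQL Server instead, but it still looks like it will take a very long time: possibly 20 days to finish.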

Does anyone know a more efficient way to write my stored procedure?

ALTER PROCEDURE [dbo].[TotalWins]
AS
BEGIN
    SET NOCOUNT ON

    DECLARE @meeting_date DATE;
    DECLARE @idStore INT;
    DECLARE @race_idStore INT;
    DECLARE @runner_id INT;

    SET @idStore = 0;
    SET @race_idStore = -1;
    SET @runner_id = 0;

    WHILE(@idStore IS NOT NULL)  
    BEGIN
        SET @race_idStore = -1;
        SET @runner_id = 0;

        SELECT 
            @idStore = MIN(runners.id)
        FROM
            dbHorseRacing.dbo.historic_runners AS runners
        WHERE
            runners.id > @idStore;

        IF @idStore IS NOT NULL  
        BEGIN   
            SELECT 
                @runner_id = runners.runner_id, @meeting_date = races.meeting_date
            FROM
                dbHorseRacing.dbo.historic_runners AS runners
            INNER JOIN 
                dbHorseRacing.dbo.historic_races AS races ON races.race_id = runners.race_id
            WHERE
                runners.id = @idStore;  -- '=' rather than '>': read the row currently being processed

            INSERT INTO dbHorseRacing.dbo.total_wins
                SELECT 
                    @idStore, COUNT(*) AS total_wins 
                FROM 
                    dbHorseRacing.dbo.historic_runners AS runners
                INNER JOIN 
                    dbHorseRacing.dbo.historic_races AS races ON races.race_id = runners.race_id 
                WHERE 
                    runners.runner_id = @runner_id
                    AND races.meeting_date < @meeting_date
                    AND runners.finish_position = 1;
        END
    END
END 

I'm updating the question with the DDL and data samples for the races and runners tables. Sorry, they're big...

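Races sample data:

race_id meeting_id meeting_date course conditions race_name race_abbrev_name race_type_id race_type race_num going direction class draw_advantage num_fences handicap all_weather seller claimer apprentice maiden amateur num_runners num_finishers rating group_race min_age max_age distance_yards added_money official_rating speed_rating private_handicap scheduled_time off_time winning_time_disp winning_time_secs standard_time_disp standard_time_secs loaded_at

-1 2941 2003-07-03 Newbury Arab Race UAE Arabian International Arab Race 12 Flat 1 Good Left Handed 1 High numbers best in big fields, especially on very soft ground. NULL 0 0 0 0 0 0 0 0 8 8 NULL NULL NULL NULL 1320 0 NULL NULL NULL 2003-07-03 18:10:00.000 2003-07-03 00:00:00.000 0:00.00 0 1:14.38 74.379997253418 0x00000000000007DB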

Races DDL:

CREATE TABLE [dbo].[historic_races]
(
    [race_id] [int] NOT NULL,
    [meeting_id] [int] NOT NULL,
    [meeting_date] [date] NOT NULL,
    [course] [varchar](255) NOT NULL,
    [conditions] [varchar](255) NOT NULL,
    [race_name] [varchar](255) NOT NULL,
    [race_abbrev_name] [varchar](80) NOT NULL,
    [race_type_id] [int] NOT NULL,
    [race_type] [varchar](80) NOT NULL,
    [race_num] [tinyint] NOT NULL,
    [going] [varchar](80) NULL,
    [direction] [varchar](80) NULL,
    [class] [tinyint] NULL,
    [draw_advantage] [varchar](255) NULL,
    [num_fences] [tinyint] NULL,
    [handicap] [tinyint] NULL,
    [all_weather] [tinyint] NULL,
    [seller] [tinyint] NULL,
    [claimer] [tinyint] NULL,
    [apprentice] [tinyint] NULL,
    [maiden] [tinyint] NULL,
    [amateur] [tinyint] NULL,
    [num_runners] [tinyint] NULL,
    [num_finishers] [tinyint] NULL,
    [rating] [int] NULL,
    [group_race] [int] NULL,
    [min_age] [tinyint] NULL,
    [max_age] [tinyint] NULL,
    [distance_yards] [int] NULL,
    [added_money] [float] NULL,
    [official_rating] [int] NULL,
    [speed_rating] [int] NULL,
    [private_handicap] [int] NULL,
    [scheduled_time] [datetime] NULL,
    [off_time] [datetime] NULL,
    [winning_time_disp] [varchar](20) NULL,
    [winning_time_secs] [float] NULL,
    [standard_time_disp] [varchar](20) NULL,
    [standard_time_secs] [float] NULL,
    [loaded_at] [timestamp] NULL,
PRIMARY KEY CLUSTERED 
(
    [race_id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

Runners sample data:

runner_id   race_id name    foaling_date    colour  distance_travelled  form_figures    gender  age bred    cloth_number    stall_number    num_fences_jumped   long_handicap   how_easy_won    in_race_comment official_rating official_rating_type    speed_rating    speed_rating_type   private_handicap    private_handicap_type   trainer_name    trainer_id  owner_name  owner_id    jockey_name jockey_id   jockey_claim    dam_name    dam_id  sire_name   sire_id dam_sire_name   dam_sire_id forecast_price  forecast_price_decimal  starting_price  starting_price_decimal  betting_text    position_in_betting finish_position amended_position    unfinished  distance_beaten distance_won    distance_behind_winner  prize_money tote_win    tote_place  days_since_ran  last_race_type_id   last_race_type  last_race_beaten_fav    weight_pounds   penalty_weight  over_weight tack_hood   tack_visor  tack_blinkers   tack_eye_shield tack_eye_cover  tack_cheek_piece    tack_pacifiers  tack_tongue_strap   id  total_wins

1 82 401251 David Jack 2010-03-21 CH 143 NULL C 2 UK 4 3 NULL NULL NULL went forward, held on and was soon in touch, kept on at the same pace inside the final furlong NULL NULL NULL 32 Flat 18 Flat BJ Meehan 9262 Roldvale Limited 2311 TE Durcan 18761 NULL NULL NULL NULL NULL NULL NULL 8/1 9 9/1 10 op 8/1 tchd 10/1 5 4 NULL NULL 2 NULL 3.5 216.449996948242 NULL NULL NULL NULL NULL NULL NULL 129 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 1267937 NULL

Runners DDL:

CREATE TABLE [dbo].[historic_runners]
(
    [runner_id] [int] NOT NULL,
    [race_id] [int] NOT NULL,
    [name] [varchar](255) NOT NULL,
    [foaling_date] [date] NULL,
    [colour] [varchar](20) NOT NULL,
    [distance_travelled] [int] NULL,
    [form_figures] [varchar](80) NULL,
    [gender] [varchar](20) NULL,
    [age] [int] NULL,
    [bred] [varchar](4) NULL,
    [cloth_number] [int] NULL,
    [stall_number] [int] NULL,
    [num_fences_jumped] [int] NULL,
    [long_handicap] [int] NULL,
    [how_easy_won] [int] NULL,
    [in_race_comment] [text] NULL,
    [official_rating] [int] NULL,
    [official_rating_type] [varchar](80) NULL,
    [speed_rating] [int] NULL,
    [speed_rating_type] [varchar](80) NULL,
    [private_handicap] [int] NULL,
    [private_handicap_type] [varchar](80) NULL,
    [trainer_name] [varchar](80) NULL,
    [trainer_id] [int] NULL,
    [owner_name] [varchar](255) NULL,
    [owner_id] [int] NULL,
    [jockey_name] [varchar](80) NULL,
    [jockey_id] [int] NULL,
    [jockey_claim] [int] NULL,
    [dam_name] [varchar](80) NULL,
    [dam_id] [int] NULL,
    [sire_name] [varchar](80) NULL,
    [sire_id] [int] NULL,
    [dam_sire_name] [varchar](80) NULL,
    [dam_sire_id] [int] NULL,
    [forecast_price] [varchar](20) NULL,
    [forecast_price_decimal] [float] NULL,
    [starting_price] [varchar](20) NULL,
    [starting_price_decimal] [float] NULL,
    [betting_text] [text] NULL,
    [position_in_betting] [int] NULL,
    [finish_position] [int] NULL,
    [amended_position] [int] NULL,
    [unfinished] [varchar](30) NULL,
    [distance_beaten] [float] NULL,
    [distance_won] [float] NULL,
    [distance_behind_winner] [float] NULL,
    [prize_money] [float] NULL,
    [tote_win] [float] NULL,
    [tote_place] [float] NULL,
    [days_since_ran] [int] NULL,
    [last_race_type_id] [int] NULL,
    [last_race_type] [varchar](80) NULL,
    [last_race_beaten_fav] [int] NULL,
    [weight_pounds] [int] NULL,
    [penalty_weight] [int] NULL,
    [over_weight] [int] NULL,
    [tack_hood] [int] NULL,
    [tack_visor] [int] NULL,
    [tack_blinkers] [int] NULL,
    [tack_eye_shield] [int] NULL,
    [tack_eye_cover] [int] NULL,
    [tack_cheek_piece] [int] NULL,
    [tack_pacifiers] [int] NULL,
    [tack_tongue_strap] [int] NULL,
    [id] [int] NOT NULL,
    [total_wins] [int] NULL,
 CONSTRAINT [PK_RunnerRaceID] PRIMARY KEY CLUSTERED 
(
    [runner_id] ASC,
    [race_id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

Desired result: the total_wins table

CREATE TABLE [dbo].[total_wins]
(
    [id] [int] NOT NULL,
    [total_wins] [int] NULL
) 

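The "id" in the total_wins table corresponds to the id in the runners table. I have 2 million rows in the runners table, each with a unique identifier called id (not to be confused with the runner_id column, which contains duplicates because one runner can run in many races). So I want to end up with 2 million rows in the total_wins table, where total_wins is the number of races that runner had already won before the date of the race that row refers to.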
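To make that rule concrete, here is a minimal sketch on a hypothetical miniature dataset: temporary tables `#races`, `#runners` and `#total_wins` that keep only the columns the rule needs (none of these rows come from the real data). It assumes SQL Server 2012 or later for windowed `SUM ... OVER (ORDER BY ...)` and computes every row in one set-based statement instead of a loop: the running total of a runner's wins up to and including the row's meeting_date, minus that runner's wins on the same date, leaves the wins on strictly earlier dates.

```sql
-- Hypothetical miniature stand-ins for dbo.historic_races / dbo.historic_runners.
IF OBJECT_ID('tempdb..#total_wins') IS NOT NULL DROP TABLE #total_wins;
IF OBJECT_ID('tempdb..#runners') IS NOT NULL DROP TABLE #runners;
IF OBJECT_ID('tempdb..#races') IS NOT NULL DROP TABLE #races;

CREATE TABLE #races (race_id INT PRIMARY KEY, meeting_date DATE NOT NULL);
CREATE TABLE #runners (
    id              INT NOT NULL UNIQUE,  -- one row per runner per race
    runner_id       INT NOT NULL,         -- the horse; repeats across races
    race_id         INT NOT NULL,
    finish_position INT NULL,
    PRIMARY KEY (runner_id, race_id)
);

INSERT INTO #races VALUES (1, '2018-01-01'), (2, '2018-02-01'), (3, '2018-03-01');
INSERT INTO #runners VALUES
    (10, 7, 1, 1), (11, 7, 2, 1), (12, 7, 3, 4),  -- runner 7: won, won, 4th
    (13, 8, 1, 2), (14, 8, 3, 1);                 -- runner 8: 2nd, won

WITH flagged AS (
    SELECT ru.id, ru.runner_id, ra.meeting_date,
           CASE WHEN ru.finish_position = 1 THEN 1 ELSE 0 END AS is_win
    FROM #runners AS ru
    INNER JOIN #races AS ra ON ra.race_id = ru.race_id
)
SELECT id,
       -- wins on or before this date (RANGE includes rows that share the same date) ...
       SUM(is_win) OVER (PARTITION BY runner_id ORDER BY meeting_date
                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
       -- ... minus wins on this same date = wins on strictly earlier dates
     - SUM(is_win) OVER (PARTITION BY runner_id, meeting_date) AS total_wins
INTO #total_wins
FROM flagged;

SELECT id, total_wins FROM #total_wins ORDER BY id;
-- id 10 -> 0, 11 -> 1, 12 -> 2, 13 -> 0, 14 -> 0
```

The subtraction of same-date wins matters only if a runner can appear twice on one date; if that never happens, `ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING` would give the same answer and is usually cheaper.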

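Any help would be greatly appreciated!! I've been struggling with this, and I've even considered compressing the data and using a big-data solution like Hadoop or MongoDB.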

Thanks, Laura

2 answers:

Answer 0 (score: 0)

Thanks to David's suggestion of using GROUP BY and avoiding the loop, I think this is a potential solution...

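SELECT runners.id, COUNT(wins.runner_id) AS total_wins
FROM dbo.historic_runners AS runners
    INNER JOIN dbo.historic_races AS races ON races.race_id = runners.race_id
    -- every win of every runner, with the date of that win
    LEFT JOIN (
        SELECT ru.runner_id, ra.meeting_date
        FROM dbo.historic_runners AS ru
            INNER JOIN dbo.historic_races AS ra ON ra.race_id = ru.race_id
        WHERE ru.finish_position = 1
    ) AS wins
        ON wins.runner_id = runners.runner_id      -- same horse
       AND wins.meeting_date < races.meeting_date  -- won before this row's race
-- LEFT JOIN + COUNT(column) keeps rows with no earlier wins, reporting 0 for them
GROUP BY runners.id;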
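The same result can be written into the total_wins table with a single INSERT ... SELECT, replacing the one-row-per-iteration inserts of the original procedure. The sketch below swaps in a correlated COUNT subquery instead of GROUP BY (a variant, not the answer's exact query), and runs against hypothetical temporary tables `#races`, `#runners` and `#total_wins` with made-up rows so that it is self-contained.

```sql
-- Hypothetical stand-ins for the real tables.
IF OBJECT_ID('tempdb..#total_wins') IS NOT NULL DROP TABLE #total_wins;
IF OBJECT_ID('tempdb..#runners') IS NOT NULL DROP TABLE #runners;
IF OBJECT_ID('tempdb..#races') IS NOT NULL DROP TABLE #races;

CREATE TABLE #races (race_id INT PRIMARY KEY, meeting_date DATE NOT NULL);
CREATE TABLE #runners (id INT NOT NULL UNIQUE, runner_id INT NOT NULL, race_id INT NOT NULL,
                       finish_position INT NULL, PRIMARY KEY (runner_id, race_id));
CREATE TABLE #total_wins (id INT NOT NULL, total_wins INT NULL);  -- same shape as dbo.total_wins

INSERT INTO #races VALUES (1, '2018-01-01'), (2, '2018-02-01'), (3, '2018-03-01');
INSERT INTO #runners VALUES (10, 7, 1, 1), (11, 7, 2, 1), (12, 7, 3, 4), (13, 8, 1, 2), (14, 8, 3, 1);

-- One statement fills the whole table: no WHILE loop, no per-row INSERT.
INSERT INTO #total_wins (id, total_wins)
SELECT cur.id,
       (SELECT COUNT(*)  -- COUNT(*) over no rows is 0, so every runner row gets a value
          FROM #runners AS prev
          INNER JOIN #races AS prev_race ON prev_race.race_id = prev.race_id
         WHERE prev.runner_id = cur.runner_id                   -- same horse
           AND prev.finish_position = 1                         -- a win
           AND prev_race.meeting_date < cur_race.meeting_date)  -- before this race's date
FROM #runners AS cur
INNER JOIN #races AS cur_race ON cur_race.race_id = cur.race_id;

SELECT id, total_wins FROM #total_wins ORDER BY id;
```

Because the clustered key of historic_runners starts with runner_id, the inner lookup "all rows of this runner" can be a seek rather than a scan.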

Thanks for your answers to this question, I appreciate it :)

Answer 1 (score: -1)

Laura, I don't know the exact properties of your database, so I can only give you general ideas about what could be improved.

Find out what is slow

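You need to test what exactly is slow. Make a copy of the database and try running the query without the insert. Then try running many inserts without the custom select. This way you will find out whether reading or writing is the slow part. If neither of them is slow on its own, something else is happening on the table that slows your processing down.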
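A minimal sketch of that experiment, assuming hypothetical temporary tables (`#races`, `#runners`, `#write_test`, `#timings`) in place of a real database copy: it times a read-only step and a write-only step separately with `SYSDATETIME()` and records both in `#timings`. The read step is only an approximation of the procedure's lookups; on a real copy you would paste the procedure's own SELECT there.

```sql
-- Hypothetical stand-in tables; on real data, run this against a copy of the database.
IF OBJECT_ID('tempdb..#timings') IS NOT NULL DROP TABLE #timings;
IF OBJECT_ID('tempdb..#write_test') IS NOT NULL DROP TABLE #write_test;
IF OBJECT_ID('tempdb..#runners') IS NOT NULL DROP TABLE #runners;
IF OBJECT_ID('tempdb..#races') IS NOT NULL DROP TABLE #races;

CREATE TABLE #races (race_id INT PRIMARY KEY, meeting_date DATE NOT NULL);
CREATE TABLE #runners (id INT NOT NULL UNIQUE, runner_id INT NOT NULL, race_id INT NOT NULL,
                       finish_position INT NULL, PRIMARY KEY (runner_id, race_id));
CREATE TABLE #write_test (id INT NOT NULL, total_wins INT NULL);
CREATE TABLE #timings (step VARCHAR(20) NOT NULL, row_count INT NOT NULL, elapsed_ms INT NOT NULL);

INSERT INTO #races VALUES (1, '2018-01-01'), (2, '2018-02-01'), (3, '2018-03-01');
INSERT INTO #runners VALUES (10, 7, 1, 1), (11, 7, 2, 1), (12, 7, 3, 4), (13, 8, 1, 2), (14, 8, 3, 1);

DECLARE @start DATETIME2 = SYSDATETIME();
DECLARE @rows INT;

-- Step 1, read only: a join/filter like the procedure's, with the result counted, not inserted.
SELECT @rows = COUNT(*)
FROM #runners AS ru
INNER JOIN #races AS ra ON ra.race_id = ru.race_id
WHERE ru.finish_position = 1;
INSERT INTO #timings VALUES ('read only', @rows, DATEDIFF(MILLISECOND, @start, SYSDATETIME()));

-- Step 2, write only: one row per runner row, constant values, no lookups.
SET @start = SYSDATETIME();
INSERT INTO #write_test (id, total_wins)
SELECT id, 0 FROM #runners;
SET @rows = @@ROWCOUNT;
INSERT INTO #timings VALUES ('write only', @rows, DATEDIFF(MILLISECOND, @start, SYSDATETIME()));

SELECT step, row_count, elapsed_ms FROM #timings;
```

`SET STATISTICS TIME ON` and `SET STATISTICS IO ON` are an alternative when you want per-statement CPU time and logical reads in the Messages output instead of a table.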

Look at the schema

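Check whether the schema is appropriate, for example whether your database is in a normal form, and if so, which one. If it is not normalized, it may be better to convert it to a normal form.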

Indexes

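Look at the indexes. If reading is slow, you will need to add indexes on the columns involved in your queries; if you are not familiar with this area, make sure you read up on indexing before you do. If writing is slow, consider dropping unnecessary indexes, such as those on columns that your queries don't use.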
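As an illustration of read-side indexing for this particular procedure, a sketch under these assumptions: `#runners` is a hypothetical stand-in for dbo.historic_runners with the same clustered key, and the index names are made up. The procedure repeatedly looks rows up by id (`MIN(id) ... WHERE id > @idStore`, then the row for that id); since id is not part of any index, each such lookup reads the whole table. A unique index on id (the question says id is unique) turns those into seeks. A filtered index containing only winning rows may also help the win count.

```sql
-- Hypothetical stand-in for dbo.historic_runners with its real clustered key (runner_id, race_id).
IF OBJECT_ID('tempdb..#runners') IS NOT NULL DROP TABLE #runners;
CREATE TABLE #runners (id INT NOT NULL, runner_id INT NOT NULL, race_id INT NOT NULL,
                       finish_position INT NULL, PRIMARY KEY CLUSTERED (runner_id, race_id));
INSERT INTO #runners VALUES (10, 7, 1, 1), (11, 7, 2, 1), (12, 7, 3, 4), (13, 8, 1, 2), (14, 8, 3, 1);

-- Lookups by id become index seeks instead of full scans.
CREATE UNIQUE NONCLUSTERED INDEX IX_runners_id
    ON #runners (id) INCLUDE (runner_id, race_id);

-- Filtered index (SQL Server 2008+): only rows with finish_position = 1, so it stays small.
CREATE NONCLUSTERED INDEX IX_runners_wins
    ON #runners (runner_id) INCLUDE (race_id)
    WHERE finish_position = 1;

-- List the indexes now defined on the table.
SELECT name, type_desc, has_filter
FROM tempdb.sys.indexes
WHERE object_id = OBJECT_ID('tempdb..#runners');
```

Each extra index also has to be maintained on every insert, which is the write-side trade-off the answer mentions.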

Bigger batches

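I understand that you are iterating over your set one row at a time, but there is probably no need to go through the races one by one. You could iterate in batches of 100: first get the minimum id, then select the maximum runners.id among the next 100 rows ordered by runners.id, and process everything in between. This may speed up your process. Note that in each later step the previous maximum becomes the new minimum, so after the first iteration a single query is enough to determine the limit.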
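A minimal sketch of that batching scheme, assuming hypothetical temporary tables `#races`, `#runners` and `#total_wins` with made-up rows. `@batch_size` is 2 here only so that the tiny sample runs through several batches; the answer suggests 100. Each pass finds the largest id among the next `@batch_size` ids, inserts every row in the range `(@low, @high]` with one INSERT ... SELECT, and then uses `@high` as the next lower bound.

```sql
-- Hypothetical stand-in tables.
IF OBJECT_ID('tempdb..#total_wins') IS NOT NULL DROP TABLE #total_wins;
IF OBJECT_ID('tempdb..#runners') IS NOT NULL DROP TABLE #runners;
IF OBJECT_ID('tempdb..#races') IS NOT NULL DROP TABLE #races;

CREATE TABLE #races (race_id INT PRIMARY KEY, meeting_date DATE NOT NULL);
CREATE TABLE #runners (id INT NOT NULL UNIQUE, runner_id INT NOT NULL, race_id INT NOT NULL,
                       finish_position INT NULL, PRIMARY KEY (runner_id, race_id));
CREATE TABLE #total_wins (id INT NOT NULL, total_wins INT NULL);

INSERT INTO #races VALUES (1, '2018-01-01'), (2, '2018-02-01'), (3, '2018-03-01');
INSERT INTO #runners VALUES (10, 7, 1, 1), (11, 7, 2, 1), (12, 7, 3, 4), (13, 8, 1, 2), (14, 8, 3, 1);

DECLARE @batch_size INT = 2;                              -- e.g. 100 on the real table
DECLARE @low INT = (SELECT MIN(id) FROM #runners) - 1;   -- exclusive lower bound of next batch
DECLARE @high INT;

WHILE 1 = 1
BEGIN
    -- Upper bound: the largest id among the next @batch_size ids (NULL when none are left).
    SELECT @high = MAX(next_ids.id)
    FROM (SELECT TOP (@batch_size) id FROM #runners WHERE id > @low ORDER BY id) AS next_ids;

    IF @high IS NULL BREAK;

    INSERT INTO #total_wins (id, total_wins)
    SELECT cur.id,
           (SELECT COUNT(*)
              FROM #runners AS prev
              INNER JOIN #races AS prev_race ON prev_race.race_id = prev.race_id
             WHERE prev.runner_id = cur.runner_id
               AND prev.finish_position = 1
               AND prev_race.meeting_date < cur_race.meeting_date)
    FROM #runners AS cur
    INNER JOIN #races AS cur_race ON cur_race.race_id = cur.race_id
    WHERE cur.id > @low AND cur.id <= @high;

    SET @low = @high;  -- this batch's maximum is the next batch's lower bound
END;

SELECT id, total_wins FROM #total_wins ORDER BY id;
```

Without an index on id, the `TOP ... WHERE id > @low ORDER BY id` step still scans the table on every pass, so batching pays off most when combined with such an index.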

Last but not least

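If writing is slow, you could make a copy of the main table with plenty of indexes so that everything is fast there, periodically copy only the relevant subset into it, and use that copy as the source for your stored procedure, so the queries will be fast. This can improve performance, but avoid it if you are not forced into it, because it adds a lot of maintenance work and many more opportunities for errors.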
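A small-scale sketch of that idea, under these assumptions: the working copy is a hypothetical temporary table `#work` (in practice it could be a permanent table refreshed periodically) holding only the four values the calculation needs, clustered on (runner_id, meeting_date) so that "all rows of one runner in date order" is a contiguous range. `#races`, `#runners` and `#total_wins` are likewise hypothetical stand-ins with made-up rows.

```sql
-- Hypothetical stand-in tables.
IF OBJECT_ID('tempdb..#total_wins') IS NOT NULL DROP TABLE #total_wins;
IF OBJECT_ID('tempdb..#work') IS NOT NULL DROP TABLE #work;
IF OBJECT_ID('tempdb..#runners') IS NOT NULL DROP TABLE #runners;
IF OBJECT_ID('tempdb..#races') IS NOT NULL DROP TABLE #races;

CREATE TABLE #races (race_id INT PRIMARY KEY, meeting_date DATE NOT NULL);
CREATE TABLE #runners (id INT NOT NULL UNIQUE, runner_id INT NOT NULL, race_id INT NOT NULL,
                       finish_position INT NULL, PRIMARY KEY (runner_id, race_id));
INSERT INTO #races VALUES (1, '2018-01-01'), (2, '2018-02-01'), (3, '2018-03-01');
INSERT INTO #runners VALUES (10, 7, 1, 1), (11, 7, 2, 1), (12, 7, 3, 4), (13, 8, 1, 2), (14, 8, 3, 1);

-- Copy only the relevant subset of columns into a narrow working table.
SELECT ru.id, ru.runner_id, ra.meeting_date,
       CASE WHEN ru.finish_position = 1 THEN 1 ELSE 0 END AS is_win
INTO #work
FROM #runners AS ru
INNER JOIN #races AS ra ON ra.race_id = ru.race_id;

-- Index the copy for the lookup pattern: one runner's rows, ordered by date.
CREATE CLUSTERED INDEX IX_work_runner_date ON #work (runner_id, meeting_date);

-- Run the calculation against the narrow copy instead of the wide base tables.
SELECT cur.id,
       (SELECT COALESCE(SUM(prev.is_win), 0)  -- SUM over no rows is NULL, so default to 0
          FROM #work AS prev
         WHERE prev.runner_id = cur.runner_id
           AND prev.meeting_date < cur.meeting_date) AS total_wins
INTO #total_wins
FROM #work AS cur;

SELECT id, total_wins FROM #total_wins ORDER BY id;
```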