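I have been working hard on a data task. C# is what I know best, so I first tried doing it in C#, but it took far too long! I then decided to do it in SQL Server instead, but it still looks like it will take a very long time, possibly 20 days to finish.
Does anyone know a more efficient way to write my stored procedure?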
ALTER PROCEDURE [dbo].[TotalWins]
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @meeting_date DATE;
    DECLARE @idStore INT;
    DECLARE @race_idStore INT;
    DECLARE @runner_id INT;

    SET @idStore = 0;
    SET @race_idStore = -1;
    SET @runner_id = 0;

    WHILE (@idStore IS NOT NULL)
    BEGIN
        SET @race_idStore = -1;
        SET @runner_id = 0;

        -- Advance to the next row id; becomes NULL once no rows remain.
        SELECT
            @idStore = MIN(runners.id)
        FROM
            dbHorseRacing.dbo.historic_runners AS runners
        WHERE
            runners.id > @idStore;

        IF @idStore IS NOT NULL
        BEGIN
            -- Fetch the runner and meeting date belonging to the current row.
            SELECT
                @runner_id = runners.runner_id, @meeting_date = races.meeting_date
            FROM
                dbHorseRacing.dbo.historic_runners AS runners
            INNER JOIN
                dbHorseRacing.dbo.historic_races AS races ON races.race_id = runners.race_id
            WHERE
                runners.id = @idStore;

            -- Count this runner's wins before the current meeting date.
            INSERT INTO dbHorseRacing.dbo.total_wins
            SELECT
                @idStore, COUNT(*) AS total_wins
            FROM
                dbHorseRacing.dbo.historic_runners AS runners
            INNER JOIN
                dbHorseRacing.dbo.historic_races AS races ON races.race_id = runners.race_id
            WHERE
                runners.runner_id = @runner_id
                AND races.meeting_date < @meeting_date
                AND runners.finish_position = 1;
        END
    END
END
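I am updating the question with the DDL and a data sample for the races and runners tables. Sorry, they are large...
Races sample data:
race_id meeting_id meeting_date course conditions race_name race_abbrev_name race_type_id race_type race_num going direction class draw_advantage num_fences handicap all_weather seller claimer apprentice maiden amateur num_runners num_finishers rating group_race min_age max_age distance_yards added_money official_rating speed_rating private_handicap scheduled_time off_time winning_time_disp winning_time_secs standard_time_disp standard_time_secs loaded_at
-1 2941 2003-07-03 Newbury Arabian Race UAE Arabian International Arabian Race 12 Flat 1 Good Left Handed 1 High numbers best in big fields, especially on very soft ground. NULL 0 0 0 0 0 0 0 0 8 8 NULL NULL NULL NULL 1320 0 NULL NULL NULL 2003-07-03 18:10:00.000 2003-07-03 00:00:00.000 0:00.00 0 1:14.38 74.379997253418 0x00000000000007DB
Races DDL: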
CREATE TABLE [dbo].[historic_races]
(
    [race_id] [int] NOT NULL,
    [meeting_id] [int] NOT NULL,
    [meeting_date] [date] NOT NULL,
    [course] [varchar](255) NOT NULL,
    [conditions] [varchar](255) NOT NULL,
    [race_name] [varchar](255) NOT NULL,
    [race_abbrev_name] [varchar](80) NOT NULL,
    [race_type_id] [int] NOT NULL,
    [race_type] [varchar](80) NOT NULL,
    [race_num] [tinyint] NOT NULL,
    [going] [varchar](80) NULL,
    [direction] [varchar](80) NULL,
    [class] [tinyint] NULL,
    [draw_advantage] [varchar](255) NULL,
    [num_fences] [tinyint] NULL,
    [handicap] [tinyint] NULL,
    [all_weather] [tinyint] NULL,
    [seller] [tinyint] NULL,
    [claimer] [tinyint] NULL,
    [apprentice] [tinyint] NULL,
    [maiden] [tinyint] NULL,
    [amateur] [tinyint] NULL,
    [num_runners] [tinyint] NULL,
    [num_finishers] [tinyint] NULL,
    [rating] [int] NULL,
    [group_race] [int] NULL,
    [min_age] [tinyint] NULL,
    [max_age] [tinyint] NULL,
    [distance_yards] [int] NULL,
    [added_money] [float] NULL,
    [official_rating] [int] NULL,
    [speed_rating] [int] NULL,
    [private_handicap] [int] NULL,
    [scheduled_time] [datetime] NULL,
    [off_time] [datetime] NULL,
    [winning_time_disp] [varchar](20) NULL,
    [winning_time_secs] [float] NULL,
    [standard_time_disp] [varchar](20) NULL,
    [standard_time_secs] [float] NULL,
    [loaded_at] [timestamp] NULL,
    PRIMARY KEY CLUSTERED
    (
        [race_id] ASC
    ) WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
Runners sample data:
runner_id race_id name foaling_date colour distance_travelled form_figures gender age bred cloth_number stall_number num_fences_jumped long_handicap how_easy_won in_race_comment official_rating official_rating_type speed_rating speed_rating_type private_handicap private_handicap_type trainer_name trainer_id owner_name owner_id jockey_name jockey_id jockey_claim dam_name dam_id sire_name sire_id dam_sire_name dam_sire_id forecast_price forecast_price_decimal starting_price starting_price_decimal betting_text position_in_betting finish_position amended_position unfinished distance_beaten distance_won distance_behind_winner prize_money tote_win tote_place days_since_ran last_race_type_id last_race_type last_race_beaten_fav weight_pounds penalty_weight over_weight tack_hood tack_visor tack_blinkers tack_eye_shield tack_eye_cover tack_cheek_piece tack_pacifiers tack_tongue_strap id total_wins
1 82 401251 David Jack 2010-03-21 CH 143 NULL C 2 UK 4 3 NULL NULL NULL went forward, held up and soon in touch, kept on at the same pace inside the final furlong NULL NULL NULL 32 Flat 18 Flat BJ Meehan 9262 Roldvale Limited 2311 TE Durcan 18761 NULL NULL NULL NULL NULL NULL NULL 8/1 9 9/1 10 op 8/1 tchd 10/1 5 4 NULL NULL 2 NULL 3.5 216.449996948242 NULL NULL NULL NULL NULL NULL NULL 129 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 1267937 NULL
Runners DDL:
CREATE TABLE [dbo].[historic_runners]
(
    [runner_id] [int] NOT NULL,
    [race_id] [int] NOT NULL,
    [name] [varchar](255) NOT NULL,
    [foaling_date] [date] NULL,
    [colour] [varchar](20) NOT NULL,
    [distance_travelled] [int] NULL,
    [form_figures] [varchar](80) NULL,
    [gender] [varchar](20) NULL,
    [age] [int] NULL,
    [bred] [varchar](4) NULL,
    [cloth_number] [int] NULL,
    [stall_number] [int] NULL,
    [num_fences_jumped] [int] NULL,
    [long_handicap] [int] NULL,
    [how_easy_won] [int] NULL,
    [in_race_comment] [text] NULL,
    [official_rating] [int] NULL,
    [official_rating_type] [varchar](80) NULL,
    [speed_rating] [int] NULL,
    [speed_rating_type] [varchar](80) NULL,
    [private_handicap] [int] NULL,
    [private_handicap_type] [varchar](80) NULL,
    [trainer_name] [varchar](80) NULL,
    [trainer_id] [int] NULL,
    [owner_name] [varchar](255) NULL,
    [owner_id] [int] NULL,
    [jockey_name] [varchar](80) NULL,
    [jockey_id] [int] NULL,
    [jockey_claim] [int] NULL,
    [dam_name] [varchar](80) NULL,
    [dam_id] [int] NULL,
    [sire_name] [varchar](80) NULL,
    [sire_id] [int] NULL,
    [dam_sire_name] [varchar](80) NULL,
    [dam_sire_id] [int] NULL,
    [forecast_price] [varchar](20) NULL,
    [forecast_price_decimal] [float] NULL,
    [starting_price] [varchar](20) NULL,
    [starting_price_decimal] [float] NULL,
    [betting_text] [text] NULL,
    [position_in_betting] [int] NULL,
    [finish_position] [int] NULL,
    [amended_position] [int] NULL,
    [unfinished] [varchar](30) NULL,
    [distance_beaten] [float] NULL,
    [distance_won] [float] NULL,
    [distance_behind_winner] [float] NULL,
    [prize_money] [float] NULL,
    [tote_win] [float] NULL,
    [tote_place] [float] NULL,
    [days_since_ran] [int] NULL,
    [last_race_type_id] [int] NULL,
    [last_race_type] [varchar](80) NULL,
    [last_race_beaten_fav] [int] NULL,
    [weight_pounds] [int] NULL,
    [penalty_weight] [int] NULL,
    [over_weight] [int] NULL,
    [tack_hood] [int] NULL,
    [tack_visor] [int] NULL,
    [tack_blinkers] [int] NULL,
    [tack_eye_shield] [int] NULL,
    [tack_eye_cover] [int] NULL,
    [tack_cheek_piece] [int] NULL,
    [tack_pacifiers] [int] NULL,
    [tack_tongue_strap] [int] NULL,
    [id] [int] NOT NULL,
    [total_wins] [int] NULL,
    CONSTRAINT [PK_RunnerRaceID] PRIMARY KEY CLUSTERED
    (
        [runner_id] ASC,
        [race_id] ASC
    ) WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Desired result: the total_wins table
CREATE TABLE [dbo].[total_wins]
(
    [id] [int] NOT NULL,
    [total_wins] [int] NULL
)
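The id in the total_wins table corresponds to the id in the runners table. The runners table has 2 million rows, each with a unique identifier called id (not to be confused with the runner_id column, which contains duplicates because one runner can take part in many races). So I expect to end up with 2 million rows in the total_wins table, where total_wins reflects how many races that runner had already won before the date of the race the row refers to.
Any help would be greatly appreciated! I have been struggling with this for a while, and I have even considered compressing the data and using a big-data solution such as Hadoop or MongoDB.
Thanks, Laura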
Answer 0 (score: 0)
Thanks to David's suggestion to use GROUP BY and avoid the loop, I think this is a potential solution...
SELECT runners.id, COUNT(*) AS total_wins
FROM dbo.historic_runners AS runners
INNER JOIN dbo.historic_races AS races ON races.race_id = runners.race_id
WHERE races.meeting_date <
    (
        SELECT meeting_date
        FROM dbo.historic_runners AS ru
        INNER JOIN dbo.historic_races AS ra ON ra.race_id = ru.race_id
        WHERE ru.id = runners.id
    )
    AND runners.finish_position = 1
GROUP BY runners.id
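One thing to watch in the query above: the correlated subquery looks up the meeting date of the same row (ru.id = runners.id), so the comparison is between a date and itself and no rows can qualify. A possible variant, sketched here, compares each row against the same runner's other rows instead. The aliases cur, prev, cur_race and prev_race are illustrative, and the LEFT JOINs are an assumption made so that rows with no earlier wins still come back with a count of 0.

```sql
SELECT
    cur.id,
    COUNT(prev_race.race_id) AS total_wins   -- counts only matched earlier wins; 0 when there are none
FROM dbo.historic_runners AS cur
INNER JOIN dbo.historic_races AS cur_race
    ON cur_race.race_id = cur.race_id
-- Every winning row of the same runner...
LEFT JOIN dbo.historic_runners AS prev
    ON prev.runner_id = cur.runner_id
   AND prev.finish_position = 1
-- ...kept only when that win happened before the current row's meeting date.
LEFT JOIN dbo.historic_races AS prev_race
    ON prev_race.race_id = prev.race_id
   AND prev_race.meeting_date < cur_race.meeting_date
GROUP BY cur.id;
```

Prefixing the statement with INSERT INTO dbo.total_wins (id, total_wins) would load the result in a single pass rather than row by row.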
Thank you for your answers to this question, I appreciate it :)
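The same "wins before this race" count could probably also be expressed with window functions, which avoids joining the runners table to itself. This is only a sketch and assumes SQL Server 2012 or later, where ordered window aggregates are available. It takes the running total of wins up to and including the row's meeting date and subtracts the wins recorded on that same date, leaving only strictly earlier wins.

```sql
SELECT
    runners.id,
    -- Wins by this runner up to and including the current meeting date...
    SUM(CASE WHEN runners.finish_position = 1 THEN 1 ELSE 0 END) OVER (
        PARTITION BY runners.runner_id
        ORDER BY races.meeting_date
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    )
    -- ...minus wins on the current meeting date itself.
    - SUM(CASE WHEN runners.finish_position = 1 THEN 1 ELSE 0 END) OVER (
        PARTITION BY runners.runner_id, races.meeting_date
    ) AS total_wins
FROM dbo.historic_runners AS runners
INNER JOIN dbo.historic_races AS races
    ON races.race_id = runners.race_id;
```

Because no rows are filtered out, every id appears in the output, including runners with no earlier wins.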
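Answer 1 (score: -1)
Laura, I don't know the exact characteristics of your database, so I can only offer general ideas for improvement.
You will need to test where the slowness comes from. Make a copy of the database, then try running the queries without the inserts. Then try running a large number of inserts without the custom selects. This will tell you whether the writes or the reads are slow. If neither of them turns out to be slow, then something else is happening on the tables that slows your processing down.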
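A rough way to run that comparison, sketched under assumptions: the runner id and date below are made-up probe values, and the id of -1 in the insert is a placeholder that is rolled back. SET STATISTICS TIME and SET STATISTICS IO print the elapsed time and page reads for each statement in the Messages output.

```sql
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

DECLARE @runner_id INT = 82;                  -- hypothetical runner
DECLARE @meeting_date DATE = '2010-09-01';    -- hypothetical date

-- Read-only probe: the counting query from the procedure, without the INSERT.
SELECT COUNT(*) AS total_wins
FROM dbHorseRacing.dbo.historic_runners AS runners
INNER JOIN dbHorseRacing.dbo.historic_races AS races
    ON races.race_id = runners.race_id
WHERE runners.runner_id = @runner_id
  AND races.meeting_date < @meeting_date
  AND runners.finish_position = 1;

-- Write-only probe: a constant insert with no lookup, undone afterwards.
BEGIN TRANSACTION;
INSERT INTO dbHorseRacing.dbo.total_wins (id, total_wins) VALUES (-1, 0);
ROLLBACK TRANSACTION;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If the read probe dominates, indexing is the likelier lever; if the insert dominates, look at indexes, triggers or logging on the target table.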
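Check whether the schema is appropriate, for example whether your database is in a normal form and, if so, which one. If it is not in a normal form, it is probably best to normalize it.
Look at the indexes. If reads are slow, you will want to add indexes on the columns involved in your queries, but if you are not familiar with this area, make sure you read up on indexes before adding any. If writes are slow, consider dropping unnecessary indexes, such as those on columns that your queries do not use.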
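As one concrete case, the loop in the question looks rows up by id, while the DDL shows that the clustered key of historic_runners is (runner_id, race_id), so those lookups likely have no index to use. A minimal sketch of indexes that might help follows. The index names are made up, UNIQUE assumes id has no duplicates as the question states, and whether either index pays off should be checked against the actual execution plan.

```sql
-- Supports "MIN(id) WHERE id > @idStore" and fetching a row by id.
-- The clustered key columns (runner_id, race_id) are carried in the index automatically.
CREATE UNIQUE NONCLUSTERED INDEX IX_historic_runners_id
    ON dbo.historic_runners (id);

-- Narrow filtered index holding only winning rows, so counting a runner's wins
-- does not have to read the wide rows with their text columns.
CREATE NONCLUSTERED INDEX IX_historic_runners_wins
    ON dbo.historic_runners (runner_id, race_id)
    WHERE finish_position = 1;
```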
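I understand that you are iterating over each runner, but there may not be so many races that you need to go through them one at a time. You could iterate in batches of 100 instead: first get the minimum id, then take the maximum id from a TOP 100 query ordered by runners.id. This might speed up your process. Note that in later steps the previous maximum becomes the new minimum, so after the first iteration a single query is enough to determine the limits.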
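The batching idea could be sketched roughly as follows. The variable names and the set-based counting query inside the loop are assumptions rather than part of the original procedure, and the starting value of 0 assumes all ids are positive, as the question's own loop does.

```sql
DECLARE @batch_size INT = 100;
DECLARE @min_id INT = 0;    -- exclusive lower bound of the current batch
DECLARE @max_id INT;        -- inclusive upper bound of the current batch

WHILE 1 = 1
BEGIN
    -- Upper bound: the largest id among the next @batch_size rows after @min_id.
    SELECT @max_id = MAX(batch.id)
    FROM (
        SELECT TOP (@batch_size) runners.id
        FROM dbo.historic_runners AS runners
        WHERE runners.id > @min_id
        ORDER BY runners.id
    ) AS batch;

    IF @max_id IS NULL BREAK;   -- no rows left to process

    -- Count earlier wins for every row in the batch in one statement.
    INSERT INTO dbo.total_wins (id, total_wins)
    SELECT cur.id, COUNT(prev_race.race_id)
    FROM dbo.historic_runners AS cur
    INNER JOIN dbo.historic_races AS cur_race
        ON cur_race.race_id = cur.race_id
    LEFT JOIN dbo.historic_runners AS prev
        ON prev.runner_id = cur.runner_id
       AND prev.finish_position = 1
    LEFT JOIN dbo.historic_races AS prev_race
        ON prev_race.race_id = prev.race_id
       AND prev_race.meeting_date < cur_race.meeting_date
    WHERE cur.id > @min_id
      AND cur.id <= @max_id
    GROUP BY cur.id;

    SET @min_id = @max_id;      -- this batch's maximum is the next batch's minimum
END
```

A larger @batch_size reduces the number of round trips further; 100 is simply the figure mentioned above.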
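If writes are slow, you could make a copy of the main tables with plenty of indexes, so that everything is fast there, periodically copy only the relevant subset into it, and use that copy as the source for your stored procedure, so the queries will be fast. This improves performance, but avoid it unless you are forced to, because it adds a lot of maintenance work and many more ways for things to go wrong.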
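A minimal sketch of such a working copy, assuming only the four columns the win count needs; the table and index names are hypothetical, and the copy would have to be rebuilt or refreshed whenever the source tables change.

```sql
-- Copy just the columns needed for counting wins into a narrow table.
SELECT
    runners.id,
    runners.runner_id,
    races.meeting_date,
    runners.finish_position
INTO dbo.runner_results_work          -- hypothetical table name
FROM dbo.historic_runners AS runners
INNER JOIN dbo.historic_races AS races
    ON races.race_id = runners.race_id;

-- Order the copy the way the win count reads it: per runner, by date.
CREATE CLUSTERED INDEX IX_runner_results_work
    ON dbo.runner_results_work (runner_id, meeting_date);
```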