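I have a table with more than 200 million records, and I am trying to run the following query. It updates each row with the timestamp of the previous record. Is there any way to make this query run faster?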
UPDATE [dbo].[Location Data]
SET [timestamp_prev] =
(
SELECT [timestamp] FROM [dbo].[Location Data] newTable
WHERE [dbo].[Location Data].[RowNumber] = (newTable.[RowNumber] + 1)
AND [dbo].[Location Data].[mmsi] = newTable.[mmsi]
);
Answer 0 (score: 2)
You could try a self-join:
-- Self-join form of the update (a sketch, assuming the columns shown in the question)
UPDATE cur
SET cur.[timestamp_prev] = prev.[timestamp]
FROM [dbo].[Location Data] cur
INNER JOIN [dbo].[Location Data] prev
ON cur.[RowNumber] = prev.[RowNumber] + 1
AND cur.[mmsi] = prev.[mmsi];
If you have indexes on the join columns, this query might even finish before you retire.
Answer 1 (score: 2)
First, I would do this using lag():
with toupdate as (
select ld.*,
lag(timestamp) over (partition by mmsi order by RowNumber) as prev_timestamp
from dbo.[Location Data] ld
)
update toupdate
set timestamp_prev = prev_timestamp;
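Then, I would note that updating 200 million records takes a very long time. I would suggest generating a new table with the columns you need, then truncating the original table and repopulating it.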
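The rebuild-and-reload suggestion above could be sketched roughly as follows. This is only an illustration under assumptions: the staging table name `dbo.[Location Data Staging]` is hypothetical, the table is assumed to contain only the four columns mentioned in this thread, `RowNumber` is assumed not to be an IDENTITY column, and no foreign keys are assumed to reference the table (TRUNCATE would fail otherwise).

```sql
-- Step 1: build a staging copy with the previous timestamp already computed.
-- dbo.[Location Data Staging] is a hypothetical name used for illustration.
SELECT ld.[RowNumber],
       ld.[mmsi],
       ld.[timestamp],
       LAG(ld.[timestamp]) OVER (PARTITION BY ld.[mmsi]
                                 ORDER BY ld.[RowNumber]) AS [timestamp_prev]
INTO dbo.[Location Data Staging]
FROM dbo.[Location Data] AS ld;

-- Step 2: empty the original table.
TRUNCATE TABLE dbo.[Location Data];

-- Step 3: reload it from the staging copy.
-- Assumes the original table has exactly these four columns.
INSERT INTO dbo.[Location Data] WITH (TABLOCK)
       ([RowNumber], [mmsi], [timestamp], [timestamp_prev])
SELECT [RowNumber], [mmsi], [timestamp], [timestamp_prev]
FROM dbo.[Location Data Staging];

-- Step 4: drop the staging copy once the reload has been verified.
DROP TABLE dbo.[Location Data Staging];
```

Note that LAG returns the previous row within each mmsi by RowNumber order, which matches the original query only if RowNumber values are consecutive within each mmsi. It is worth verifying the staging table's row count against the original before running the TRUNCATE step.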
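Answer 2 (score: 0)
An inner join like the one below may help, because it avoids scanning the whole table once for every row, which is what the nested query does.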
UPDATE oldTable
SET oldTable.[timestamp_prev] = newTable.[timestamp]
FROM [dbo].[Location Data] oldTable
INNER JOIN [dbo].[Location Data] newTable
ON oldTable.[RowNumber] = newTable.[RowNumber] + 1
AND oldTable.[mmsi] = newTable.[mmsi]
Answer 3 (score: 0)
I would try something like this:
UPDATE T1 SET
[timestamp_prev] = T2.[timestamp]
FROM [dbo].[Location Data] T1
INNER JOIN [dbo].[Location Data] T2
ON T1.RowNumber = T2.RowNumber + 1
AND T1.mmsi = T2.mmsi
WHERE T1.[timestamp_prev] IS NULL;
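The join should be more efficient, and the WHERE clause limits the update to records that do not yet have a previous timestamp. As a further step, you could add an index on RowNumber, MMSI, and Timestamp_Prev to the table, so that the query can use clean index seeks.
A simple index like this should be a good start: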
CREATE NONCLUSTERED INDEX ix_Location_Data_MMSI_RowNumber_Timestamp_Prev
ON dbo.[Location Data] (mmsi, RowNumber, Timestamp_Prev) INCLUDE (Timestamp);
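Because the update in this answer only touches rows where timestamp_prev is still NULL, it could also be run in batches, which tends to keep each transaction and its log usage small. The loop below is a sketch of that idea, not part of the original answer. The batch size of 50000 is an arbitrary assumption, and the loop assumes [timestamp] is never NULL (otherwise the same rows would keep matching and the loop would not terminate).

```sql
-- Batched variant of the filtered self-join update (illustrative sketch).
DECLARE @batch INT = 50000;  -- assumed batch size; tune for your system
DECLARE @rows  INT = 1;

WHILE @rows > 0
BEGIN
    UPDATE TOP (@batch) T1
    SET [timestamp_prev] = T2.[timestamp]
    FROM [dbo].[Location Data] T1
    INNER JOIN [dbo].[Location Data] T2
        ON T1.RowNumber = T2.RowNumber + 1
        AND T1.mmsi = T2.mmsi
    WHERE T1.[timestamp_prev] IS NULL;

    -- Stop once a pass updates nothing; rows with no predecessor
    -- never match the join, so they stay NULL and do not block the exit.
    SET @rows = @@ROWCOUNT;
END;
```

Each pass picks up only rows that are still unfilled, so the loop can be stopped and restarted without redoing finished work.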