I need to parse data from a CSV file (~60,000 rows) and write it to an MSSQL table (the data is a date/time plus a value, which is a decimal number). I get one such CSV file every day. The problem is that each daily CSV file contains data for the last 5 days, which means it includes dates I have already written on previous days, but those rows need to be replaced with the values from the new file.
I am trying to decide between two approaches: bulk-delete the old data that the new CSV file will re-supply and then INSERT the new rows, or look up each record by its date/time and ID and UPDATE it.
1. Which is the better practice for reducing fragmentation and maintenance problems in the database?
If I have to choose between the two, I would prefer whichever keeps my database in good shape and performing well, since the file is written at night anyway.
Edit: if I add a daily maintenance plan that rebuilds the indexes after the bulk delete and insert of the new data, would that be enough to avoid fragmentation problems, or is there something I am missing?
Answer 0 (score: 1)
The faster and better approach is to delete all of the old data, import the new data with SSIS (or with BULK INSERT if you are not using SSIS), and then rebuild the fragmented indexes. See this script for an example.
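A minimal sketch of that delete/bulk-load/rebuild sequence, assuming the target table and index from the script later in this thread (dbo.Test with primary key PK_Test); the file path, terminators, and the 5-day window are illustrative assumptions, not part of the answer:

-- Delete the date range that the new file will re-supply (assumed: last 5 days).
DELETE FROM dbo.Test
WHERE TestDateTime >= DATEADD(day, -5, CAST(SYSDATETIME() AS date));

-- Bulk-load the new file; path and terminators depend on the actual CSV format.
BULK INSERT dbo.Test
FROM 'C:\data\daily.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

-- Rebuild the index churned by the delete/insert.
ALTER INDEX PK_Test ON dbo.Test REBUILD;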
Answer 1 (score: 0)
I would insert all of the data from your CSV file and then delete the duplicates. The following code helps you delete the duplicates; I hope it helps you :)
DELETE c
FROM your_table c
JOIN (SELECT MAX(a.id) AS id, a.date
      FROM your_table a
      GROUP BY a.date
      HAVING COUNT(*) > 1
     ) AS b
  ON c.date = b.date
 AND c.id <> b.id;
Answer 2 (score: 0)
Here is a MERGE technique using a staging table for the parsed CSV data. Alternatively, you could use a table-valued parameter instead of a staging-table source.
Regarding the fragmentation concern, it depends mostly on how many new rows are inserted within the existing target table's date range. If there are no new rows within that range, fragmentation is insignificant (less than 3% in the script below). If fragmentation does become a problem, you can run an index REBUILD or REORGANIZE after the ETL.
CREATE TABLE dbo.Test(
      TestDateTime datetime2(0) NOT NULL
        CONSTRAINT PK_Test PRIMARY KEY
    , TestData int NOT NULL
);
CREATE TABLE dbo.TestStaging(
      TestDateTime datetime2(0) NOT NULL
        CONSTRAINT PK_TestStaging PRIMARY KEY
    , TestData int NOT NULL
);
GO
--load 10 days into main table (12342 rows per day at 7-second intervals)
WITH
t4 AS (SELECT n FROM (VALUES(0),(0),(0),(0)) t(n))
,t256 AS (SELECT 0 AS n FROM t4 AS a CROSS JOIN t4 AS b CROSS JOIN t4 AS c CROSS JOIN t4 AS d)
,t256K AS (SELECT ROW_NUMBER() OVER (ORDER BY (a.n)) - 1 AS num FROM t256 AS a CROSS JOIN t256 AS b CROSS JOIN t4 AS c)
INSERT INTO dbo.Test WITH(TABLOCKX) (TestDateTime, TestData)
SELECT DATEADD(second, num*7, CAST('2015-07-01T00:00:00' AS datetime2(0))), num
FROM t256K
WHERE num <= 123420;
GO
--load 4 most recent days with new values plus 1 new day into staging table
WITH
t4 AS (SELECT n FROM (VALUES(0),(0),(0),(0)) t(n))
,t256 AS (SELECT 0 AS n FROM t4 AS a CROSS JOIN t4 AS b CROSS JOIN t4 AS c CROSS JOIN t4 AS d)
,t256K AS (SELECT ROW_NUMBER() OVER (ORDER BY (a.n)) - 1 AS num FROM t256 AS a CROSS JOIN t256 AS b CROSS JOIN t4 AS c)
INSERT INTO dbo.TestStaging WITH(TABLOCKX) (TestDateTime, TestData)
SELECT DATEADD(second, num*7, CAST('2015-07-07T00:00:06' AS datetime2(0))), num
FROM t256K
WHERE num <= 61710;
GO
--show fragmentation before MERGE
SELECT *
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Test'), NULL, NULL, 'DETAILED');
GO
MERGE dbo.Test AS target
USING dbo.TestStaging AS source ON
source.TestDateTime = target.TestDateTime
WHEN MATCHED THEN
UPDATE SET TestData = source.TestData
WHEN NOT MATCHED BY target THEN
INSERT (TestDateTime, TestData) VALUES (source.TestDateTime, source.TestData);
GO
--show fragmentation after MERGE
SELECT *
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Test'), NULL, NULL, 'DETAILED');
GO
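If fragmentation does creep up after the MERGE, a post-ETL step can choose between REBUILD and REORGANIZE based on the measured percentage. This sketch is not part of the original answer; the 5%/30% thresholds follow commonly cited Microsoft guidance and are an assumption you should tune for your workload:

DECLARE @frag float, @sql nvarchar(200);

-- Measure leaf-level fragmentation of dbo.Test's clustered index.
SELECT @frag = avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Test'), NULL, NULL, 'LIMITED')
WHERE index_level = 0;

-- REORGANIZE for light fragmentation, REBUILD for heavy, nothing below 5%.
SET @sql = CASE
    WHEN @frag >= 30 THEN N'ALTER INDEX PK_Test ON dbo.Test REBUILD;'
    WHEN @frag >= 5  THEN N'ALTER INDEX PK_Test ON dbo.Test REORGANIZE;'
END;

IF @sql IS NOT NULL
    EXEC sp_executesql @sql;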