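I have a table in SQL Server where users can change employee details. Every change inserts a new record into the EMPLOYEE_HIST table. Only EMP_ID stays the same for an employee; all other details can be modified.
There is also a SEQ_NO column that records the order in which the entries were created.
EMPLOYEE_HIST: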
SEQ_NO EMP_ID SOME_VAL1 SOME_VAL2
1 E1 V11 V21 (initial value of this employee)
2 E2 V12 V22 (initial value of this employee)
3 E3 V13 V23 (initial value of this employee)
4 E2 V00 V22
5 E1 V01 V21
6 E2 V02 V22
7 E4 V00 V00 (initial value of this employee)
I want a query that returns the changes made to each employee, for example:
EMP_ID SOME_VAL1_OLD SOME_VAL1_NEW SOME_VAL2_OLD SOME_VAL2_NEW
E1 V11 V01 V21 V21
E2 V12 V00 V22 V22
E2 V00 V02 V22 V22
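Update
A user can modify an employee's details any number of times (n), and the result set should contain one row for each change.
Please help.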
Edit: I finally decided to use the LAG function (available from SQL Server 2012). It works like this:
SELECT *, ROW_NUMBER() OVER (PARTITION BY EMP_ID, CHANGE_NO ORDER BY SEQ_NO) AS RN
FROM (
    SELECT EMP_ID, SEQ_NO, OLD_VAL, NEW_VAL, '1' AS CHANGE_NO   -- changes to SOME_VAL1
    FROM (SELECT EMP_ID, SEQ_NO, SOME_VAL1 AS NEW_VAL,
                 LAG(SOME_VAL1) OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO) AS OLD_VAL
          FROM EMPLOYEE_HIST) T1
    WHERE OLD_VAL <> NEW_VAL
    UNION ALL
    SELECT EMP_ID, SEQ_NO, OLD_VAL, NEW_VAL, '2' AS CHANGE_NO   -- changes to SOME_VAL2
    FROM (SELECT EMP_ID, SEQ_NO, SOME_VAL2 AS NEW_VAL,
                 LAG(SOME_VAL2) OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO) AS OLD_VAL
          FROM EMPLOYEE_HIST) T2
    WHERE OLD_VAL <> NEW_VAL
) TEMP
But fetching a total of 500 rows from a table with 3 million records is very slow. Please suggest ways to reduce the sort cost.
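A minimal sketch of one way to cut the sort cost, assuming SQL Server 2012 or later (for LAG) and the column types below (the question does not give them). The query above reads EMPLOYEE_HIST twice and then sorts the UNION ALL again for the outer ROW_NUMBER. Computing every LAG in one SELECT with the same OVER clause typically needs only one sort, and an index keyed on EMP_ID, SEQ_NO that also carries the value columns can often let the optimizer skip that sort altogether. The index name IX_EMPLOYEE_HIST_EMP_SEQ is hypothetical, and whether the Sort operator actually disappears should be confirmed in the execution plan.

```sql
-- Sample rows from the question; column types are assumed.
CREATE TABLE EMPLOYEE_HIST (
    SEQ_NO    INT         NOT NULL PRIMARY KEY,
    EMP_ID    VARCHAR(10) NOT NULL,
    SOME_VAL1 VARCHAR(10),
    SOME_VAL2 VARCHAR(10)
);
INSERT INTO EMPLOYEE_HIST (SEQ_NO, EMP_ID, SOME_VAL1, SOME_VAL2) VALUES
    (1, 'E1', 'V11', 'V21'), (2, 'E2', 'V12', 'V22'), (3, 'E3', 'V13', 'V23'),
    (4, 'E2', 'V00', 'V22'), (5, 'E1', 'V01', 'V21'), (6, 'E2', 'V02', 'V22'),
    (7, 'E4', 'V00', 'V00');

-- Hypothetical supporting index: partition column, then order column,
-- then the value columns so the query can be answered from the index alone.
CREATE INDEX IX_EMPLOYEE_HIST_EMP_SEQ
    ON EMPLOYEE_HIST (EMP_ID, SEQ_NO, SOME_VAL1, SOME_VAL2);

-- Single pass: every LAG shares the same window (PARTITION BY EMP_ID ORDER BY SEQ_NO).
SELECT EMP_ID, SEQ_NO, SOME_VAL1_OLD, SOME_VAL1_NEW, SOME_VAL2_OLD, SOME_VAL2_NEW
FROM (
    SELECT EMP_ID,
           SEQ_NO,
           LAG(SEQ_NO)    OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO) AS PREV_SEQ_NO,
           LAG(SOME_VAL1) OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO) AS SOME_VAL1_OLD,
           SOME_VAL1 AS SOME_VAL1_NEW,
           LAG(SOME_VAL2) OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO) AS SOME_VAL2_OLD,
           SOME_VAL2 AS SOME_VAL2_NEW
    FROM EMPLOYEE_HIST
) AS c
WHERE c.PREV_SEQ_NO IS NOT NULL   -- drop each employee's initial row
ORDER BY EMP_ID, SEQ_NO;
```

On the question's sample rows this returns the three rows of the expected output (E1 once, E2 twice). Filtering on PREV_SEQ_NO rather than on the old values keeps a change row even when a value column is NULL. On SQL Server the value columns could equally be placed in the index's INCLUDE clause instead of the key.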
Answer 0 (score: 1)
If you are using SQL Server 2008 or later, you can use a CTE with window functions:
;WITH r AS (
SELECT RANK() OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO DESC) [rank]
, EMP_ID
, SOME_VAL1
, SOME_VAL2
FROM EMPLOYEE_HIST
)
SELECT e.EMP_ID
, s2.SOME_VAL1 [SOME_VAL1_OLD]
, s1.SOME_VAL1 [SOME_VAL1_NEW]
, s2.SOME_VAL2 [SOME_VAL2_OLD]
, s1.SOME_VAL2 [SOME_VAL2_NEW]
FROM (SELECT DISTINCT EMP_ID FROM EMPLOYEE_HIST) AS e
LEFT JOIN r AS s1 ON e.EMP_ID = s1.EMP_ID and s1.rank = 1 --the last change
LEFT JOIN r AS s2 ON e.EMP_ID = s2.EMP_ID and s2.rank = 2 --the second to last change
If you want all the changes, not just the last two rows, you should be able to do something like this:
;WITH r AS (
SELECT RANK() OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO DESC) [rank]
, EMP_ID
, SOME_VAL1
, SOME_VAL2
FROM EMPLOYEE_HIST
)
SELECT e.EMP_ID
, s2.SOME_VAL1 [SOME_VAL1_OLD]
, s1.SOME_VAL1 [SOME_VAL1_NEW]
, s2.SOME_VAL2 [SOME_VAL2_OLD]
, s1.SOME_VAL2 [SOME_VAL2_NEW]
FROM (SELECT DISTINCT EMP_ID FROM EMPLOYEE_HIST) AS e
LEFT JOIN (r AS s1 --the change
INNER JOIN r AS s2 ON s1.EMP_ID = s2.EMP_ID and s2.rank = s1.rank + 1) --previous value
ON e.EMP_ID = s1.EMP_ID
This should enumerate every change, back to the original value.
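Answer 1 (score: 0)
You might be better off with a different data model. You could have a table EMPLOYEE_HIST_OLD with the same structure. That lets you archive previous data (even with timestamps and/or sequence numbers), keep the EMPLOYEE_HIST table smaller and free of rarely referenced data, and so on. It also allows a basic join between the two tables.
Then I suggest using the timestamp on the EMPLOYEE_HIST_OLD records to find the most recent modifications and joining those records back to the current records. This shows you only the records that have changed. If you like, you can limit the query on EMPLOYEE_HIST_OLD to return only one record per employee (the most recent one); see "SQL query to get most recent row for each instance of a given key".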
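A sketch of the split model described above, under assumptions that are not in the answer: EMPLOYEE_HIST keeps only each employee's current row, EMPLOYEE_HIST_OLD holds every superseded version with the same columns, and SEQ_NO stands in for the timestamp. ROW_NUMBER picks the most recent archived row per employee, and that row is joined back to the current one.

```sql
-- Hypothetical split: current rows in EMPLOYEE_HIST, superseded rows in EMPLOYEE_HIST_OLD.
CREATE TABLE EMPLOYEE_HIST (
    SEQ_NO INT NOT NULL PRIMARY KEY, EMP_ID VARCHAR(10) NOT NULL,
    SOME_VAL1 VARCHAR(10), SOME_VAL2 VARCHAR(10));
CREATE TABLE EMPLOYEE_HIST_OLD (
    SEQ_NO INT NOT NULL PRIMARY KEY, EMP_ID VARCHAR(10) NOT NULL,
    SOME_VAL1 VARCHAR(10), SOME_VAL2 VARCHAR(10));

-- The question's rows, divided into current and archived versions.
INSERT INTO EMPLOYEE_HIST VALUES
    (3, 'E3', 'V13', 'V23'), (5, 'E1', 'V01', 'V21'),
    (6, 'E2', 'V02', 'V22'), (7, 'E4', 'V00', 'V00');
INSERT INTO EMPLOYEE_HIST_OLD VALUES
    (1, 'E1', 'V11', 'V21'), (2, 'E2', 'V12', 'V22'), (4, 'E2', 'V00', 'V22');

-- Latest archived version per employee (rn = 1), joined to the current row.
-- The inner join leaves out employees that were never modified.
SELECT cur.EMP_ID,
       prev.SOME_VAL1 AS SOME_VAL1_OLD, cur.SOME_VAL1 AS SOME_VAL1_NEW,
       prev.SOME_VAL2 AS SOME_VAL2_OLD, cur.SOME_VAL2 AS SOME_VAL2_NEW
FROM EMPLOYEE_HIST AS cur
JOIN (SELECT EMP_ID, SOME_VAL1, SOME_VAL2,
             ROW_NUMBER() OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO DESC) AS rn
      FROM EMPLOYEE_HIST_OLD) AS prev
  ON prev.EMP_ID = cur.EMP_ID AND prev.rn = 1
ORDER BY cur.EMP_ID;
```

With the question's data divided this way, the query returns one row each for E1 and E2 (the employees that were actually modified), showing only their latest change.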
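If you must keep everything in the same EMPLOYEE_HIST table and use the sequence-number approach, you may want to use count() to find the change records for a specific employee ID and return the values ORDERED by sequence number. You can also limit the query to employees with a count > 1. You will then be looking at the data vertically in the table. To split the values into separate columns (such as VAR1_OLD and VAR1), you basically just need to read the last two values and build one record from them. When you view the data horizontally, though, you lose visibility of all the changes, since there may be more than one historical change. Viewing every change horizontally would require some array manipulation outside SQL after the query returns the data.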
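Reading "the last two values and building one record from them" can also be done inside SQL with conditional aggregation. A minimal sketch, assuming the single EMPLOYEE_HIST table from the question (column types are assumed): rows are numbered newest-first per employee, only the two newest are kept, and HAVING COUNT(*) = 2 plays the role of the "count > 1" filter.

```sql
-- Sample rows from the question; column types are assumed.
CREATE TABLE EMPLOYEE_HIST (
    SEQ_NO INT NOT NULL PRIMARY KEY, EMP_ID VARCHAR(10) NOT NULL,
    SOME_VAL1 VARCHAR(10), SOME_VAL2 VARCHAR(10));
INSERT INTO EMPLOYEE_HIST VALUES
    (1, 'E1', 'V11', 'V21'), (2, 'E2', 'V12', 'V22'), (3, 'E3', 'V13', 'V23'),
    (4, 'E2', 'V00', 'V22'), (5, 'E1', 'V01', 'V21'), (6, 'E2', 'V02', 'V22'),
    (7, 'E4', 'V00', 'V00');

-- rn = 1 is the newest row, rn = 2 the one before it; MAX(CASE ...) folds
-- the two rows into one record per employee.
SELECT h.EMP_ID,
       MAX(CASE WHEN h.rn = 2 THEN h.SOME_VAL1 END) AS SOME_VAL1_OLD,
       MAX(CASE WHEN h.rn = 1 THEN h.SOME_VAL1 END) AS SOME_VAL1_NEW,
       MAX(CASE WHEN h.rn = 2 THEN h.SOME_VAL2 END) AS SOME_VAL2_OLD,
       MAX(CASE WHEN h.rn = 1 THEN h.SOME_VAL2 END) AS SOME_VAL2_NEW
FROM (SELECT EMP_ID, SOME_VAL1, SOME_VAL2,
             ROW_NUMBER() OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO DESC) AS rn
      FROM EMPLOYEE_HIST) AS h
WHERE h.rn <= 2
GROUP BY h.EMP_ID
HAVING COUNT(*) = 2          -- employees with more than one row in the full history
ORDER BY h.EMP_ID;
```

As the answer notes, this shows only the most recent change per employee (E1 and E2 on the sample data); listing every change needs the row-to-previous-row pairing used in the other answers.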
Answer 2 (score: 0)
You can use a CTE to get row numbers partitioned by EMP_ID, then join it to itself with the row numbers offset by 1.
;WITH PartitionedRows
AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY EMP_ID ORDER BY SEQ_NO) AS RowID, EMP_ID, SOME_VAL1,SOME_VAL2
FROM EMPLOYEE_HIST
)
SELECT a.EMP_ID,b.SOME_VAL1 AS SOME_VAL1_OLD,a.SOME_VAL1 AS SOME_VAL1_NEW,b.SOME_VAL2 AS SOME_VAL2_OLD,a.SOME_VAL2 AS SOME_VAL2_NEW
FROM PartitionedRows a
LEFT JOIN PartitionedRows b ON a.EMP_ID = b.EMP_ID AND a.RowID = (b.RowID + 1)
WHERE b.RowID IS NOT NULL
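In the query above, LEFT JOIN followed by WHERE b.RowID IS NOT NULL behaves exactly like an INNER JOIN, so it can be written as one. A sketch of that form which also, as an addition not in the answer, skips saves where neither value changed (similar in spirit to the OLD_VAL <> NEW_VAL filter in the question). The EXISTS ... EXCEPT test treats two NULLs as equal, which a plain <> comparison does not. SEQ_NO is added to the CTE only for ordering; the column types are assumed.

```sql
-- Sample rows from the question; column types are assumed.
CREATE TABLE EMPLOYEE_HIST (
    SEQ_NO INT NOT NULL PRIMARY KEY, EMP_ID VARCHAR(10) NOT NULL,
    SOME_VAL1 VARCHAR(10), SOME_VAL2 VARCHAR(10));
INSERT INTO EMPLOYEE_HIST VALUES
    (1, 'E1', 'V11', 'V21'), (2, 'E2', 'V12', 'V22'), (3, 'E3', 'V13', 'V23'),
    (4, 'E2', 'V00', 'V22'), (5, 'E1', 'V01', 'V21'), (6, 'E2', 'V02', 'V22'),
    (7, 'E4', 'V00', 'V00');

WITH PartitionedRows AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY EMP_ID ORDER BY SEQ_NO) AS RowID,
           EMP_ID, SEQ_NO, SOME_VAL1, SOME_VAL2
    FROM EMPLOYEE_HIST
)
SELECT a.EMP_ID,
       b.SOME_VAL1 AS SOME_VAL1_OLD, a.SOME_VAL1 AS SOME_VAL1_NEW,
       b.SOME_VAL2 AS SOME_VAL2_OLD, a.SOME_VAL2 AS SOME_VAL2_NEW
FROM PartitionedRows AS a
INNER JOIN PartitionedRows AS b              -- b is the version just before a
    ON a.EMP_ID = b.EMP_ID AND a.RowID = b.RowID + 1
WHERE EXISTS (SELECT a.SOME_VAL1, a.SOME_VAL2   -- true when at least one column
              EXCEPT                            -- differs (NULLs compare as equal)
              SELECT b.SOME_VAL1, b.SOME_VAL2)
ORDER BY a.EMP_ID, a.SEQ_NO;
```

On the sample data every later row changes SOME_VAL1, so all three changes are still returned.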