I've inherited a horribly designed table in which the data is stored like this:
Period | Identifier | Value
----------------------------------
1 | AB1 | some number
1 | AB2 | some number
1 | AB3 | some number
1 | AB4 | some number
1 | AB5 | some number
1 | A1 | some number
1 | A2 | some number
1 | A3 | some number
1 | A4 | some number
1 | A5 | some number
2 | AB1 | some number
2 | AB2 | some number
2 | AB3 | some number
2 | AB4 | some number
2 | AB5 | some number
2 | A1 | some number
2 | A2 | some number
2 | A3 | some number
2 | A4 | some number
2 | A5 | some number
I'm trying to use a SELECT statement to get the data into this format:
Row # | First value | Second value
1 | A1's number | AB1's number // The next 5 rows are data from period 1
2 | A2's number | AB2's number
3 | A3's number | AB3's number
4 | A4's number | AB4's number
5 | A5's number | AB5's number
6 | A1's number | AB1's number // These 5 rows are from period 2
7 | A2's number | AB2's number
8 | A3's number | AB3's number
9 | A4's number | AB4's number
10 | A5's number | AB5's number
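AB% and A% are two separate sets of IDs in this format, which I think slightly complicates a WHERE ... LIKE clause. I'm not entirely sure the data can be forced into the desired format, but my supervisor asked me to look into it.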
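My first instinct, not knowing much SQL, would be to look at the row numbers themselves and work from those, but as I said, I'm not sure how to proceed down that route.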
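The data currently lives in SQL Server, but it can be accessed from SAS using proc sql. I think that mostly conforms to SQL Server's standards, even though DECLARE is not supported.
No, I have no idea what the thinking was behind storing the data this way...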
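Answer 0 (score: 2)
If the 'B' in the identifier is used only to distinguish A-type identifiers from AB-type identifiers, you can simply strip that letter and join on the result: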
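SELECT ROW_NUMBER() OVER(ORDER BY AData.Period, AData.[Identifier]) AS [Row #]
     , AData.[Value] AS [First Value]
     , ABData.[Value] AS [Second Value]
FROM YourTable AData
-- Change to a LEFT JOIN if not all A's have AB's.
JOIN YourTable ABData
    -- NOTE: Assumes that 'B' is the only differentiator between
    --       AData and ABData's Identifier column and that it is
    --       not repeated as part of the common identifier.
    ON AData.[Identifier] = REPLACE(ABData.[Identifier], 'B', '')
    -- Pair rows within the same period only.
   AND AData.Period = ABData.Period
    -- REPLACE leaves 'A1' unchanged, so keep A rows from matching themselves.
   AND ABData.[Identifier] LIKE 'AB%'
-- Keep AB rows off the left side (matters once this becomes a LEFT JOIN).
WHERE AData.[Identifier] NOT LIKE 'AB%'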
You're absolutely right that it's not a great schema, and this will probably require a full table scan.
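To see why a join on the stripped identifier also needs a period condition and a filter that keeps only AB rows on the right-hand side, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for SQL Server, reuses the placeholder table name YourTable, and fills Value with invented numbers, since the question only says "some number".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Period INTEGER, Identifier TEXT, Value REAL)")

# Invented values: 2 periods x (AB1..AB5, A1..A5) = 20 rows.
rows = [(p, f"{prefix}{i}", p * 100 + i + (0.5 if prefix == "AB" else 0.0))
        for p in (1, 2) for prefix in ("AB", "A") for i in range(1, 6)]
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)

# Join on the stripped identifier alone: each A row matches itself
# (REPLACE leaves 'A1' unchanged) and rows pair up across periods.
loose = conn.execute("""
    SELECT COUNT(*)
    FROM YourTable a
    JOIN YourTable ab
      ON a.Identifier = REPLACE(ab.Identifier, 'B', '')
""").fetchone()[0]

# Same join, restricted to the same period and to AB rows on the right.
strict = conn.execute("""
    SELECT COUNT(*)
    FROM YourTable a
    JOIN YourTable ab
      ON a.Identifier = REPLACE(ab.Identifier, 'B', '')
     AND a.Period = ab.Period
     AND ab.Identifier LIKE 'AB%'
""").fetchone()[0]

print(loose, strict)  # 40 10
```

With the two extra conditions the join yields the 10 rows the question asks for; without them, each A row matches four rows (itself and its AB partner, in both periods).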
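Answer 1 (score: 2)
If you're using SAS, I would just use PROC TRANSPOSE. Prepare the data so that it includes a label variable, which determines the variable each value is moved into: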
data datatable;
infile datalines dlm='|';
input
Period Identifier $ Value $;
datalines;
1 | AB1 | some number
1 | AB2 | some number
1 | AB3 | some number
1 | AB4 | some number
1 | AB5 | some number
1 | A1 | some number
1 | A2 | some number
1 | A3 | some number
1 | A4 | some number
1 | A5 | some number
2 | AB1 | some number
2 | AB2 | some number
2 | AB3 | some number
2 | AB4 | some number
2 | AB5 | some number
2 | A1 | some number
2 | A2 | some number
2 | A3 | some number
2 | A4 | some number
2 | A5 | some number
;;;
run;
data have;
set datatable;
idlabel = compress(identifier, ,'d');
byval = compress(identifier,,'kd');
run;
proc sort data=have;
by period byval;
run;
proc transpose data=have out=want;
by period byval;
id idlabel;
var value;
run;
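If for some reason you have to do it in SQL, it's best done as a join of the table to itself. You want to join, for example, the rows where period = 1 and compress(identifier,,'kd') = 1 for both AB and A, so you could do it like this: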
proc sql;
create table want as
select A.period, AB.value as AB, A.value as A
from (select * from have where compress(identifier,,'d')='AB') AB,
(select * from have where compress(identifier,,'d')='A') A
where AB.period=A.period
and compress(AB.identifier,,'kd') = compress(A.identifier,,'kd');
quit;
But I think the PROC TRANSPOSE option is probably more efficient than the self-join (and more flexible if your data isn't as tidy as what you've shown).
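For readers without SAS at hand, the steps above (split each identifier into a label and a key, then turn labels into columns) can be sketched in plain Python. This is only an illustration of the logic, not of how PROC TRANSPOSE is implemented; the helper name split_identifier and the sample values are invented.

```python
import re
from collections import defaultdict

# Invented sample values; the original post only says "some number".
rows = [(p, f"{prefix}{i}", 10 * p + i + (0.5 if prefix == "AB" else 0.0))
        for p in (1, 2) for prefix in ("AB", "A") for i in range(1, 6)]

def split_identifier(identifier):
    """Return (idlabel, byval): the non-digit part and the digit part."""
    idlabel = re.sub(r"\d", "", identifier)  # like compress(identifier, , 'd')
    byval = re.sub(r"\D", "", identifier)    # like compress(identifier, , 'kd')
    return idlabel, byval

# One output row per (period, byval), one column per idlabel.
wide = defaultdict(dict)
for period, identifier, value in rows:
    idlabel, byval = split_identifier(identifier)
    wide[(period, byval)][idlabel] = value

# byval is a string here, so multi-digit keys would sort lexically.
want = [(period, byval, cols.get("A"), cols.get("AB"))
        for (period, byval), cols in sorted(wide.items())]

for row in want:
    print(row)
```

A missing A or AB value simply shows up as None in its column, which is the kind of flexibility with untidy data mentioned above.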
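Answer 2 (score: 2)
Setting aside for a second the tricky problem of relating A to AB within a given period: if the data can be related somehow, I would get the format you're looking for by doing an inner join of the table against itself, like so: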
SELECT row_number() OVER(ORDER BY a.Period, a.Identifier, b.Identifier),
a.Value,
b.Value
FROM TableName a
INNER JOIN TableName b ON join_mechanism
ORDER BY a.Period, a.Identifier, b.Identifier
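Now, to fill in the join mechanism, the obvious part is a.Period = b.Period. The dubious part is an idea: if this text is static, you could try a string replacement. So REPLACE(a.Identifier, 'A', 'AB') = b.Identifier.
So, putting it all together, you would have: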
SELECT row_number() OVER(ORDER BY a.Period, a.Identifier, b.Identifier),
a.Value,
b.Value
FROM TableName a
INNER JOIN TableName b ON a.Period = b.Period AND REPLACE(a.Identifier, 'A', 'AB') = b.Identifier
ORDER BY a.Period, a.Identifier, b.Identifier
Note: the SELECT statements have not been tested, and I'm assuming you're using a relatively recent version of MSSQL that supports row_number.
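As a rough check of the combined query, here is a minimal sketch that runs it against sample data. It assumes SQLite (via Python's sqlite3, which needs SQLite 3.25 or later for row_number) rather than MSSQL, so it shows the join logic only and says nothing about SQL Server behaviour or performance. The Value numbers are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (Period INTEGER, Identifier TEXT, Value REAL)")

# Invented values: A rows get p*100 + i, AB rows get the same plus 0.5.
rows = [(p, f"{prefix}{i}", p * 100 + i + (0.5 if prefix == "AB" else 0.0))
        for p in (1, 2) for prefix in ("AB", "A") for i in range(1, 6)]
conn.executemany("INSERT INTO TableName VALUES (?, ?, ?)", rows)

# The query from the answer, unchanged apart from whitespace.
result = conn.execute("""
    SELECT row_number() OVER(ORDER BY a.Period, a.Identifier, b.Identifier),
           a.Value,
           b.Value
    FROM TableName a
    INNER JOIN TableName b
            ON a.Period = b.Period
           AND REPLACE(a.Identifier, 'A', 'AB') = b.Identifier
    ORDER BY a.Period, a.Identifier, b.Identifier
""").fetchall()

for row in result:
    print(row)
```

AB rows on the left side drop out on their own, because REPLACE turns 'AB1' into 'ABB1', which matches nothing. That leaves one row per A identifier per period, with rows 1 to 5 from period 1 and rows 6 to 10 from period 2.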