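How can I use a SELECT statement to transform this data from two columns into three?

Asked: 2012-10-25 21:32:53

Tags: sql sql-server sas pivot-table

I have inherited a horribly designed table in which the data is stored like this: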

Period |  Identifier |   Value
----------------------------------
1      | AB1         | some number
1      | AB2         | some number
1      | AB3         | some number
1      | AB4         | some number
1      | AB5         | some number
1      | A1          | some number
1      | A2          | some number
1      | A3          | some number
1      | A4          | some number
1      | A5          | some number
2      | AB1         | some number
2      | AB2         | some number
2      | AB3         | some number
2      | AB4         | some number
2      | AB5         | some number
2      | A1          | some number
2      | A2          | some number
2      | A3          | some number
2      | A4          | some number
2      | A5          | some number

I am trying to use a SELECT statement to get the data into this format:

Row # | First value | Second value
1     | A1's number | AB1's number     // The next 5 rows are data from period 1
2     | A2's number | AB2's number
3     | A3's number | AB3's number
4     | A4's number | AB4's number
5     | A5's number | AB5's number
6     | A1's number | AB1's number     // These 5 rows are from period 2
7     | A2's number | AB2's number
8     | A3's number | AB3's number
9     | A4's number | AB4's number
10    | A5's number | AB5's number

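AB% and A% are two separate IDs of that format, which I think slightly thwarts a WHERE ... LIKE clause. I'm not entirely sure the data can be forced into the desired format, but my supervisor asked me to look into it.

My first idea, which I don't know how to write in SQL, was to work from the row numbers themselves, but as I said, I'm not sure how to go down that route.

At the moment the data lives in SQL Server, but it can be accessed from SAS using proc sql. I believe proc sql mostly conforms to what SQL Server accepts, even though it does not support DECLARE.

And no, I don't know what the idea was behind storing the data this way...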

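3 Answers:

Answer 0: (score: 2)

If the "B" in the identifier is used only to distinguish A-type identifiers from AB-type identifiers, then you can simply strip that letter and join on the result: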

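SELECT ROW_NUMBER() OVER(ORDER BY AData.Period, AData.[Identifier]) AS [Row #]
    , AData.[Value] AS [First Value]
    , ABData.[Value] AS [Second Value]
FROM YourTable AData
-- Change to a LEFT JOIN if not all A's have AB's.
JOIN YourTable ABData
    -- NOTE: Assumes that 'B' is the only differentiator between
    -- AData and ABData's Identifier column and that it is
    -- not repeated as part of the common identifier.
    ON AData.[Identifier] = REPLACE(ABData.[Identifier], 'B', '')
    -- Pair rows within the same period only.
    AND AData.Period = ABData.Period
    -- Keep only AB rows on the right side so an A row never matches itself.
    AND ABData.[Identifier] LIKE 'AB%'
-- Keep only A rows on the left side (this matters if you switch to a LEFT JOIN).
WHERE AData.[Identifier] NOT LIKE 'AB%'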

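You are absolutely right that it is not a great schema. Because the join applies REPLACE to the Identifier column, this will probably require a full table scan.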
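The strip-and-join idea can be tried out on a small in-memory table. What follows is only a minimal sketch in Python, with SQLite standing in for SQL Server. It assumes that the output should contain Value rather than Identifier, that pairs must stay within one period, and that the right-hand side is limited to 'AB%' rows. The numeric values are invented, since the question only says "some number", and the row numbers are produced in Python rather than with ROW_NUMBER().

```python
import sqlite3

# Hypothetical sample data: two periods, identifiers A1..A5 and AB1..AB5.
# The values are made up so that each row is easy to recognise.
rows = []
for period in (1, 2):
    for n in range(1, 6):
        rows.append((period, f"A{n}", period * 100 + n))    # e.g. 101
        rows.append((period, f"AB{n}", period * 1000 + n))  # e.g. 1001

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (Period INTEGER, Identifier TEXT, Value INTEGER)"
)
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)

# Self-join: strip the 'B' from the AB side and match within the same period.
query = """
SELECT AData.Value, ABData.Value
FROM YourTable AS AData
JOIN YourTable AS ABData
  ON AData.Identifier = REPLACE(ABData.Identifier, 'B', '')
 AND AData.Period = ABData.Period
 AND ABData.Identifier LIKE 'AB%'
ORDER BY AData.Period, AData.Identifier
"""
pairs = conn.execute(query).fetchall()

# Number the rows in Python instead of relying on ROW_NUMBER().
result = [(i, a, ab) for i, (a, ab) in enumerate(pairs, start=1)]
for row in result:
    print(row)
```

With these invented values, the first five rows come from period 1 and the next five from period 2, which is the layout the question asks for. Note that ordering by Identifier is a string sort, so identifiers with more than one digit (a hypothetical A10) would sort before A2.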

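Answer 1: (score: 2)

If you are using SAS, I would just use PROC TRANSPOSE. Prepare the data so that it contains a label variable that determines which output variable each value is moved to: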

data datatable;
infile datalines dlm='|';
input
Period Identifier $ Value $;
datalines;
1      | AB1         | some number
1      | AB2         | some number
1      | AB3         | some number
1      | AB4         | some number
1      | AB5         | some number
1      | A1          | some number
1      | A2          | some number
1      | A3          | some number
1      | A4          | some number
1      | A5          | some number
2      | AB1         | some number
2      | AB2         | some number
2      | AB3         | some number
2      | AB4         | some number
2      | AB5         | some number
2      | A1          | some number
2      | A2          | some number
2      | A3          | some number
2      | A4          | some number
2      | A5          | some number
;;;
run;

data have;
set datatable;
idlabel = compress(identifier, ,'d');
byval = compress(identifier,,'kd');
run;

proc sort data=have;
by period byval;
run;
proc transpose data=have out=want;
by period byval;
id idlabel;
var value;
run;

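If for some reason you have to do it in SQL, it is best done as a join of the table to itself. You want to join, for example, the row with period=1 and compress(identifier,,'kd')=1 from the AB set to the row with the same values from the A set, so you can do this: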

proc sql;
  create table want as 
    select A.period, AB.value as AB, A.value as A
    from (select * from have where compress(identifier,,'d')='AB') AB, 
         (select * from have where compress(identifier,,'d')='A') A
    where AB.period=A.period
    and compress(AB.identifier,,'kd') = compress(A.identifier,,'kd');
quit;

But I think the PROC TRANSPOSE option is probably more efficient than the self-join (and more flexible if your data is not as tidy as what you have shown).
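For readers without SAS at hand, the split-then-transpose logic can be sketched in Python. This is only an approximation of the steps above, not a translation of them. `split_identifier` and `pivot` are hypothetical helper names, and the values are invented. The regular expression assumes letters followed by digits, which is stricter than compress, since compress drops or keeps digits wherever they appear.

```python
import re
from collections import defaultdict

def split_identifier(identifier):
    """Split e.g. 'AB12' into ('AB', '12'): letters as label, digits as key."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", identifier)
    if match is None:
        raise ValueError(f"unexpected identifier: {identifier!r}")
    return match.group(1), match.group(2)

def pivot(records):
    """Turn (period, identifier, value) rows into one dict per (period, key)."""
    wide = defaultdict(dict)
    for period, identifier, value in records:
        label, key = split_identifier(identifier)
        # The label ('A' or 'AB') becomes the column name, like the ID statement.
        wide[(period, int(key))][label] = value
    # Sort by period, then by the numeric part of the identifier.
    return [
        {"period": p, "byval": k, **cols}
        for (p, k), cols in sorted(wide.items())
    ]

# Hypothetical values; the question only says "some number".
records = [
    (1, "AB1", 11), (1, "AB2", 12), (1, "A1", 1), (1, "A2", 2),
    (2, "AB1", 21), (2, "A1", 3),
]
for row in pivot(records):
    print(row)
```

Each output row holds one period and one numeric key, with the A and AB values side by side. If an identifier has no partner, the corresponding column is simply absent from that row, which is roughly how a missing value would show up after the transpose.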

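Answer 2: (score: 2)

Setting aside for a second the tricky problem of relating an A to its AB within a given period: if the data can be related in some way, I would get the format you are looking for with an inner join of the table to itself, like so: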

SELECT row_number() OVER(ORDER BY a.Period, a.Identifier, b.Identifier), 
       a.Value, 
       b.Value 
FROM TableName a 
  INNER JOIN TableName b ON join_mechanism 
ORDER BY a.Period, a.Identifier, b.Identifier

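Now, to fill in the join mechanism, the obvious part is a.Period = b.Period. The dubious part is an idea: if this text is static, you could try a string replacement. So REPLACE(a.Identifier, 'A', 'AB') = b.Identifier.

So, all together, you would have: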

SELECT row_number() OVER(ORDER BY a.Period, a.Identifier, b.Identifier), 
       a.Value, 
       b.Value 
FROM TableName a 
  INNER JOIN TableName b ON a.Period = b.Period AND REPLACE(a.Identifier, 'A', 'AB') = b.Identifier 
ORDER BY a.Period, a.Identifier, b.Identifier

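Note: the SELECT statements have not been tested, and I am assuming you are using a relatively recent version of MSSQL that supports row_number (SQL Server 2005 or later).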
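To see why the REPLACE condition pairs A rows with AB rows and never pairs an AB row with anything, here is a small sketch in Python. `to_ab` is a hypothetical helper that mimics REPLACE(identifier, 'A', 'AB') for these inputs. One difference to keep in mind: Python's replace is case-sensitive, whereas the behaviour of T-SQL's REPLACE depends on the collation.

```python
def to_ab(identifier):
    """Mimic REPLACE(identifier, 'A', 'AB'): every 'A' becomes 'AB'."""
    return identifier.replace("A", "AB")

identifiers = ["A1", "A2", "AB1", "AB2"]

# An a-side row pairs with a b-side row when to_ab(a) == b.
matches = [(a, b) for a in identifiers for b in identifiers if to_ab(a) == b]
print(matches)

# 'AB1' turns into 'ABB1', which matches no identifier, so AB rows drop out
# of the a-side on their own.
print(to_ab("AB1"))

# Every 'A' is replaced, so a hypothetical identifier with two A's breaks.
print(to_ab("AA1"))
```

This is the sense in which the approach only works if the text is static: it relies on 'A' appearing exactly once in each A-type identifier.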