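Sorry, my English is not good.
Using SAS, I am trying to replace data in one table; let's call it t1. To do the replacement, I compare t1 column 1 with t2 column 1. If there is a match, I want to use the value of t2 column 2.
Table 1 has many columns, and the data in the relevant column can repeat. Table 2 has only two columns: the first contains only unique values and is compared against table 1, and I then use the value from the second column.
For some reason, I am generating a Cartesian product.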
proc sql;
create view
v1 as
select
t2.c2, /* final result */
t1.c10, /* not relevant to problem */
SUM(t1.c11) /* not relevant to problem */
from
_outres.table1 t1
left join
_outres.table2 t2
on
t1.c1=t2.c1 /* comparing the tables */
where
t1.c10= "criteria"
group by
t2.c2,
t1.c10
;quit;
If this were Excel, I would solve it like this:
Table 1
column 1
A
A
A
B
B
C
C
Table 2
Column 1 column 2
A AA
B BB
C CC
=VLOOKUP(Table 1 column 1, Table 2, 2, FALSE)
Result:
Table 1
column 1
AA
AA
AA
BB
BB
CC
CC
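------------------EDIT-----------------
@DCR, based on your reply, this is the code I used for testing. I made some small changes so it better reflects my data and tables. It works as expected, but I cannot carry it over to my original code.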
data tttttt1;
input col1 $ col11 col10 $;
datalines;
A 10 critA
A 12 critA
A 13 critA
A 13 critB
B 11 critA
B 41 critA
B 19 critA
C 20 critA
C 55 critA
;
run;
data tttttt2;
input col1 $ col2 $ ;
datalines;
A AA
B BB
C CC
;
run;
proc sql noprint;
create table tttttt3 as
select b.col2, SUM(a.col11), a.col10
from (select * from tttttt1) as a
left join (select * from tttttt2) as b
on a.col1 = b.col1
where a.col10 = "critA"
group by b.col2, a.col10
;quit;
The expected and actual results are the same:
AA 35 critA
BB 71 critA
CC 75 critA
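Answer 0 (score: 0)
SAS has a distinctive feature in the form of custom formats. A format maps source values to target values, much like VLOOKUP does.
Use the FORMAT statement to associate a format with a variable.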
proc format;
value $MyFormat
'A' = 'AA'
'B' = 'BB'
'C' = 'CC'
;
run;
data have;
input col1 $ @@;
col1_formatted_value = put(col1,$MyFormat.); * typically don't have to do this;
datalines;
A A A B B C C D D A
;
run;
proc print data=have;
title "Data rendered per attributes associated with variables in data set metadata";
run;
proc print data=have;
title "col1 Format applied at step time";
format col1 $MyFormat.;
run;
* col1 format attribute saved with data set;
data have2;
input col1 $ @@;
format col1 $MyFormat.;
datalines;
A A A B B C C D D A
;
run;
proc print data=have2;
title "Data rendered per format attributes associated with variables (in data set metadata)";
run;
SAS formats can also be built directly from data:
data formatMappingData;
input source $ target $;
fmtname = "$MyFormatFromData";
start = source;
label = target;
datalines;
A AA!
B BB!
C CC!
;
run;
proc format cntlin=formatMappingData;
run;
proc print data=have2;
title "Data rendered per format attributes associated with variables (in data set metadata)";
format col1 $MyFormatFromData.;
run;
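Applied to the data from the edit in the question, the CNTLIN approach can build the lookup format straight from the two-column table and then do the mapping and the summing in a single query. A minimal sketch, assuming the tttttt1/tttttt2 test tables from the question (recreated here so the block runs on its own); the format name `$LookupFmt` and the data sets `lookupFmt` and `tttttt3_fmt` are hypothetical names chosen for illustration:

```sas
/* Test tables from the question */
data tttttt1;
  input col1 $ col11 col10 $;
  datalines;
A 10 critA
A 12 critA
A 13 critA
A 13 critB
B 11 critA
B 41 critA
B 19 critA
C 20 critA
C 55 critA
;
run;

data tttttt2;
  input col1 $ col2 $;
  datalines;
A AA
B BB
C CC
;
run;

/* Hypothetical control data set: START = lookup key, LABEL = replacement */
data lookupFmt;
  set tttttt2(rename=(col1=start col2=label));
  retain fmtname '$LookupFmt';   /* leading $ marks a character format */
run;

proc format cntlin=lookupFmt;
run;

/* Map col1 through the format, then group and sum in one query */
proc sql;
  create table tttttt3_fmt as
  select put(col1, $LookupFmt.) as col2,
         sum(col11) as sum_col11,
         col10
  from tttttt1
  where col10 = "critA"
  group by col2, col10;
quit;
```

One design difference from the left join: a key with no entry in the format (such as D above) comes back as its own value rather than as missing, so add an OTHER range to the format if unmatched keys should be flagged.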
Answer 1 (score: 0)
I think you may be looking for a left join in PROC SQL. Try the following:
data t1;
input col1 $ ;
datalines;
A
A
A
B
B
C
C
;
run;
data t2;
input col1 $ col2 $ ;
datalines;
A AA
B BB
C CC
;
run;
proc sql noprint;
create table t3 as
select b.col2
from (select * from t1) as a
left join (select * from t2) as b
on a.col1 = b.col1;
quit;
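Answer 2 (score: 0)
I found a solution!
Thanks, everyone, for all the answers; they gave me some insight.
@nvioli and @DCR gave me a huge insight. I was struggling to understand the Cartesian product being generated. I counted the rows and found that the result had the same number of rows as the original t1 table, but the sum values were clearly wrong. So I realized that, somehow, my code was putting the total sum on every row instead of the GROUP BY subtotal.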
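One common way PROC SQL produces exactly this symptom (as many rows as the input, with a total repeated on every row) is remerging: when the SELECT list contains a column that is not in GROUP BY, SAS computes the summary per group and merges it back onto every original row, writing a NOTE about remerging summary statistics to the log. Whether that is what happened in the original query is an assumption; this sketch only reproduces the behavior on hypothetical data (`t1_demo`, `remerged`):

```sas
/* Hypothetical demo data */
data t1_demo;
  input c1 $ c10 $ c11;
  datalines;
A critA 10
A critA 12
B critA 11
;
run;

/* c1 is selected but NOT listed in GROUP BY, so PROC SQL remerges  */
/* the per-group sum back onto every input row (see NOTE in the log) */
proc sql;
  create table remerged as
  select c1, c10, sum(c11) as sum_c11
  from t1_demo
  group by c10;
quit;
```

Here `remerged` has three rows, each with `sum_c11 = 33`. Adding `c1` to the GROUP BY (or dropping it from the SELECT list) gives one row per group instead.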
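I solved it the simplest way: I split the view into two separate views. The first does the grouping and summing, since an older version of this code did that correctly. The second view is a simple select that does the left join and replaces the data. The final code looks like this (a simplified version, as in the original example):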
/*view to group and sum columns from t1*/
proc sql;
create view
v1 as
select
t1.c1, /* column that will be substituted later */
t1.c10, /* not relevant to problem, only to show the "criteria"/group by */
SUM(t1.c11) as Sum_of_c11 /* not relevant to problem, only to show sum */
from
_outres.table1 t1
where
t1.c10= "criteria"
group by
t1.c1,
t1.c10
;quit;
Then:
/*view to substitute the desired column from t1 (now v1) */
proc sql;
create view
v2 as
select
t2.c2, /* column with new data */
v1.c10, /* now already grouped */
v1.Sum_of_c11 /* now already summed */
from
v1
left join
_outres.table2 t2
on
v1.c1 = t2.c1 /* comparing view from t1 with t2 */
;quit;