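I am trying to apply log, square, cubic, and log-odds (logit) transformations to my input data, to get an exhaustive overview of which transformation performs best in univariate regression.
I tried the following code on a dataset with 1,000 variables. It either returns an error, runs out of memory, or does not execute at all. Are there any limitations to transforming variables this way using arrays?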
/*Create a table for reference*/
DATA input_data;
    ARRAY var_[*] var_1-var_1000;
    DO i = 1 to 1000;
        DO i = 1 to 1000;
            var_(i)= i*j;
            output;
        END;
    END;
RUN;
/*Log, square, cubic, logit transform all variables*/
DATA input_transform;
    SET input_data;
    ARRAY var[*] var_1-var_1000;
    ARRAY log[*] log_1-log_1000;
    ARRAY logit[*] logit_1-logit_1000;
    ARRAY sq[*] sq_1-sq_1000;
    ARRAY cubic[*] cubic_1-cubic_1000;
    DO i = 1 to 1000;
        log(i) = log(var(i));
        logit(i) = log((var(i))/(1-var(i)));
        sq(i) = var(i)**2;
        cubic(i) = var(i)**3;
    END;
RUN;
The goal is a new dataset with 5,000 variables, where each variable has its respective transformations.
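Answer 0 (score: 1)
You are using I as the index variable for both of the two nested DO loops. That is probably messing them up.
Your first data step is writing 1,000,000 observations of 1,002 variables, filling only the lower-left triangle of the "array". Did you really mean to have the OUTPUT statement inside the loop?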
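To make the effect of the OUTPUT placement concrete, here is a minimal sketch on a deliberately tiny size. The macro variable `n` and the dataset names `inner_output` and `outer_output` are made up for illustration and are not part of the question. With distinct index variables, the first step should write n*n = 9 observations and the second only n = 3, each with n + 2 variables (the array elements plus `i` and `j`).

```sas
%let n = 3;   /* hypothetical small size, for illustration only */

/* OUTPUT inside the inner loop: one observation per (i, j) pair */
data inner_output;
    array var_[*] var_1-var_&n.;
    do i = 1 to &n.;
        do j = 1 to &n.;
            var_(j) = i * j;
            output;
        end;
    end;
run;

/* OUTPUT after the inner loop: one observation per value of i */
data outer_output;
    array var_[*] var_1-var_&n.;
    do i = 1 to &n.;
        do j = 1 to &n.;
            var_(j) = i * j;
        end;
        output;
    end;
run;
```

Scaling `n` up to 1000 shows why the placement matters: the first form grows with the square of the size, the second only linearly.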
Answer 1 (score: 0)
In theory, as long as your code is correct, there is no problem. Here is an example and its log.
option notes;
%let size=1000;
/*Create a table for reference*/
DATA input_data;
    ARRAY var_[*] var_1-var_&size.;
    DO i = 1 to &size.;
        DO j = 1 to &size.;
            var_(j)= i*j;
        END;
        output;
    END;
RUN;
/*Log, square, cubic, logit transform all variables*/
DATA input_transform;
    SET input_data;
    ARRAY _var[*] var_1-var_&size.;
    ARRAY _log[*] log_1-log_&size.;
    ARRAY _logit[*] logit_1-logit_&size.;
    ARRAY _sq[*] sq_1-sq_&size.;
    ARRAY _cubic[*] cubic_1-cubic_&size.;
    DO i = 1 to &size.;
        _log(i) = log(_var(i));
        _logit(i) = sqrt(_var(i));
        _sq(i) = _var(i)**2;
        _cubic(i) = _var(i)**3;
    END;
RUN;
And the log:
1576 option notes;
1577 %let size=1000;
1578
1579 /*Create a table for reference*/
1580 DATA input_data;
1581 ARRAY var_[*] var_1-var_&size.;
1582
1583 DO i = 1 to &size.;
1584 DO j = 1 to &size.;
1585 var_(j)= i*j;
1586 END;
1587 output;
1588 END;
1589 RUN;
NOTE: The data set WORK.INPUT_DATA has 1000 observations and 1002
variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
1590
1591 /*Log, square, cubic, logit transform all variables*/
1592 DATA input_transform;
1593 SET input_data;
1594 ARRAY _var[*] var_1-var_&size.;
1595 ARRAY _log[*] log_1-log_&size.;
1596 ARRAY _logit[*] logit_1-logit_&size.;
1597 ARRAY _sq[*] sq_1-sq_&size.;
1598 ARRAY _cubic[*] cubic_1-cubic_&size.;
1599
1600 DO i = 1 to &size.;
1601 _log(i) = log(_var(i));
1602 _logit(i) = sqrt(_var(i));
1603 _sq(i) = _var(i)**2;
1604 _cubic(i) = _var(i)**3;
1605 END;
1606 RUN;
NOTE: There were 1000 observations read from the data set
WORK.INPUT_DATA.
NOTE: The data set WORK.INPUT_TRANSFORM has 1000 observations and 5002
variables.
NOTE: DATA statement used (Total process time):
real time 0.12 seconds
cpu time 0.10 seconds
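Two details of the example above are worth spelling out. First, the arrays are named `_var`, `_log`, and so on, rather than `var` and `log` as in the question. As far as I know, SAS treats a parenthesized reference to a name that is both an array and a function (such as LOG or VAR) as an array reference, so `log(var(i))` in the question would not call the LOG function. Second, the example stores `sqrt` in the `logit_` variables, presumably because the test values `i*j` are all 1 or greater and the logit is only defined on the open interval (0, 1). Below is a minimal sketch of a guarded logit transform. The dataset names `probs` and `probs_transform`, the `p_` variables, and the size `n` are made up for illustration.

```sas
%let n = 5;   /* hypothetical small size, for illustration only */

/* Test data: every value lies strictly between 0 and 1 */
data probs;
    array p_[*] p_1-p_&n.;
    do row = 1 to 3;
        do j = 1 to &n.;
            p_(j) = (row + j) / (&n. + 4);
        end;
        output;
    end;
    drop row j;
run;

data probs_transform;
    set probs;
    /* Leading underscores keep array names distinct from SAS functions */
    array _p[*]     p_1-p_&n.;
    array _logit[*] logit_1-logit_&n.;
    do j = 1 to dim(_p);
        /* logit is only defined on (0, 1); leave other values missing */
        if 0 < _p(j) < 1 then _logit(j) = log(_p(j) / (1 - _p(j)));
        else _logit(j) = .;
    end;
    drop j;
run;
```

Using `dim(_p)` for the loop bound instead of a hard-coded 1000 keeps the loop in step with the array definition if the number of variables changes.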