在 SAS 中,只有 PROC REG、PROC LOGISTIC 和 PROC PHREG 具有 model selection 的功能,其餘諸如 PROC GENMOD、PROC CATMOD 或 PROC MIXED 都需要用「手動」的方式來挑選最佳模式。本文利用 ODS 的功能,提供一個 macro 程式讓 PROC GENMOD 可以進行自動的 model selection 動作,可大幅節省時間並且減少手動挑選模式的過程中所可能產生的錯誤。
首先必須先執行一個名叫 MdStmt 的 macro 將 PROC GENMOD 所產生的 Type 3 test 報表另存成一個新檔。程式如下:
%macro MdStmt(
resvar = /*response variable */
,expvar = /*list of explanatory variables, separated by ' ' */
,clsvar = /*classification variables in the CLASS statement separated by ' ' */
);
ods output Type3=pval(rename=source=parm);
proc genmod data=indat descending;
class S &clsvar;
model &resvar= &expvar /dist=bin link=logit type3 lrci;
repeated subject=S /type=cs corrw covb;
title "&resvar = &expvar";
run;
ods output close;
%mend MdStmt;
此 macro 包含三個參數:
- resvar:要放在 model statement 等號左邊的反應變數
- expvar:要放在 model statement 等號右邊的解釋變數
- clsvar:要放在 class statement 的類別變數
%macro MdSelect(
var= /*response variable */
,intvar= /*initial explanatory variables for full model */
,catvar= /*categorical explanatory variables */
,slstay= /*criterion for removing variable */
);
%let var=%upcase(&var);
%let intvar=%upcase(&intvar);
%let catvar=%upcase(&catvar);
%*-------------------------------------------------------------------------*;
%* Create empty dataset "step" with only one column "parm". It will be *;
%* merged with "pval" from PROC GENMOD by "parm" *;
%*-------------------------------------------------------------------------*;
proc sql;
create table step_&var (parm char(9));
quit;
%let i=1;
%do %until (&pmax<=&slstay); %if &i = 1 %then %MdStmt(resvar=&var ,expvar=&intvar, clsvar=&catvar); %*initial model; %else %do; %MdStmt(resvar=&var ,expvar=&varlist, clsvar=&catvar); %*reduced model; %end; proc sort data=step_&var; by parm; proc sort data=pval; by parm; data step_&var; merge step_&var pval; by parm; p&i=put(ProbChiSq, pvalue6.3); drop ProbChiSq ChiSq DF; run; proc sql noprint; select max(ProbChiSq) into :pmax from pval; select distinct parm into :varlist separated by ' ' from pval having ProbChiSq^=max(ProbChiSq); quit; %let i=%eval(&i+1); %end; proc print data=step_&var; title "&var: model selection process"; run; %mend MdSelect;
這程式包含四個參數:
- var:要放在 model statement 等號左邊的反應變數
- intvar:要放在 model statement 等號右邊的解釋變數
- catvar:要放在 class statement 內的類別變數
- slstay:刪除變數的準則。這個值是要拿去跟 p-value 比的。通常是設定 0.1 或 0.5。
%MdSelect(var=Y1, intvar=C1 C2 C3 M1 M2 M3 M4 M5 M6 M7, catvar=C1 C2 C3, slstay=0.5);
不過這個 macro 有個缺陷(我自己發現的)。在 %MdStmt 中,ID 變數被固定為 S,而 covariance structure 的形式被固定成 cs。同理,model statement 後面的 option 也被固定為 dist=bin 和 link=logit,表示這個模式只能拿來做最簡單的 logistic regression model with binary response。因此,可以把 %MdStmt 改成:
%macro MdStmt(
idvar= /*ID variable*/
,resvar = /*response variable */
,expvar = /*list of explanatory variables, separated by ' ' */
,clsvar = /*classification variables in the CLASS statement separated by ' ' */
,resdist= /*distribution of response variable*/
,linkfunc= /*link function*/
,covstr= /*covariance structure type*/
);
ods output Type3=pval(rename=source=parm);
proc genmod data=indat descending;
class &idvar &clsvar;
model &resvar= &expvar /dist=&resdist link=&linkfunc type3 lrci;
repeated subject=&idvar /type=&covstr corrw covb;
title "&resvar = &expvar";
run;
ods output close;
%mend MdStmt;
然後 %MdSelect 改成:
%macro MdSelect(
id= /*id variable*/
,var= /*response variable */
,intvar= /*initial explanatory variables for full model */
,catvar= /*categorical explanatory variables */
,slstay= /*criterion for removing variable */
,dist= /*distribution of response variable*/
,link= /*link function*/
,type= /*covariance structure type*/
);
%let var=%upcase(&var);
%let intvar=%upcase(&intvar);
%let catvar=%upcase(&catvar);
%*-------------------------------------------------------------------------*;
%* Create empty dataset "step" with only one column "parm". It will be *;
%* merged with "pval" from PROC GENMOD by "parm" *;
%*-------------------------------------------------------------------------*;
proc sql;
create table step_&var (parm char(9));
quit;
%let i=1;
%do %until (&pmax<=&slstay); %if &i = 1 %then %MdStmt(idvar=&id, resvar=&var ,expvar=&intvar, clsvar=&catvar, resdist=&dist, linkfunc=&link, covstr=&cov); %*initial model; %else %do; %MdStmt(idvar=&id, resvar=&var ,expvar=&varlist, clsvar=&catvar, resdist=&dist, linkfunc=&link, covstr=&cov); %*reduced model; %end; proc sort data=step_&var; by parm; proc sort data=pval; by parm; data step_&var; merge step_&var pval; by parm; p&i=put(ProbChiSq, pvalue6.3); drop ProbChiSq ChiSq DF; run; proc sql noprint; select max(ProbChiSq) into :pmax from pval; select distinct parm into :varlist separated by ' ' from pval having ProbChiSq^=max(ProbChiSq); quit; %let i=%eval(&i+1); %end; proc print data=step_&var; title "&var: model selection process"; run; %mend MdSelect;
這個調整過後的程式將可以更有彈性。
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the authors at:
Jing Su
Merck & Co., Inc.
UG1D-88
Po Box 1000
North Wales, PA 19454-1099
Work Phone: 267-305-6949
Email: jing_su@merck.com
Wei (Lisa) Lin
Merck & Co., Inc.
UG1D-88
Po Box 1000
North Wales, PA 19454-1099
沒有留言:
張貼留言
要問問題的人請在文章下方的intensedebate欄位留言,請勿使用blogger預設的意見表單。今後用blogger意見表單留言的人我就不回應了。