公告

[公告]
2014/01/17
由於已經是faculty的關係,不太有足夠時間寫部落格。因此更新的速度會相當緩慢。再加上近幾年來SAS GLOBAL FORUM沒有出現讓我覺得驚艷的技術文件,所以能分享的文章相對也減少許多。若有人推薦值得分享的SAS技術文件,請利用『問題討論區』告知。

2013/07/19
臉書留言板的功能因為有不明原因故障,因此特此移除。而intensedebate的留言板因管理不易,也一併移除。目前已經開啟內建的 G+ 留言系統,所以請有需要留言的朋友,可直接至『問題討論區』裡面留言。


2008年2月1日 星期五

Customizing Output for Regression Analyses Using ODS and DATA Step

原文載點:http://www.nesug.info/Proceedings/nesug07/po/po22.pdf

在配飾模式時,SAS 功力比較好的人可能會用 ODS 把輸出報表弄得好看一點。不過有人覺得好還要更好,比方說能夠把不同等級的 p-value 標上不同的顏色,或者是把表格欄位名稱弄成粗體字等等。Zhenyi Xue 在 NESUG 2007 發表了數個 macro 讓使用者能夠把 linear regression model、logistic regression model 和 Cox PH model 的輸出報表弄得更漂亮。

LINEAR REGRESSION

此處有三個 macro 可以用,分別是:
/*******************************************
*Univariate Linear Regression with PROC GLM*
*******************************************/
%Macro GLMMacro_Uni(X0);
*Use ODS in GLM procedure to generate datasets containing model fitting statistics;
proc glm data=&Dat;
model &Response =&X0 /solution;
ods output ParameterEstimates=dPara;
ods output NObs=dNum;
ods output FitStatistics=dRsq;
quit;

*Use DATA steps to generate final dataset dGLMAll for future output;
data dNum2; set dNum(rename=(NObsUsed=N_of_Obs)); if _N_=1 then delete;
length _Variable $25.; _Variable="&X0"; keep _Variable N_of_Obs;
data dPara2; set dPara; if _N_=1 then delete;
length _Variable $25.; _Variable=Parameter; drop Dependent tValue Parameter;
data dRsq2; set dRsq;
length _Variable $25.; _Variable="&X0"; keep _Variable RSquare;
run;

proc sort data=dRsq2; by _Variable;
proc sort data=dPara2; by _Variable;
proc sort data=dNum2; by _Variable;
data dPara3; merge dPara2 dRsq2 dNum2; by _Variable; run;
data dGLMAll; set dGLMAll dPara3(rename=(Probt=P_Value));
CI_lower = estimate - 1.96 * stderr;
CI_upper = estimate + 1.96 * stderr;
Variable=_Variable;
run;
%mend GLMMacro_Uni;

/*********************************************
*Multivariate Linear Regression with PROC GLM*
*********************************************/
%Macro GLMMacro(X0);
*Use ODS in GLM procedure to generate datasets containing model fitting statistics;
proc glm data=&Dat;
model &Response =&X0 /solution;
ods output ParameterEstimates=dPara;
ods output NObs=dNum;
ods output FitStatistics=dRsq;
quit;

*Use DATA steps to generate final dataset dGLMAll for future output;
data dNum2; set dNum(rename=(NObsUsed=N_of_Obs));
if _N_=1 then delete; keep N_of_Obs;
data dRsq2; set dRsq; keep RSquare;
data dNumR; merge dNum2 dRsq2;
data dPara2; set dPara(rename=(Parameter=Variable Probt=P_Value));
if _N_=1 then delete;
CI_lower = estimate - 1.96 * stderr;
CI_upper = estimate + 1.96 * stderr;
drop Dependent tValue;
data dGLMAll; set dPara2 dNumR; run;
%mend GLMMacro;


/*********************************************
*Generate Output File with ODS and PROC PRINT*
*********************************************/
%macro sum_glm(
Dat=, /*Dataset Name, Exp: work.finaldata*/
Response=, /*Continuous Dependent Variable, Exp: BloodPressure*/
Predictor=, /*List of Independent Variables, Exp: Age Male Hx_MI*/
Type= /*Uni- or Multi-variate Analysis, only can enter: Uni or Multi*/);
*FORMAT procedure for traffic lighted p-value output;
proc format; value pf 0-0.01="red" 0.01-0.05="orange" 0.05-0.2="green" other="";

options nocenter nodate nonumber missing=' ';
data dGLMAll; set _NULL_; run;

*Use DO-WHILE loop to separate the predictor variables and invoke the GLMMacro;
%let I=0;
%do %while (%scan(&Predictor, &I+1, %str( )) ne %str( ));
%let I=%eval(&I+1);
%end;
%if &Type=Uni %then %do;
%do Z=1 %to &I;
%GLMMacro_Uni(%scan(&Predictor, &Z, %str( )))
%end;
%end;
%if &Type=Multi %then %do;
%GLMMacro(&Predictor)
%end;

*Use PRINT procedure and ODS to generate the output file;
ods pdf style=journal startpage=no file="&folder..\&Type.variate linear.pdf";
proc print data=dGLMAll split=' ' noobs Style(HEADER) = {font_weight=bold};
label Variable='Variable Name' Estimate='Regression Coefficient'
P_Value='P Value' CI_lower='Lower CI' CI_upper='Upper CI'
N_of_Obs='Sample Size' RSquare='R Square';
var Variable N_of_Obs RSquare Estimate StdErr CI_lower CI_upper;
var P_Value / style={foreground=pf.};
format Estimate StdErr P_Value CI_lower CI_upper 5.3 RSquare 4.2;
title1 "&Type.variate Generalized Linear Model on &Response";
title2 "Data folder=&folder";
title3 'Predictors are either continuous or coded as 0/1 categorical(ref=0)';
run;
ods pdf close;
%mend sum_glm;

第一個和第二個程式是分別拿來配飾 simple linear regression model 和 multiple linear regression model。使用者只要把 response variable 和 predictor variable 分別設定在 Response 和 X0 這兩個巨集參數裡面即可。這兩個程式將一些基本需要拿來製作報表的數據另存成新的資料集,然後第三個程式會在叫出這些資料集來繪製表格。等到執行第三個程式時,會自動去呼叫前面這兩個程式來輸出資料。裡面有四個巨集參數:
1. Dat:指定資料集檔名。
2. Response:指定 response variable。
3. X0:指定independent variable。
4. Type:指定要跑 simple linear regression 還是 multiple linear regression。如果是前者,則請輸入「Uni」,如果是後者,則請輸入「Multi」。

以下是兩個範例:
%sum_glm(dat=nesug07, response=LVEF,  predictor=Male Age Hx_MI NumVes Hx_CAD Hx_CRI Hypertension, Type=Uni) ;

%sum_glm(dat=nesug07, response=LVEF, predictor=Male Age Hx_MI NumVes Hx_CAD Hx_CRI Hypertension, Type=Multi) ;


第一個範例是針對每一個不同的 independent variable 去配飾不同的 simple linear regression model,因此總共會有七個結果,然後 %sum_glm 再把這七個結果合併成一個報表,如下所示:


第二個範例則是配飾一個 multiple linear regression model。結果如下所示:


根據原始程式碼內的設定,p-value 小於 0.01 的值都會被標記成紅色。0.01 到 0.05 則標記成澄色,0.05 到 0.2 則標記成綠色,大於 0.2 則就維持黑色。如果想要改變顏色或這是 p-value 的顏色區分點,則可以去改第三個程式內標記成紅色的那個區塊(也就是 PROC FORMAT 那一行)。

LOGISTIC REGRESSION

Logistic regression model 同樣也有三個 macro:
/**************************************************
*Univariate Logistic Regression with PROC LOGISTIC*
**************************************************/
%Macro LogisMacro_Uni(X0)/store;
*Use ODS in LOGISTIC procedure to generate new datasets which contain model fitting
statistics;
proc logistic descending data=&Dat;
model &Response =&X0/ CLODDS=Wald;
ods output ParameterEstimates=dPara;
ods output CLOddsWald=dCIWald;
ods output Nobs=dNum;
ods output ResponseProfile=dEvent;
quit;

*Use DATA steps to generate final dataset dLogisAll for future output;
data dNum2; set dNum; if _N_ = 2;
length _Variable $25.;
N_of_Obs=N; _Variable="&X0"; keep _Variable N_of_Obs;
data dEvent2; set dEvent; if _N_ = 1;
length _Variable $25.;
N_of_Events=Count; _Variable="&X0"; keep _Variable N_of_Events;
data dPara2; set dPara; if _N_=1 then delete;
length _Variable $25.; _Variable=Variable;
drop DF WaldChisq Variable;
data dCIWald2; set dCIWald;
length _Variable $25.; _Variable=Effect;
drop Unit Effect;
run;

proc sort data=dCIWald2; by _Variable;
proc sort data=dPara2; by _Variable;
proc sort data=dNum2; by _Variable;
proc sort data=dEvent2; by _Variable;
data dPara3; merge dPara2 dCIWald2 dNum2 dEvent2; by _Variable;
data dLogisAll; set dLogisAll dPara3; Variable=_Variable; run;
%mend LogisMacro_Uni;

/****************************************************
*Multivariate Logistic Regression with PROC LOGISTIC*
****************************************************/
%Macro LogisMacro(X0);
*Use ODS in LOGISTIC procedure to generate new datasets which contain model fitting
statistics;
proc logistic descending data=&Dat;
model &Response =&X0/ CLODDS=Wald;
ods output ParameterEstimates=dPara;
ods output CLOddsWald=dCIWald;
ods output Nobs=dNum;
ods output ResponseProfile=dEvent;
quit;

*Use DATA steps to generate final dataset dLogisAll for future output;
data dPara2; set dPara; if _N_=1 then delete; drop DF WaldChisq;
data dNum2; set dNum; if _N_ = 2; N_of_Obs=N; keep N_of_Obs;
data dEvent2; set dEvent; if _N_ = 1; N_of_Events=Count; keep N_of_Events;
data dNumEvn; merge dNum2 dEvent2;
data dCIWald2; set dCIWald(rename=(Effect=Variable)); drop Unit;
run;
proc sort data=dCIWald2; by Variable;
proc sort data=dPara2; by Variable;
data dPara3; merge dPara2 dCIWald2; by Variable;run;
data dLogisAll; set dPara3 dNumEvn; run;
%mend LogisMacro;

/*********************************************
*Generate Output File with ODS and PROC PRINT*
*********************************************/
%macro sum_logistic(
Dat=, /*Dataset Name, Exp: work.nesug07*/
Response=, /*Dependent Variable, Exp: Death*/
Predictor=,/*List of Independent Variables, Exp: Age Male Hx_MI*/
Type= /*Uni- or Multi-variate Analysis, only can enter: Uni or Multi*/);
*FORMAT procedure for traffic lighted p-value output;
proc format; value pf 0-0.01="red" 0.01-0.05="orange" 0.05-0.2="green" other="";

options nocenter nodate nonumber missing=' ';
data dLogisAll; set _NULL_; run;

*Use DO-WHILE loop to separate the predictor variables and invoke the LogisMacro;
%let I=0;
%do %while (%scan(&Predictor, &I+1, %str( )) ne %str( ));
%let I=%eval(&I+1);
%end;
%if &Type=Uni %then %do;
%do Z=1 %to &I; %LogisMacro_Uni(%scan(&Predictor, &Z, %str( ))) %end;
%end;
%if &Type=Multi %then %do;
%LogisMacro(&Predictor)
%end;

*Use PRINT procedure and ODS to generate the output file;
ods pdf style=journal startpage=no file="&folder..\&Type.variate logistic.pdf";
proc print data=dLogisAll split=' ' noobs Style(HEADER) = {font_weight=bold};
label Variable='Variable Name' Estimate='Regression Coefficient' ProbChiSq='P
Value' LowerCL='Lower OR' UpperCL='Upper OR' N_of_Obs='Sample Size'
N_of_Events='No. Events';
var Variable N_of_Obs N_of_Events Estimate StdErr;
var ProbChisq / style={foreground=pf.};
var OddsRatioEst LowerCL UpperCL ;
format Estimate StdErr ProbChisq 5.3 OddsRatioEst LowerCL UpperCL 4.2;
title1 "&Type.variate Logistic Model on &Response. (1)";
title2 "Data folder=&folder";
title3 'Predictors are either continuous or coded as 0/1 categorical(ref=0)';
run;
ods pdf close;
%mend sum_logistic;

用法幾乎是和前面的 %sum_glm 一模一樣,連巨集參數的名稱和用法也都一模一樣。直接來看範例:
%sum_logistic(dat=nesug07, response=TVR_MACE,
predictor=Male Age Hx_MI NumVes Hx_CAD Hx_CRI Hypertension LVEF, Type=Uni);

%sum_logistic(dat=nesug07, response=TVR_MACE,
predictor=Male Age Hx_MI NumVes Hx_CAD Hx_CRI Hypertension LVEF, Type=Multi);


與之前的 %sum_glm 比較起來沒有任何差別。做出來的報表也很相似:


同樣地,如果想要變動 p-value 的顏色,請到 %sum_logistic 裡面的紅色 PROC FORMAT 部分進行修改。

COX REGRESSION (PROPORTIONAL HAZARDS MODEL)

此處也是有三個巨集程式:
%Macro  CoxMacro_Uni(X0);
*Use ODS in PHREG procedure to generate datasets for model fitting statistics;
proc phreg data=&Dat;
model &Time * &Status(0) =&X0/ RL;
assess var=(&X0) PH/resample;
ods output ParameterEstimates=dPara;
ods output CensoredSummary=dCens;
ods output ProportionalHazardsSupTest=dPH;
quit;

*Use DATA steps to generate final dataset dCoxAll for future output;
data dCens2; set dCens;
length _Variable $25.; _Variable="&X0"; keep _Variable Total Event PctCens;
data dPara2; set dPara;
length _Variable $25.; _Variable=Variable; drop Variable;
data dPH2; set dPH;
length _Variable $25.; _Variable=Variable; drop Variable;
run;

proc sort data=dPara2; by _Variable;
proc sort data=dCens2; by _Variable;
proc sort data=dPH2; by _Variable;
data dPara3; merge dPara2(rename=(ProbChisq=p_value HazardRatio=HR))
dCens2 dPH2(rename=(pvalue=STPHA_p));
by _Variable;
drop DF Chisq MaxAbsValue Replications Seed;
data dCoxAll; set dCoxAll dPara2; Variable=_Variable; run;
%mend CoxMacro_Uni;

/********************************************
*Multivariate Cox Regression with PROC PHREG*
********************************************/
%Macro CoxMacro(X0);
*Use ODS in PHREG procedure to generate datasets for model fitting statistics;
proc phreg data=&Dat;
model &Time * &Status(0) =&X0/ RL;
ods output ParameterEstimates=dPara;
ods output CensoredSummary=dCens;
quit;

*Use DATA steps to generate final dataset dCoxAll for future output;
data dCens2; set dCens; keep Total Event PctCens;
data dPara2; set dPara(rename=(ProbChisq=p_value HazardRatio=HR) drop=(DF Chisq));
data dCoxAll; set dPara2 dCens2; run;
%mend CoxMacro;


/*********************************************
*Generate Output File with ODS and PROC PRINT*
*********************************************/
%macro sum_cox(
Dat=, /*Dataset Name, Exp: work.nesug07*/
Status=, /*Variable for Censoring Status(0 is censoring), Exp: Death*/
Time=, /*Time to Event Variable, Exp: FU_Month*/
Predictor=,/*List of Independent Variables, Exp: Age Male Hx_MI*/
Type= /*Uni- or Multi-variate Analysis, only can enter: Uni or Multi*/);

*FORMAT procedure for traffic lighted p-value output;
proc format; value pf 0-0.01="red" 0.01-0.05="orange" 0.05-0.2="green" other="";

options nocenter nodate nonumber missing=' ';
data dCoxAll; set _NULL_; run;

*Use DO-WHILE loop to separate the predictor variables;
%let I=0;
%do %while (%scan(&Predictor, &I+1, %str( )) ne %str( ));
%let I=%eval(&I+1);
%end;

*Invoke CoxMacro for univariate analysis, use PRINT procedure and ODS to generate
the output file;
%if &Type=Uni %then %do;
%do Z=1 %to &I; %CoxMacro_Uni(%scan(&Predictor, &Z, %str( ))) %end;
ods pdf style=journal startpage=no file="&folder..\&Type.variate cox.pdf";
proc print data=dCoxAll split=' ' noobs Style(HEADER) = {font_weight=bold};
label Variable='Variable Name' Estimate='Regression Coefficient'
p_value='P Value' HRLowerCL='Lower HR' HRUpperCL='Upper HR'
STPHA_p='PH Assumption';
var Variable Label Estimate StdErr;
var p_value / style={foreground=pf.};
var HR HRLowerCL HRUpperCL Event Total;
var STPHA_p / style={foregroun f.}; d=p
format HR HRLowerCL HRUpperCL 3.1 Estimate StdErr p_value STPHA_p 5.3;
title1 "Univariate Cox Model(Propotional Hazard Model) on &Status.(1)";
title2 "Data folder=&folder";
title3 'Predictors are either continuous or coded as 0/1 categorical(ref=0)';
run;
ods pdf close;
%end;

*Invoke CoxMacro for multivariate analysis, use PRINT procedure and ODS to generate
the output file;
%if &Type=Multi %then %do;
%CoxMacro(&Predictor)
ods pdf style=journal startpage=no file="&folder..\&Type.variate cox.pdf";
proc print data=dCoxAll split=' ' noobs Style(HEADER) = {font_weight=bold};
label Variable='Variable Name' Estimate='Regression Coefficient'
p_value='P Value' HRLowerCL='Lower HR' HRUpperCL='Upper HR';
var Variable Label Estimate StdErr;
var p_value / style={foreground=pf.};
var HR HRLowerCL HRUpperCL Event Total;
format HR HRLowerCL HRUpperCL 3.1 Estimate StdErr p_value 5.3;
title1 "Multivariate Cox Model(Propotional Hazard Model) on &Status. (1)";
title2 "Data folder=&folder";
title3 'Predictors are either continuous or coded as 0/1 categorical(ref=0)';
run;
ods rtf close;
%end;
%mend sum_cox;

用法也是一樣:
%sum_cox(dat=nesug07, status=TVR_MACE, time=TVR_MACE_Days,
predictor=Male Age Hx_MI NumVes Hx_CAD Hx_CRI Hypertension LVEF, Type=Uni) ;

%sum_cox(dat=nesug07, status=TVR_MACE, time=TVR_MACE_Days,
predictor=Male Age Hx_MI NumVes Hx_CAD Hx_CRI Hypertension LVEF, Type=Multi) ;

結果也是一樣:


希望這些程式能夠對大家有所幫助。

CONTACT INFORMATION
Your comments and questions are valued and encouraged. If you want an electronic copy of the macros, please contact the author at:
Zhenyi Xue
Cardiovascular Research Institute
Washington Hospital Center
MedStar Health, Inc.
110 Irving Street NW, Suite 6D
Washington, DC 20010
Work Phone: 202-877-2987
Fax: 202-877-9337
Email: Zhenyi.Xue@MedStar.net

沒有留言:

張貼留言

要問問題的人請在文章下方的intensedebate欄位留言,請勿使用blogger預設的意見表單。今後用blogger意見表單留言的人我就不回應了。

CODE { display: block; /* fixes a strange ie margin bug */ font-family: Courier New; font-size: 8pt; overflow:auto; background: #f0f0f0 url(http://klcintw.images.googlepages.com/Code_BG.gif) left top repeat-y; border: 1px solid #ccc; padding: 10px 10px 10px 21px; max-height:200px; height:200px; // for IE6 line-height: 1.2em; }