一般在分析一個 binary 變數在兩個不同群體下的比例是否一致,我們會先繪製一個 2 x 2 表格,然後用卡方分析來檢定兩個比例。如果有 c 個群體,則卡方分析也可以檢定 2 x c 表格。然而,此時的虛無假設變成:
H0: P1=P2=...=Pc
如果拒絕了這個虛無假設,我們只能得到這個結論:至少有兩個比例互不相等。可是有哪些是不相等的,卡方分析並無法提供進一步的報表。於是, Alan C. Elliott 和 Joan S. Reisch 兩人在 2006 年的 SUGI 31 發表了一篇關於在卡方分析底下的多重比較方法,並提供 macro 程式給人使用。
在使用這個 macro 前,需要做一點小小準備。首先,要用 PROC FREQ 製作 2 x c 表格,並且將每個 cell 裡面的 frequency number 另存出來。如圖所示:

如果群組數目不是很多,可以另外新建一個資料,把這些數值(紅色圈選處)寫進去。當這個資料準備好後,立刻接上 Alan C. Elliott, Joan S. Reisch 所提供的 macro,就等於是大功告成了。以他們的程式為例:
data;
do a = 1 to 2;
do b = 1 to 4;
input wt @@;
output;
end;
datalines;
32 43 16 9
55 65 64 16
run;
%compprop(n=4);結果如下所示(不甚清楚,可詳見原文):

Macro 程式如下所示,可直接複製使用:
**************************************************************
* SAS® macro to perform multiple comparisons on proportions *
**************************************************************;
%macro compprop(n=);
proc freq;
weight wt;
tables a*b /chisq out=freqout outpct;
run;
*Select variables from data set freqout;
data propmc;set freqout;
if a=1;
* Apply an arcsine transformation to the proportions;
pi180=180/3.141;
n=(count*100)/pct_col;
p=0.5*(arsin(sqrt(count/(n+1)))*pi180 + arsin(sqrt((count+1)/(n+1)))*pi180);
proc sort data=propmc;by p;
proc transpose data=propmc out=propmc1;var n p b;
*Separate out pieces of the data and recombine into the needed data set;
data tmp;set propmc1;
if _N_=1;
%do i=1 %to &ncols;
n&i=col&i;
%end;
data tmp2;set propmc1;
if _N_=2;
data tmp3;set propmc1;
if _N_=3;
%do i=1 %to &ncols;
lab&i=col&i;
%end;
drop col1-col&ncols;
data propmc3;merge tmp tmp2 tmp3;
run;
* The array q05() contains the table lookup values for the q statistic with infinity
degrees of freedom for k groups (1 to 10);
data propmc2;set propmc3;
array p(*) col1-col&ncols;
array na(*) n1-n&ncols;
array lab(*) lab1-lab&ncols;
array q05(10);
q05(1)=.;
q05(2)=2.772;
q05(3)=3.314;
q05(4)=3.633;
q05(5)=3.585;
q05(6)=4.030;
Statistics SUGI 31 and Data Analysis
5
q05(7)=4.170;
q05(8)=4.286;
q05(9)=4.387;
q05(10)=4.474;
icompare=&n;
qcrit=q05(icompare);
* print out the multiple comparison results table;
msg=' ';
file print;
put ' Tukey Style Multiple Comparisons of Proportions';
put ' ';
put ' Compare Diff SE q q(.05) Conclude';
put ' ------------------------------------------------------------';
iskiprest=0;
* as the table is constructed, determine the correct Conclude message;
do i=icompare to 1 by -1;
do j=1 to i;
if i ne j then do;
se=round(sqrt((410.35/(na(i)+.5))+(410.35/(na(j)+.5))),.01);
diff=round(p(i)-p(j),.01);
q=round((p(i)-p(j))/se,.01);
iskip=0;
* Do not test if last one in series was Accept;
if msg='Accept' and j ne 1 then do
msg='Do not test';iskip=1;
put @5 lab(i) 'vs' @10 lab(j) @15 msg;
msg='Accept';
end;
* Do not test last one if previous one was Accept;
if iskip=0 and iskiprest=1 then do
msg='Do not test';iskip=1;
put @5 lab(i) 'vs' @10 lab(j) @15 msg;
end;
if iskip=0 then do;
if q gt qcrit then msg='Reject';else msg='Accept';
put @5 lab(i) 'vs' @10 lab(j) @ 15 diff @25 se @35 q @45 qcrit @55 msg ;
end;
if j=1 and (msg='Accept' or msg='Do no test')then iskiprest=1;
retain msg;
end;
end;
end;
put ' ';
put ' Reference: Biostatistical Analysis, Fourth Edition, Jerrold Zar, 1999, p
564.';
run;
%mend compprop;附帶一提的是,原文內有提到 c 的數目是從 2 到 10,但沒說超過 c 會發生什麼事情。所以在使用時若群組數目超過 10 的話,請特別留意任何錯誤訊息。
CONTACT INFORMATION
Alan C. Elliott
Center for Biostatistics and Clinical Science
UT Southwestern Medical Center Dallas
5323 Harry Hines Blvd
Dallas TX 75390-8822
Phone (214) 648-2712
Fax (214) 648-7673
alan.elliott@utsouthwestern.edu
 
 
沒有留言:
張貼留言
要問問題的人請在文章下方的intensedebate欄位留言,請勿使用blogger預設的意見表單。今後用blogger意見表單留言的人我就不回應了。