在作2乘2的卡方分析時,我們通常會看每一格裡面的樣本期望值是不是大於五,如果不大於五的比例超過一半的話,便需要使用 Fisher's exact test 的結果來取代原本的卡方分析結果。但這種判斷方法,在2乘2乘H的表格下並不適用。Mantel 和 Fleisś 在 1980 年代發明了一種判斷方法來專門處理這種情況,不過這個方法並沒有內建在 SAS 現存的程序內,因此 Brandon Welch, Jane Eslinger 和 Rob Woolson 在 2009 年的 SESUG 發表了一篇技術文件,提供一個簡便的 macro 供使用者使用。
有關 Mantel-Fleisś criterion 的理論,可詳見原文。以下就直接切入這個 macro 的使用方法:
%MantelFleiss(In,Strat,Var1,Var2,Count)
這個名為 MantelFleiss 的 macro 只有五個參數,定義如下:
- In:資料集名稱
- Strat: 層數,亦即2乘2表格的數量(H)
- Var1:行變數
- Var2:列變數
- Count:此變數是加權變數(weight),是非必要的變數。如果省略的話,程式會自動定義為「1」。
%macro MantelFleiss(In,Strat,Var1,Var2,Count);
%*-----------------------------------------------------------;
%* &In - input data set ;
%* &Strat - stratification variable (e.g., site, race, etc.);
%* &Var1 - row variable ;
%* &Var2 - column variable ;
%* &Count - optional weight variable ;
%*-----------------------------------------------------------;
%*reset output data sets;
PROC DATASETS nodetails nolist;
delete _counts _expect _mantel:;
RUN;
QUIT;
%*determine number of strata levels;
PROC SQL noprint;
select count(distinct &strat) into: stratflag from &In;
QUIT;
%put **Number of strata levels &stratflag;
%*get expected counts and totals in each partial table;
PROC FREQ data = &In;
tables &Strat*&Var1*&Var2 / expected outexpect out = _expect;
%if %nrbquote(&Count.) ne %then weight &Count.;;
ODS output CrossTabFreqs = _counts;
RUN;
%*get sum of expected cell counts in cell n11 in each partial table
(assign to macro var SUMEXP);
DATA _null_;
set _expect end = eof;
by &Strat;
retain sumexp 0;
if first.&Strat then do;
sumexp = sumexp + expected;
end;
if eof then call symput('sumexp',put(sumexp,8.4));
RUN;
%put Sum of expected values in cells nh11: sum(mh11) = %cmpres(&sumexp);
PROC SORT data = _counts;
by &Strat &Var1 &Var2;
RUN;
%*Subset to the needed totals output by PROC FREQ
number the &Strat, &Var1, and &Var2 for use below
0 - totals
1 - first &Strat/&Var1/&Var2
2 - second &Strat/&Var1/&Var2
...;
DATA _mantelf1;
%*only include marginal totals from ODS;
set _counts(where = (_type_ not in ('100' '111')));
by &Strat &Var1 &Var2;
retain &Strat._ 0 &Var1._ &Var2._;
%*reset for change in strata;
if first.&Strat then do;
&Var1._ = 0; &Var2._ = 0;
end;
%*get numeric version of strata variable;
if first.&Strat then &Strat._ = &Strat._ + 1;
%*define zero as total for any &Var1/&Var2;
if missing(&Var1) then &Var1._ = 0;
else &Var1._ + 1;
if missing(&Var2) then &Var2._ = 0;
else &Var2._ + 1;
keep &Strat._ &Var1._ &Var2._ &Strat. &Var1. &Var2. frequency;
RUN;
DATA _mantelf2;
set _mantelf1;
by &Strat._;
retain n11_ n1_1 n1_2;
if first.&Strat._ then do;
n11_ = 0;
n1_1 = 0;
n1_2 = 0;
end;
%*get each total;
if &Var1._ = 1 then n11_ = frequency; %*for row one in the hth strata;
if &Var2._ = 1 then n1_1 = frequency; %*for column one in the hth strata;
if &Var2._ = 2 then n1_2 = frequency; %*for column two in the hth strata;
%*define (nh11)L and (nh11)U for final computation;
max_ = max(0,n11_-n1_2);
min_ = min(n1_1,n11_);
if last.&Strat._;
RUN;
%*sum acoss min and max to get lower/upper bounds of criterion;
DATA _mantelf3(keep = mf_suml mf_sumu mflendpt mfrendpt mf_r);
set _mantelf2 end = eof;
retain mf_suml mf_sumu;
if _n_ = 1 then do;
mf_suml = 0; mf_sumu = 0;
end;
mf_suml = mf_suml + max_;
mf_sumu = mf_sumu + min_;
if eof then do;
mflendpt = &sumexp-mf_suml;
mfrendpt = mf_sumu-&sumexp;
mf_r = min(mflendpt,mfrendpt);
put "Sum of lower bound (sum[(nh11)L]) = " mf_suml;
put "Sum of upper bound (sum[(nh11)U]) = " mf_sumu;
put "Left end-point: sum[mh11] - sum[(nh11)L] = " mflendpt;
put "Right endpoint: sum[(nh11)U] - sum[mh11] = " mfrendpt;
put "Mantel-Fleiss R = min[" mflendpt "," mfrendpt "] = " mf_r;
if mf_r>5 then put "Mantel-Fleiss criterion satisfied (since R > 5)";
else put "Mantel-Fleiss criterion NOT satisfied (since R <= 5)";
output;
end;
label
mf_suml = "Summation of Maximum (sum[(nh11)L])"
mf_sumu = "Summation of Minumum (sum[(nh11)U])"
mflendpt = "Left End-Point: sum[mh11] - sum[(nh11)L]"
mfrendpt = "Right End-Point: sum[(nh11)U] - sum[mh11]"
mf_r = "R Value: min[sum[mh11] - sum[(nh11)L],sum[(nh11)U] - sum[mh11]]";
RUN;
%mend;
範例一:
資料檔:
PROC FORMAT;
value trtf
1 = 'TreatmentA'
2 = 'TreatmentB'
;
value sitef
1 = 'Arkansas'
2 = 'Indiana'
3 = 'Illinois'
;
value respf
1 = 'Present'
2 = 'Absent'
;
RUN;
DATA test;
do site = 1 to 3;
do treat = 1 to 2;
do resp = 1 to 2;
input counts @@;
output;
end;
end;
end;
label site = "Site" treat = "Treatment" resp = "Response";
format treat trtf. site sitef. resp respf.;
cards;
12 5 7 3
1 2 5 2
0 9 5 6
;
RUN;
上述程式在表格內呈現的結果如下:
接著執行 macro:
%MantelFleiss(test,site,treat,resp,counts) ;
要特別注意的一點是,這個 macro 的結果並不是呈現在 output 視窗裡面,而是直接呈現在 log 視窗裡面,所以送出這行程式碼後,便可以直接在 log 視窗上看到結果,如下所示:
Sum of expected values in cells nh11: sum(mh11) = 20.5130
Sum of lower bound (sum[(nh11)L]) = 9
Sum of upper bound (sum[(nh11)U]) = 25
Left end-point: sum[mh11] - sum[(nh11)L] = 11.513
Right endpoint: sum[(nh11)U] - sum[mh11] = 4.487
Mantel-Fleiss R = min[11.513 ,4.487 ] = 4.487
Mantel-Fleiss criterion NOT satisfied (since R <= 5)
原文作者很好心地幫使用者做出該資料是否有滿足 Mantel-Fleiss criterion 的判斷句,所以就不用管上面的數據所代表的意義了,但若在寫 paper 時,可能還是要把一些數據呈現出來,但其實這些數據並沒有解釋上的明顯意義。
這個例子並沒有滿足 Mantel-Fleiss criterion,也就是說我們得使用 Fisher's exact test 來取代原本的卡方檢定。
我們再來看另一個有滿足 Mantel-Fleiss criterion 的例子。
範例二:
我們直接來看資料的表格結構。
所有參數設定都和範例一同樣,只是數據改變而已。執行 macro 後的結果如下:
Sum of expected values in cells nh11: sum(mh11) = 19.3630
Sum of lower bound (sum[(nh11)L]) = 9
Sum of upper bound (sum[(nh11)U]) = 30
Left end-point: sum[mh11] - sum[(nh11)L] = 10.363
Right endpoint: sum[(nh11)U] - sum[mh11] = 10.637
Mantel-Fleiss R = min[10.363 ,10.637 ] = 10.363
Mantel-Fleiss criterion satisfied (since R > 5)
從最後一句話得知這份資料是有滿足 Mantel-Fleiss criterion 的,所以我們便可以直接使用卡方檢定的結果。
CONTACT INFORMATION
Brandon Welch
Rho®, Inc.
Statistical Programming
6330 Quadrangle Dr., Ste. 500
Chapel Hill, NC 27517
Email: Brandon_Welch@rhoworld.com
Phone: 919-595-6339
沒有留言:
張貼留言
要問問題的人請在文章下方的intensedebate欄位留言,請勿使用blogger預設的意見表單。今後用blogger意見表單留言的人我就不回應了。