公告

[公告]
2014/01/17
由於已經是faculty的關係,不太有足夠時間寫部落格。因此更新的速度會相當緩慢。再加上近幾年來SAS GLOBAL FORUM沒有出現讓我覺得驚艷的技術文件,所以能分享的文章相對也減少許多。若有人推薦值得分享的SAS技術文件,請利用『問題討論區』告知。

2013/07/19
臉書留言板的功能因為有不明原因故障,因此特此移除。而intensedebate的留言板因管理不易,也一併移除。目前已經開啟內建的 G+ 留言系統,所以請有需要留言的朋友,可直接至『問題討論區』裡面留言。


2010年3月21日 星期日

A Macro for Computing the Mantel-Fleisś Criterion

原文載點:http://analytics.ncsu.edu/sesug/2009/PO008.Welch.pdf

在作2乘2的卡方分析時,我們通常會看每一格裡面的樣本期望值是不是大於五,如果不大於五的比例超過一半的話,便需要使用 Fisher's exact test 的結果來取代原本的卡方分析結果。但這種判斷方法,在2乘2乘H的表格下並不適用。Mantel 和 Fleisś 在 1980 年代發明了一種判斷方法來專門處理這種情況,不過這個方法並沒有內建在 SAS 現存的程序內,因此 Brandon Welch, Jane Eslinger 和 Rob Woolson 在 2009 年的 SESUG 發表了一篇技術文件,提供一個簡便的 macro 供使用者使用。

有關 Mantel-Fleisś criterion 的理論,可詳見原文。以下就直接切入這個 macro 的使用方法:
%MantelFleiss(In,Strat,Var1,Var2,Count)

這個名為 MantelFleiss 的 macro 只有五個參數,定義如下:
  • In:資料集名稱
  • Strat: 層數,亦即2乘2表格的數量(H)
  • Var1:行變數
  • Var2:列變數
  • Count:此變數是加權變數(weight),是非必要的變數。如果省略的話,程式會自動定義為「1」。
原始碼如下:
%macro MantelFleiss(In,Strat,Var1,Var2,Count);

%*-----------------------------------------------------------;
%* &In - input data set ;
%* &Strat - stratification variable (e.g., site, race, etc.);
%* &Var1 - row variable ;
%* &Var2 - column variable ;
%* &Count - optional weight variable ;
%*-----------------------------------------------------------;

%*reset output data sets;
PROC DATASETS nodetails nolist;
delete _counts _expect _mantel:;
RUN;
QUIT;

%*determine number of strata levels;
PROC SQL noprint;
select count(distinct &strat) into: stratflag from &In;
QUIT;
%put **Number of strata levels &stratflag;

%*get expected counts and totals in each partial table;
PROC FREQ data = &In;
tables &Strat*&Var1*&Var2 / expected outexpect out = _expect;
%if %nrbquote(&Count.) ne %then weight &Count.;;
ODS output CrossTabFreqs = _counts;
RUN;

%*get sum of expected cell counts in cell n11 in each partial table
(assign to macro var SUMEXP);
DATA _null_;
set _expect end = eof;
by &Strat;
retain sumexp 0;
if first.&Strat then do;
sumexp = sumexp + expected;
end;
if eof then call symput('sumexp',put(sumexp,8.4));
RUN;
%put Sum of expected values in cells nh11: sum(mh11) = %cmpres(&sumexp);

PROC SORT data = _counts;
by &Strat &Var1 &Var2;
RUN;


%*Subset to the needed totals output by PROC FREQ
number the &Strat, &Var1, and &Var2 for use below
0 - totals
1 - first &Strat/&Var1/&Var2
2 - second &Strat/&Var1/&Var2
...;

DATA _mantelf1;
%*only include marginal totals from ODS;
set _counts(where = (_type_ not in ('100' '111')));
by &Strat &Var1 &Var2;
retain &Strat._ 0 &Var1._ &Var2._;

%*reset for change in strata;
if first.&Strat then do;
&Var1._ = 0; &Var2._ = 0;
end;

%*get numeric version of strata variable;
if first.&Strat then &Strat._ = &Strat._ + 1;

%*define zero as total for any &Var1/&Var2;
if missing(&Var1) then &Var1._ = 0;
else &Var1._ + 1;

if missing(&Var2) then &Var2._ = 0;
else &Var2._ + 1;

keep &Strat._ &Var1._ &Var2._ &Strat. &Var1. &Var2. frequency;
RUN;

DATA _mantelf2;
set _mantelf1;
by &Strat._;

retain n11_ n1_1 n1_2;

if first.&Strat._ then do;
n11_ = 0;
n1_1 = 0;
n1_2 = 0;
end;

%*get each total;
if &Var1._ = 1 then n11_ = frequency; %*for row one in the hth strata;
if &Var2._ = 1 then n1_1 = frequency; %*for column one in the hth strata;
if &Var2._ = 2 then n1_2 = frequency; %*for column two in the hth strata;


%*define (nh11)L and (nh11)U for final computation;
max_ = max(0,n11_-n1_2);
min_ = min(n1_1,n11_);

if last.&Strat._;

RUN;

%*sum acoss min and max to get lower/upper bounds of criterion;
DATA _mantelf3(keep = mf_suml mf_sumu mflendpt mfrendpt mf_r);
set _mantelf2 end = eof;

retain mf_suml mf_sumu;

if _n_ = 1 then do;
mf_suml = 0; mf_sumu = 0;

end;

mf_suml = mf_suml + max_;
mf_sumu = mf_sumu + min_;

if eof then do;
mflendpt = &sumexp-mf_suml;
mfrendpt = mf_sumu-&sumexp;
mf_r = min(mflendpt,mfrendpt);

put "Sum of lower bound (sum[(nh11)L]) = " mf_suml;
put "Sum of upper bound (sum[(nh11)U]) = " mf_sumu;
put "Left end-point: sum[mh11] - sum[(nh11)L] = " mflendpt;
put "Right endpoint: sum[(nh11)U] - sum[mh11] = " mfrendpt;
put "Mantel-Fleiss R = min[" mflendpt "," mfrendpt "] = " mf_r;

if mf_r>5 then put "Mantel-Fleiss criterion satisfied (since R > 5)";
else put "Mantel-Fleiss criterion NOT satisfied (since R <= 5)";
output;
end;

label
mf_suml = "Summation of Maximum (sum[(nh11)L])"
mf_sumu = "Summation of Minumum (sum[(nh11)U])"
mflendpt = "Left End-Point: sum[mh11] - sum[(nh11)L]"
mfrendpt = "Right End-Point: sum[(nh11)U] - sum[mh11]"
mf_r = "R Value: min[sum[mh11] - sum[(nh11)L],sum[(nh11)U] - sum[mh11]]";
RUN;
%mend;


範例一:
資料檔:
PROC FORMAT;
value trtf
1 = 'TreatmentA'
2 = 'TreatmentB'
;

value sitef
1 = 'Arkansas'
2 = 'Indiana'
3 = 'Illinois'
;

value respf
1 = 'Present'
2 = 'Absent'
;
RUN;

DATA test;
do site = 1 to 3;
do treat = 1 to 2;
do resp = 1 to 2;
input counts @@;
output;
end;
end;
end;
label site = "Site" treat = "Treatment" resp = "Response";
format treat trtf. site sitef. resp respf.;
cards;
12 5 7 3
1 2 5 2
0 9 5 6
;
RUN;


上述程式在表格內呈現的結果如下:


接著執行 macro:
%MantelFleiss(test,site,treat,resp,counts) ;

要特別注意的一點是,這個 macro 的結果並不是呈現在 output 視窗裡面,而是直接呈現在 log 視窗裡面,所以送出這行程式碼後,便可以直接在 log 視窗上看到結果,如下所示:
Sum of expected values in cells nh11: sum(mh11) = 20.5130
Sum of lower bound (sum[(nh11)L]) = 9
Sum of upper bound (sum[(nh11)U]) = 25
Left end-point: sum[mh11] - sum[(nh11)L] = 11.513
Right endpoint: sum[(nh11)U] - sum[mh11] = 4.487
Mantel-Fleiss R = min[11.513 ,4.487 ] = 4.487
Mantel-Fleiss criterion NOT satisfied (since R <= 5)


原文作者很好心地幫使用者做出該資料是否有滿足 Mantel-Fleiss criterion 的判斷句,所以就不用管上面的數據所代表的意義了,但若在寫 paper 時,可能還是要把一些數據呈現出來,但其實這些數據並沒有解釋上的明顯意義。

這個例子並沒有滿足 Mantel-Fleiss criterion,也就是說我們得使用 Fisher's exact test 來取代原本的卡方檢定。

我們再來看另一個有滿足 Mantel-Fleiss criterion 的例子。

範例二:

我們直接來看資料的表格結構。

所有參數設定都和範例一同樣,只是數據改變而已。執行 macro 後的結果如下:
Sum of expected values in cells nh11: sum(mh11) = 19.3630
Sum of lower bound (sum[(nh11)L]) = 9
Sum of upper bound (sum[(nh11)U]) = 30
Left end-point: sum[mh11] - sum[(nh11)L] = 10.363
Right endpoint: sum[(nh11)U] - sum[mh11] = 10.637
Mantel-Fleiss R = min[10.363 ,10.637 ] = 10.363
Mantel-Fleiss criterion satisfied (since R > 5)


從最後一句話得知這份資料是有滿足 Mantel-Fleiss criterion 的,所以我們便可以直接使用卡方檢定的結果。

CONTACT INFORMATION
Brandon Welch
Rho®, Inc.
Statistical Programming
6330 Quadrangle Dr., Ste. 500
Chapel Hill, NC 27517
Email: Brandon_Welch@rhoworld.com
Phone: 919-595-6339

沒有留言:

張貼留言

要問問題的人請在文章下方的intensedebate欄位留言,請勿使用blogger預設的意見表單。今後用blogger意見表單留言的人我就不回應了。

CODE { display: block; /* fixes a strange ie margin bug */ font-family: Courier New; font-size: 8pt; overflow:auto; background: #f0f0f0 url(http://klcintw.images.googlepages.com/Code_BG.gif) left top repeat-y; border: 1px solid #ccc; padding: 10px 10px 10px 21px; max-height:200px; height:200px; // for IE6 line-height: 1.2em; }