/****** Numerically Improved Descriptive Stats *****/ Author: H. D. Vinod vinod@murray.fordham.edu Professor of Economics Fordham University, Bronx, NY 10458 Web page: http://www.fordham.edu/economics/vinod Date: 12 July 1999 This code is provided without performance guarantees for private, non-commercial use. Theorists have known at least since R. F. Ling's 1974 JASA article p.859 that some revisions to the usual formulas for mean and standard deviation will yield improved algorithms. See McCullough and Vinod, J of Econ. Lit. June 1999, p.649 for a discussion of related issues. meanc2.g is a GAUSS proc to yield improved mean, stdc2 and stdc3 procs will yield improved algorithms for the standard dev. meanc2.tes is a main program for testing all procs. If meanc2 is used, GAUSS software is able to compute the exactly correct mean of a test problem called Numerical-Accuracy-4 NIST (1999) National Institute of Standards and Technology, US Dept. of Commerce, Technology Administration, Gaithersburg, MD 20899-001, Statistical Reference Data Sets (StRD). Download at: www.itl.nist.gov/div898/strd/general/dataarchive.html Dataset Name: Numerical-Accuracy-4 Description: This is a constructed/fabricated data set to test accuracy in summary statistic calculations. It consists of 1001 9-digit floating- point values: a single 10000000.2, followed by 500 pairings of 10000000.1 and 10000000.3. The numbers are 9-digit floating point values and differ only in the last decimal place. sample mean = 10000000.2 (exact) sample standard dev. = 0.1 (exact) sample autocorr. coef. = -0.999 (exact) Stat Category: Univariate Reference: Simon, Stephen D. and Lesage, James P. (1989). Assessing the Accuracy of ANOVA Caluclations in Statistical Software", Computational Statistics & data Analysis, 8, pp. 325-332. GAUSS: meanc algorithm yields sample mean =10000000.20000010 Stdc alg. yields sample standard dev.=0.1000000005587935 sample autocorr. coef.=-0.9989999990108854 Our meanc2 yields the exact sample mean=10000000.20000000 stdc3 alg. yields sample standard dev. =0.1000000005587934 sample autocorr. coef.=-0.9989999999999998 Conclusion: Our meanc2 does yield the exact result. The stdc3 algorithm also makes a small improvement. More mathematical research is needed to further improve the algorithms for standard deviation. It may be necessary to improve the GAUSS language itself. ***********************************************************/ @meanc2.tes: a test program for the procedures below@ new; a=99^101; b=seqa(1,1,100); bbar=meanc(b); c=a*b; meanc(c); meanc2(c); bbar*a; x=rndn(1000,1); "means" meanc(x)~meanc2(x); "std dev " stdc(x)~stdc2(x)~stdc3(x); @meanc2.g@ proc (1)=meanc2(x); @ Compute numerically mor accurate column means author H. D. Vinod, March 9, 1999 References: R. F. Ling, JASA, vol 69, p.859, 1974. McCullough -Vinod, J of Econ Literature, Jun 1999, 633-665. Input: x = any matrix. Output: vector of means. @ Local avg, n, p, ane; n=rows(x);p=cols(x); ane=ones(n,1); avg=meanc(x); retp(avg+ (sumc(x-ane*avg'))/n); endp; @ssd.g@ proc (1)=ssd(x); @Compute numerically more accurate sum of squared deviations(ssd) from the mean author H. D. Vinod, March 9, 1999 References: R. F. Ling, JASA, vol 69, p.859, 1974. McCullough -Vinod, J of Econ Literature, 1999, 633-665. Input: x = any matrix. Output: vector of ssd's. @ Local crud,avg, n, p, ane; n=rows(x);p=cols(x); ane=ones(n,1); avg=meanc(x); @Note this version uses original column mean in GAUSS@ @"avg=" avg;@ crud=sumc((x-ane*avg')^2); @"crud" crud; @ retp(crud- ((sumc(x-ane*avg'))^2)/n); endp; @ssd2.g@ proc (1)=ssd2(x); @Compute numerically more accurate sum of squared deviations(ssd) from the mean ssd2 is version 2 where meanc2 is called instead of meanc author H. D. Vinod, March 9, 1999 References: R. F. Ling, JASA, vol 69, p.859 McCullough -Vinod, J of Econ Literature 633-665. Input: x = any matrix. Output: vector of ssd's. @ Local crud,avg, n, p, ane; n=rows(x);p=cols(x); ane=ones(n,1); avg=meanc2(x); @NOTE here the column mean is revised@ @"avg=" avg;@ crud=sumc((x-ane*avg')^2); @"crud" crud; @ retp(crud- ((sumc(x-ane*avg'))^2)/n); endp; @stdc2.g@ proc (1)=stdc2(x); @Compute numerically more accurate column standard deviation author H. D. Vinod, March 9, 1999 References: R. F. Ling, JASA, vol 69, p.859, 1974. McCullough -Vinod, J of Econ Literature, Jun 1999 633-665. Input: x = any matrix. Output: vector of sample standard deviations. @ Local crud,avg, n, p, ane, ssd; n=rows(x);p=cols(x); ane=ones(n,1); avg=meanc(x); @Note this version uses original column mean in GAUSS@ @"avg=" avg;@ crud=sumc((x-ane*avg')^2); @"crud" crud; @ ssd=crud- ((sumc(x-ane*avg'))^2)/n; retp(sqrt(ssd/(n-1))); endp; @stdc3.g@ proc (1)=stdc3(x); @Compute numerically more accurate column standard deviation author H. D. Vinod, March 9, 1999 References: R. F. Ling, JASA, vol 69, p.859, 1974. McCullough -Vinod, J of Econ Literature, Jun 1999 633-665. Input: x = any matrix. Output: vector of sample standard deviations. @ Local crud,avg, n, p, ane, ssd; n=rows(x);p=cols(x); ane=ones(n,1); avg=meanc2(x); @Note this version uses revised column mean proc by vinod@ @"avg=" avg;@ crud=sumc((x-ane*avg')^2); @"crud" crud; @ ssd=crud- ((sumc(x-ane*avg'))^2)/n; retp(sqrt(ssd/(n-1))); endp;