
The Electronic Reference for Informatics


Correlation Coefficient

Author: Acton, F. S.

Source: Analysis of Straight-Line Data. New York: Dover, 1966.

Part and page: ...

27-3-2021


The correlation coefficient, sometimes also called the cross-correlation coefficient, Pearson correlation coefficient (PCC), Pearson's r, the Pearson product-moment correlation coefficient (PPMCC), or the bivariate correlation, is a quantity that gives the quality of a least squares fitting to the original data. To define the correlation coefficient, first consider the sums of squares $ss_{xx}$, $ss_{xy}$, and $ss_{yy}$ of a set of $n$ data points $(x_i, y_i)$ about their respective means,

$$
\begin{aligned}
ss_{xx} &= \sum (x_i - \bar{x})^2 && (1)\\
&= \sum x^2 - 2\bar{x}\sum x + \sum \bar{x}^2 && (2)\\
&= \sum x^2 - 2n\bar{x}^2 + n\bar{x}^2 && (3)\\
&= \sum x^2 - n\bar{x}^2 && (4)\\
ss_{yy} &= \sum (y_i - \bar{y})^2 && (5)\\
&= \sum y^2 - 2\bar{y}\sum y + \sum \bar{y}^2 && (6)\\
&= \sum y^2 - 2n\bar{y}^2 + n\bar{y}^2 && (7)\\
&= \sum y^2 - n\bar{y}^2 && (8)\\
ss_{xy} &= \sum (x_i - \bar{x})(y_i - \bar{y}) && (9)\\
&= \sum \left(x_i y_i - \bar{x}y_i - x_i\bar{y} + \bar{x}\bar{y}\right) && (10)\\
&= \sum xy - n\bar{x}\bar{y} - n\bar{x}\bar{y} + n\bar{x}\bar{y} && (11)\\
&= \sum xy - n\bar{x}\bar{y}. && (12)
\end{aligned}
$$
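The shortcut forms (4), (8), and (12) can be checked against the definitional forms (1), (5), and (9). A minimal sketch in Python; the sample data are arbitrary illustrative values:

```python
# Compare the definitional and shortcut forms of ss_xx, ss_yy, and ss_xy.
# The data points below are arbitrary illustrative values.
n = 5
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
xbar = sum(x) / n
ybar = sum(y) / n

# Definitional forms (1), (5), (9): sums of products of deviations from the means.
ss_xx = sum((xi - xbar) ** 2 for xi in x)
ss_yy = sum((yi - ybar) ** 2 for yi in y)
ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Shortcut forms (4), (8), (12).
ss_xx_short = sum(xi ** 2 for xi in x) - n * xbar ** 2
ss_yy_short = sum(yi ** 2 for yi in y) - n * ybar ** 2
ss_xy_short = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar

assert abs(ss_xx - ss_xx_short) < 1e-9
assert abs(ss_yy - ss_yy_short) < 1e-9
assert abs(ss_xy - ss_xy_short) < 1e-9
```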

These quantities are simply unnormalized forms of the variances and covariance of $X$ and $Y$, given by

$$
\begin{aligned}
ss_{xx} &= n\,\mathrm{var}(X) && (13)\\
ss_{yy} &= n\,\mathrm{var}(Y) && (14)\\
ss_{xy} &= n\,\mathrm{cov}(X, Y). && (15)
\end{aligned}
$$

For linear least squares fitting, the coefficient $b$ in

$$ y = a + bx \qquad (16) $$

is given by

$$
\begin{aligned}
b &= \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} && (17)\\
&= \frac{ss_{xy}}{ss_{xx}}, && (18)
\end{aligned}
$$

and the coefficient $b'$ in

$$ x = a' + b'y \qquad (19) $$

is given by

$$ b' = \frac{ss_{xy}}{ss_{yy}}. \qquad (20) $$
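The product of the two regression-line slopes $b$ (of $y$ on $x$) and $b'$ (of $x$ on $y$) equals $ss_{xy}^2/(ss_{xx}\,ss_{yy})$, the squared correlation coefficient. A minimal numerical sketch with arbitrary sample data:

```python
# Check that r^2 = b * b', where b and b' are the slopes of the two
# regression lines (y on x, and x on y). Sample data are arbitrary.
n = 5
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
xbar, ybar = sum(x) / n, sum(y) / n
ss_xx = sum(xi ** 2 for xi in x) - n * xbar ** 2
ss_yy = sum(yi ** 2 for yi in y) - n * ybar ** 2
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar

b = ss_xy / ss_xx        # slope of the regression of y on x
b_prime = ss_xy / ss_yy  # slope of the regression of x on y
r_squared = ss_xy ** 2 / (ss_xx * ss_yy)

assert abs(r_squared - b * b_prime) < 1e-12
assert 0.0 <= r_squared <= 1.0
```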

[Figure: linear fits to increasingly noisy data, labeled with their correlation coefficients]

The correlation coefficient $r$ (sometimes also denoted $R$) is then defined by

$$
\begin{aligned}
r^2 &= bb' && (21)\\
&= \frac{ss_{xy}^2}{ss_{xx}\,ss_{yy}}. && (22)
\end{aligned}
$$

The correlation coefficient is also known as the product-moment coefficient of correlation or Pearson's correlation. The correlation coefficients for linear fits to increasingly noisy data are shown above.

The correlation coefficient has an important physical interpretation. To see this, define

$$ A = \left[\sum x^2 - n\bar{x}^2\right]^{-1} \qquad (23) $$

and denote the "expected" (fitted) value for $y_i$ as $\hat{y}_i$. Then

$$
\begin{aligned}
\hat{y}_i &= a + bx_i && (24)\\
&= \bar{y} - b\bar{x} + bx_i && (25)\\
&= \bar{y} + b(x_i - \bar{x}) && (26)\\
&= A\left(\bar{y}\sum x^2 - \bar{x}\sum xy + x_i\sum xy - n\bar{x}\bar{y}x_i\right) && (27)\\
&= A\left[\bar{y}\sum x^2 + (x_i - \bar{x})\sum xy - n\bar{x}\bar{y}x_i\right]. && (28)
\end{aligned}
$$

Sums over the $\hat{y}_i$ are then

$$
\begin{aligned}
\sum\hat{y}_i &= A\left(n\bar{y}\sum x^2 - n^2\bar{x}^2\bar{y}\right) && (29)\\
\sum\hat{y}_i^2 &= A^2\Big[n\bar{y}^2\Big(\sum x^2\Big)^2 - n^2\bar{x}^2\bar{y}^2\sum x^2 - 2n\bar{x}\bar{y}\Big(\sum xy\Big)\Big(\sum x^2\Big)\\
&\qquad + 2n^2\bar{x}^3\bar{y}\sum xy + \Big(\sum x^2\Big)\Big(\sum xy\Big)^2 - n\bar{x}^2\Big(\sum xy\Big)^2\Big] && (30)\\
\sum y_i\hat{y}_i &= A\sum\left[y_i\bar{y}\sum x^2 + y_i(x_i - \bar{x})\sum xy - n\bar{x}\bar{y}x_iy_i\right] && (31)\\
&= A\left[n\bar{y}^2\sum x^2 + \Big(\sum xy\Big)^2 - n\bar{x}\bar{y}\sum xy - n\bar{x}\bar{y}\sum xy\right] && (32)\\
&= A\left[n\bar{y}^2\sum x^2 + \Big(\sum xy\Big)^2 - 2n\bar{x}\bar{y}\sum xy\right]. && (33)
\end{aligned}
$$

The sum of squared errors is then

$$
\begin{aligned}
\mathrm{SSE} &= \sum(\hat{y}_i - \bar{y})^2 && (34)\\
&= \sum\left(\hat{y}_i^2 - 2\bar{y}\hat{y}_i + \bar{y}^2\right) && (35)\\
&= A^2\Big(\sum xy - n\bar{x}\bar{y}\Big)^2\Big(\sum x^2 - n\bar{x}^2\Big) && (36)\\
&= \frac{\left(\sum xy - n\bar{x}\bar{y}\right)^2}{\sum x^2 - n\bar{x}^2} && (37)\\
&= b\,ss_{xy} && (38)\\
&= \frac{ss_{xy}^2}{ss_{xx}} && (39)\\
&= ss_{yy}\,r^2 && (40)\\
&= b^2 ss_{xx}, && (41)
\end{aligned}
$$
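The chain of equivalent forms (38)-(41) can be confirmed numerically against the definition (34), with the fitted values taken from (26). A minimal sketch with arbitrary sample data:

```python
# Verify the equivalent closed forms for SSE against its definition.
# Sample data are arbitrary illustrative values.
import math

n = 5
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
xbar, ybar = sum(x) / n, sum(y) / n
ss_xx = sum(xi ** 2 for xi in x) - n * xbar ** 2
ss_yy = sum(yi ** 2 for yi in y) - n * ybar ** 2
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
b = ss_xy / ss_xx
r2 = ss_xy ** 2 / (ss_xx * ss_yy)

# SSE directly from the definition (34), using fitted values from (26).
y_hat = [ybar + b * (xi - xbar) for xi in x]
sse = sum((yh - ybar) ** 2 for yh in y_hat)

# Forms (38), (39), (40), (41) all agree.
for form in (b * ss_xy, ss_xy ** 2 / ss_xx, ss_yy * r2, b ** 2 * ss_xx):
    assert math.isclose(sse, form)
```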

and the sum of squared residuals is

$$
\begin{aligned}
\mathrm{SSR} &= \sum(y_i - \hat{y}_i)^2 && (42)\\
&= \sum(y_i - \bar{y} + b\bar{x} - bx_i)^2 && (43)\\
&= \sum\left[y_i - \bar{y} - b(x_i - \bar{x})\right]^2 && (44)\\
&= \sum(y_i - \bar{y})^2 + b^2\sum(x_i - \bar{x})^2 - 2b\sum(x_i - \bar{x})(y_i - \bar{y}) && (45)\\
&= ss_{yy} + b^2 ss_{xx} - 2b\,ss_{xy}. && (46)
\end{aligned}
$$

But

$$
\begin{aligned}
b &= \frac{ss_{xy}}{ss_{xx}} && (47)\\
r^2 &= \frac{ss_{xy}^2}{ss_{xx}\,ss_{yy}}, && (48)
\end{aligned}
$$

so

$$
\begin{aligned}
\mathrm{SSR} &= ss_{yy} + \frac{ss_{xy}^2}{ss_{xx}^2}\,ss_{xx} - 2\,\frac{ss_{xy}}{ss_{xx}}\,ss_{xy} && (49)\\
&= ss_{yy} - \frac{ss_{xy}^2}{ss_{xx}} && (50)\\
&= ss_{yy}\left(1 - \frac{ss_{xy}^2}{ss_{xx}\,ss_{yy}}\right) && (51)\\
&= ss_{yy}(1 - r^2), && (52)
\end{aligned}
$$

and

$$ \mathrm{SSE} + \mathrm{SSR} = ss_{yy}\,r^2 + ss_{yy}(1 - r^2) = ss_{yy}. \qquad (53) $$
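The decomposition (53) can be confirmed numerically; a minimal sketch with arbitrary sample data:

```python
# Check the sum-of-squares decomposition: SSE + SSR = ss_yy.
# Sample data are arbitrary illustrative values.
import math

n = 5
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
xbar, ybar = sum(x) / n, sum(y) / n
ss_xx = sum(xi ** 2 for xi in x) - n * xbar ** 2
ss_yy = sum(yi ** 2 for yi in y) - n * ybar ** 2
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
b = ss_xy / ss_xx
y_hat = [ybar + b * (xi - xbar) for xi in x]

sse = sum((yh - ybar) ** 2 for yh in y_hat)            # "explained" part
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual part

assert math.isclose(sse + ssr, ss_yy)
```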

The square of the correlation coefficient $r^2$ is therefore given by

$$
\begin{aligned}
r^2 &= \frac{\mathrm{SSE}}{ss_{yy}} && (54)\\
&= \frac{ss_{xy}^2}{ss_{xx}\,ss_{yy}} && (55)\\
&= \frac{\left(\sum xy - n\bar{x}\bar{y}\right)^2}{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)}. && (56)
\end{aligned}
$$

In other words, $r^2$ is the proportion of $ss_{yy}$ which is accounted for by the regression.

If there is complete correlation, then the lines obtained by solving for the best-fit coefficients $(a, b)$ and $(a', b')$ coincide (since all data points lie on them), so solving (19) for $y$ and equating to (16) gives

$$ y = \frac{x - a'}{b'} = a + bx. \qquad (57) $$

Therefore, $a = -a'/b'$ and $b = 1/b'$, giving

$$ r^2 = bb' = 1. \qquad (58) $$

The correlation coefficient is independent of both origin and scale, so

$$ r(u, v) = r(x, y), \qquad (59) $$

where

$$
\begin{aligned}
u &= \frac{x - x_0}{h} && (60)\\
v &= \frac{y - y_0}{h}. && (61)
\end{aligned}
$$


REFERENCES:

Acton, F. S. Analysis of Straight-Line Data. New York: Dover, 1966.

Edwards, A. L. "The Correlation Coefficient." Ch. 4 in An Introduction to Linear Regression and Correlation. San Francisco, CA: W. H. Freeman, pp. 33-46, 1976.

Gonick, L. and Smith, W. "Regression." Ch. 11 in The Cartoon Guide to Statistics. New York: Harper Perennial, pp. 187-210, 1993.

Kenney, J. F. and Keeping, E. S. "Linear Regression and Correlation." Ch. 15 in Mathematics of Statistics, Pt. 1, 3rd ed. Princeton, NJ: Van Nostrand, pp. 252-285, 1962.

Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; and Vetterling, W. T. "Linear Correlation." §14.5 in Numerical Recipes in FORTRAN: The Art of Scientific Computing, 2nd ed. Cambridge, England: Cambridge University Press, pp. 630-633, 1992.

Snedecor, G. W. and Cochran, W. G. "The Sample Correlation Coefficient r" and "Properties of r." §10.1-10.2 in Statistical Methods, 7th ed. Ames, IA: Iowa State Press, pp. 175-178, 1980.

Spiegel, M. R. "Correlation Theory." Ch. 14 in Theory and Problems of Probability and Statistics, 2nd ed. New York: McGraw-Hill, pp. 294-323, 1992.

Whittaker, E. T. and Robinson, G. "The Coefficient of Correlation for Frequency Distributions which are not Normal." §166 in The Calculus of Observations: A Treatise on Numerical Mathematics, 4th ed. New York: Dover, pp. 334-336, 1967.
