Matrix Calculus
Notation
- j is the square root of -1
- X_R and X_I are the real and imaginary parts of X = X_R + jX_I
- X^C is the complex conjugate of X
- X: denotes the long column vector formed by concatenating the columns of X
- A ¤ B = KRON(A,B), the Kronecker product
- A • B is the Hadamard or elementwise product
- matrices and vectors A, B, C do not depend on X
In the main part of this page we express results in terms of differentials rather than derivatives for two reasons: they avoid notational disagreements and they cope easily with the complex case. In most cases, however, the differentials have been written in the form dY: = dY/dX dX: so that the corresponding derivative may be easily extracted.
Derivatives with respect to a real matrix
If X is p#q and Y is m#n, then dY: = dY/dX dX: where the derivative dY/dX is a large mn#pq matrix. If X and/or Y are column vectors or scalars, then the vectorization operator : has no effect and may be omitted. dY/dX is also called the Jacobian Matrix of Y: with respect to X: and det(dY/dX) is the corresponding Jacobian. The Jacobian occurs when changing variables in an integration: Integral(f(Y) dY:) = Integral(f(Y(X)) det(dY/dX) dX:).
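The definition above can be checked numerically. The following is a minimal NumPy sketch, assuming that the : operator corresponds to column-major (Fortran-order) flattening; the helper names vec and numerical_jacobian and the random test matrices are illustrative and not part of the manual. It builds dY/dX by finite differences for the linear map Y = A^T X B and compares it with the closed form (B ¤ A)^T.

```python
import numpy as np

def vec(M):
    # The ":" operator: stack the columns of M into one long vector.
    return np.asarray(M).reshape(-1, order="F")

def numerical_jacobian(func, X, eps=1e-6):
    # Build dY/dX column by column: perturb one entry of X: at a time
    # and record the change in Y: (central differences).
    p, q = X.shape
    cols = []
    for k in range(p * q):
        E = np.zeros(p * q)
        E[k] = eps
        E = E.reshape(p, q, order="F")
        cols.append((vec(func(X + E)) - vec(func(X - E))) / (2 * eps))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
p, q, m, n = 3, 2, 4, 5
A = rng.standard_normal((p, m))   # hypothetical test data
B = rng.standard_normal((q, n))
X = rng.standard_normal((p, q))

J_num = numerical_jacobian(lambda X: A.T @ X @ B, X)   # mn # pq matrix
J_ref = np.kron(B, A).T                                # (B ¤ A)^T
print(J_num.shape, np.allclose(J_num, J_ref, atol=1e-6))
```

Because X is p#q and Y is m#n here, the numerical Jacobian has mn rows and pq columns, matching the dimensions stated above.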
Although they do not generalise so well, other authors use alternative
notations for the cases when X and Y are both vectors or when one
is a scalar. In particular:
- dy/dx is sometimes transposed from the above definition or else is sometimes written dy/dx^T to emphasise the correspondence between the columns of the derivative and those of x^T.
- dY/dx and dy/dX are often written as matrices rather than, as here, a column vector and row vector respectively. The matrix form may be converted to the form used here by appending : or :^T respectively.
Derivatives with respect to a complex matrix
If X is complex then dY: = dY/dX dX: is true iff Y(X) is an analytic function, which implies in particular that Y(X) does not depend on X^C or X^H.
Even for non-analytic functions we can write uniquely dY: = dY/dX dX: + dY/dX^C dX^C: provided that Y is analytic with respect to X and X^C individually (or equivalently with respect to X_R and X_I individually).
dY/dX is the Generalized Complex Derivative and dY/dX^C is the Complex Conjugate Derivative [R.3, R.8]. We have the following relationships:
- dY: = dY/dX dX: + dY/dX^C dX^C:
- dY/dX = ½ (dY/dX_R - j dY/dX_I)
- dY/dX^C = (dY^C/dX)^C = ½ (dY/dX_R + j dY/dX_I)
- If Y(X) is real for all complex X, then dY/dX^C = (dY/dX)^C
- Cauchy Riemann equations: The following are equivalent:
  - Y(X) is an analytic function of X
  - dY/dX^C = 0 for all X
  - dY/dX_R + j dY/dX_I = 0 for all X
- dY/dX_R = dY/dX + dY/dX^C
- dY/dX_I = j (dY/dX - dY/dX^C)
- Chain rule: If Z is a function of Y which is itself a function of X, then dZ/dX = dZ/dY dY/dX. This is the same as for real derivatives.
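The relationships above can be illustrated for a scalar complex variable. This is a rough sketch, not a definitive implementation: the function name wirtinger and the two test functions are assumptions chosen for illustration. It forms dY/dX_R and dY/dX_I by finite differences and combines them into the generalized complex derivative and the complex conjugate derivative.

```python
import numpy as np

def wirtinger(y, x, eps=1e-6):
    # Partial derivatives with respect to the real and imaginary parts of x,
    # combined as dY/dX = (d_re - j d_im)/2 and dY/dX^C = (d_re + j d_im)/2.
    d_re = (y(x + eps) - y(x - eps)) / (2 * eps)
    d_im = (y(x + 1j * eps) - y(x - 1j * eps)) / (2 * eps)
    d_x = 0.5 * (d_re - 1j * d_im)
    d_xc = 0.5 * (d_re + 1j * d_im)
    return d_x, d_xc

x0 = 0.7 - 0.4j

# Non-analytic example: y = x^2 x^C, so dY/dX = 2 x x^C and dY/dX^C = x^2.
dx_na, dxc_na = wirtinger(lambda x: x**2 * np.conj(x), x0)

# Analytic example: y = x^3, so dY/dX = 3 x^2 and dY/dX^C = 0.
dx_an, dxc_an = wirtinger(lambda x: x**3, x0)

print(dx_na, dxc_na)
print(dx_an, abs(dxc_an))
```

For the analytic example the conjugate derivative vanishes up to finite-difference error, which is the Cauchy Riemann condition in the form dY/dX^C = 0.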
If f(x) is a real function of a complex vector then df/dx^C = (df/dx)^C and we can define grad(f(x)) = 2 (df/dx)^C = df/dx_R + j df/dx_I as the Complex Gradient Vector [R.8] with the following properties:
- grad(f(x)) is zero at an extreme value of f.
- grad(f(x)) points in the direction of steepest slope of f(x)
- The magnitude of the steepest slope is equal to |grad(f(x))|. Specifically, if g(x) = grad(f(x)), then lim_{a->0} a^{-1} (f(x + a g(x)) - f(x)) = |g(x)|^2
- grad(f(x)) is normal to the surface f(x) = constant, which means that it can be used for gradient ascent/descent algorithms.
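The steepest-slope property can be tried out on the real-valued function f(x) = x^H C x with Hermitian C. The sketch below assumes the gradient is arranged as a column vector, which for this f gives g = 2Cx; the matrix C, the point x and the step size a are arbitrary test values rather than anything prescribed by the manual.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
C = M + M.conj().T                      # Hermitian, so f below is real
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def f(x):
    return np.real(x.conj() @ C @ x)    # f(x) = x^H C x

g = 2 * C @ x                           # complex gradient as a column vector
g_norm = np.linalg.norm(g)
a = 1e-6                                # small step standing in for a -> 0

# Slope along g itself: should approach |g|^2.
slope_along_g = (f(x + a * g) - f(x)) / a

# Slope along the unit vector g/|g|: should approach |g|.
unit_slope_along_g = (f(x + a * g / g_norm) - f(x)) / a

# Slopes along random unit directions: none should exceed |g|.
U = rng.standard_normal((200, n)) + 1j * rng.standard_normal((200, n))
U /= np.linalg.norm(U, axis=1, keepdims=True)
slopes = np.array([(f(x + a * u) - f(x)) / a for u in U])

print(slope_along_g, g_norm**2)
print(unit_slope_along_g, g_norm, slopes.max())
```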
Basic Properties
- We may write the following differentials unambiguously without parentheses:
  - Transpose: dY^T = d(Y^T) = (dY)^T
  - Hermitian Transpose: dY^H = d(Y^H) = (dY)^H
  - Conjugate: dY^C = d(Y^C) = (dY)^C
- Linearity: d(Y+Z) = dY + dZ
- Chain Rule: If Z is a function of Y which is itself a function of X, then for both the normal and the generalized complex derivative: dZ: = dZ/dY dY: = dZ/dY dY/dX dX:
- Product Rule: d(YZ) = Y dZ + dY Z
  - d(YZ): = (I ¤ Y) dZ: + (Z^T ¤ I) dY: = ((I ¤ Y) dZ/dX + (Z^T ¤ I) dY/dX) dX:
- Hadamard Product: d(Y • Z) = Y • dZ + dY • Z
- Kronecker Product: d(Y ¤ Z) = Y ¤ dZ + dY ¤ Z
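The vectorized form of the product rule can be verified directly. This is a small sketch assuming column-major vectorization for the : operator; the matrix sizes and the helper name vec are arbitrary choices for illustration.

```python
import numpy as np

def vec(M):
    # The ":" operator: concatenate the columns of M.
    return M.reshape(-1, order="F")

rng = np.random.default_rng(2)
Y, dY = rng.standard_normal((2, 3, 4))   # Y and dY are 3 # 4
Z, dZ = rng.standard_normal((2, 4, 5))   # Z and dZ are 4 # 5

# Left side: (Y dZ + dY Z):
lhs = vec(Y @ dZ + dY @ Z)

# Right side: (I ¤ Y) dZ: + (Z^T ¤ I) dY:
rhs = np.kron(np.eye(5), Y) @ vec(dZ) + np.kron(Z.T, np.eye(3)) @ vec(dY)

print(np.allclose(lhs, rhs))
```

Note that the two identity matrices have different sizes: the one multiplying Y on the left of ¤ matches the number of columns of Z, and the other matches the number of rows of Y.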
Differentials of Linear Functions
- d(Ax) = d(x^T A^T): = A dx
- d(x^H A): = A^T dx^C
- d(A^T X B): = (A^T dX B): = (B ¤ A)^T dX:
- d(a^T X b) = (b ¤ a)^T dX: = (a b^T):^T dX:
- d(a^T X a) = d(a^T X^T a) = (a ¤ a)^T dX: = (a a^T):^T dX:
- d(a^T X^T b) = (a ¤ b)^T dX: = (b a^T):^T dX:
Differentials of Quadratic Products
- d((Ax+b)^T C (Dx+e)) = ((Ax+b)^T C D + (Dx+e)^T C^T A) dx
- d(x^T C x) = x^T (C + C^T) dx = [C=C^T] 2 x^T C dx
- d((Ax+b)^T (Dx+e)) = ((Ax+b)^T D + (Dx+e)^T A) dx
- d((Ax+b)^T (Ax+b)) = 2 (Ax+b)^T A dx
- d((Ax+b)^T C (Ax+b)) = [C=C^T] 2 (Ax+b)^T C^T A dx
- d((Ax+b)^H C (Dx+e)) = (Ax+b)^H C D dx + (Dx+e)^T C^T A^C dx^C
- d(x^H C x) = x^H C dx + x^T C^T dx^C = [C=C^H] 2 (x^H C dx)_R
- d(x^H x) = 2 (x^H dx)_R
- d(a^T X^T X b) = (X (a b^T + b a^T)):^T dX:
- d(a^T X^T X a) = 2 (X a a^T):^T dX:
- d(a^T X^T C X b) = (C^T X a b^T + C X b a^T):^T dX:
- d(a^T X^T C X a) = ((C + C^T) X a a^T):^T dX: = [C=C^T] 2 (C X a a^T):^T dX:
- d((Xa+b)^T C (Xa+b)) = ((C + C^T)(Xa+b) a^T):^T dX:
- d(X^2): = (X dX + dX X): = (I ¤ X + X^T ¤ I) dX:
- d(X^T C X): = (X^T C dX): + (d(X^T) C X): = (I ¤ X^T C) dX: + (X^T C^T ¤ I) dX^T:
- d(X^H C X): = (X^H C dX): + (d(X^H) C X): = (I ¤ X^H C) dX: + (X^T C^T ¤ I) dX^H:
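Results of the form d(f) = G:^T dX: say that the matrix G holds the partial derivatives of the scalar f with respect to each element of X. As a hedged illustration, the sketch below checks the entry for a^T X^T C X b against a finite-difference gradient; numerical_gradient is a hypothetical helper and the test data are random.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    # Matrix G with G[i, j] = df/dX[i, j], so that df = G:^T dX:
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(3)
p, q = 4, 3
X = rng.standard_normal((p, q))
C = rng.standard_normal((p, p))      # deliberately not symmetric
a = rng.standard_normal(q)
b = rng.standard_normal(q)

G_num = numerical_gradient(lambda X: a @ X.T @ C @ X @ b, X)

# Closed form from the list above: C^T X a b^T + C X b a^T
G_ref = C.T @ X @ np.outer(a, b) + C @ X @ np.outer(b, a)

print(np.allclose(G_num, G_ref, atol=1e-6))
```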
Differentials of Cubic Products
- d(x x^T A x) = (x x^T (A + A^T) + (x^T A x) I) dx
Differentials of Inverses
- d(X^{-1}) = -X^{-1} dX X^{-1} [2.1]
- d(X^{-1}): = -(X^{-T} ¤ X^{-1}) dX: [2.1]
- d(a^T X^{-1} b) = -(X^{-T} a b^T X^{-T}):^T dX: [2.6]
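The vectorized differential of the inverse can be compared with an actual small perturbation. In this sketch the matrix X is an arbitrary diagonally dominant (hence non-singular) example and dX is a small random perturbation; both are assumptions made only for the illustration.

```python
import numpy as np

def vec(M):
    # The ":" operator: concatenate the columns of M.
    return M.reshape(-1, order="F")

# Arbitrary non-symmetric, strictly diagonally dominant test matrix.
X = np.array([[4.0, 1.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 3.0, 6.0]])
rng = np.random.default_rng(4)
dX = 1e-6 * rng.standard_normal(X.shape)

X_inv = np.linalg.inv(X)

# Actual change in the inverse, vectorized.
actual = vec(np.linalg.inv(X + dX) - X_inv)

# First-order prediction: -(X^{-T} ¤ X^{-1}) dX:
predicted = -np.kron(X_inv.T, X_inv) @ vec(dX)

print(np.max(np.abs(actual - predicted)))
```

The remaining discrepancy is second order in dX, so it shrinks quadratically as the perturbation is made smaller.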
Differentials of Trace
Note: matrix dimensions must result in an n#n argument for tr().
- d(tr(Y)) = tr(dY)
- d(tr(X)) = d(tr(X^T)) = I:^T dX: [2.4]
- d(tr(X^k)) = k ((X^{k-1})^T):^T dX:
- d(tr(A X^k)) = (SUM_{r=0:k-1} (X^r A X^{k-r-1})^T):^T dX:
- d(tr(A X^{-1} B)) = -((X^{-1} B A X^{-1})^T):^T dX: = -(X^{-T} A^T B^T X^{-T}):^T dX: [2.5]
- d(tr(A X^{-1})) = d(tr(X^{-1} A)) = -(X^{-T} A^T X^{-T}):^T dX:
- d(tr(A^T X B^T)) = d(tr(B X^T A)) = (AB):^T dX: [2.4]
- d(tr(X A^T)) = d(tr(A^T X)) = d(tr(X^T A)) = d(tr(A X^T)) = A:^T dX:
- d(tr(A X B X^T C)) = (A^T C^T X B^T + C A X B):^T dX:
- d(tr(X A X^T)) = d(tr(A X^T X)) = d(tr(X^T X A)) = (X (A + A^T)):^T dX:
- d(tr(X^T A X)) = d(tr(A X X^T)) = d(tr(X X^T A)) = ((A + A^T) X):^T dX:
- d(tr(A X B X)) = (A^T X^T B^T + B^T X^T A^T):^T dX:
- d(tr((AXb+c)(AXb+c)^T)) = 2 (A^T (AXb+c) b^T):^T dX:
- d(tr((X^T C X)^{-1} A)) = [C: symmetric] d(tr(A (X^T C X)^{-1})) = -((C X (X^T C X)^{-1}) (A + A^T) (X^T C X)^{-1}):^T dX:
- d(tr((X^T C X)^{-1} (X^T B X))) = [B,C: symmetric] d(tr((X^T B X)(X^T C X)^{-1})) = 2 (B X (X^T C X)^{-1} - (C X (X^T C X)^{-1}) X^T B X (X^T C X)^{-1}):^T dX:
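A trace differential such as the one for tr(A X B X^T C) can be tested by applying a small random perturbation dX and comparing the change in the trace with G:^T dX:, which is the sum of the elementwise products of G and dX. The dimensions and random matrices below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, q = 3, 4, 2
A = rng.standard_normal((n, p))
B = rng.standard_normal((q, q))
C = rng.standard_normal((p, n))
X = rng.standard_normal((p, q))
dX = 1e-4 * rng.standard_normal((p, q))

def f(X):
    # Argument of tr() is n # n as required.
    return np.trace(A @ X @ B @ X.T @ C)

# Closed form from the list above: A^T C^T X B^T + C A X B
G = A.T @ C.T @ X @ B.T + C @ A @ X @ B

df_num = (f(X + dX) - f(X - dX)) / 2     # central difference along dX
df_ref = np.sum(G * dX)                  # G:^T dX:

print(df_num, df_ref)
```

Since f is quadratic in X, the central difference has no truncation error here and the two numbers agree up to rounding.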
Differentials of Determinant
Note: matrix dimensions must result in an n#n argument for det(). Some of the expressions below involve inverses: these forms apply only if the quantity being inverted is square and non-singular; alternative forms involving the adjoint, ADJ(), do not have the non-singular requirement.
- d(det(X)) = d(det(X^T)) = ADJ(X^T):^T dX: = det(X) (X^{-T}):^T dX: [2.7]
- d(det(A^T X B)) = d(det(B^T X^T A)) = (A ADJ(A^T X B)^T B^T):^T dX: = [A,B: nonsingular] det(A^T X B) × (X^{-T}):^T dX: [2.8]
- d(ln(det(A^T X B))) = [A,B: nonsingular] (X^{-T}):^T dX: [2.9]
- d(ln(det(X))) = (X^{-T}):^T dX:
- d(det(X^k)) = d(det(X)^k) = k × det(X^k) × (X^{-T}):^T dX: [2.10]
- d(ln(det(X^k))) = k × (X^{-T}):^T dX:
- d(det(X^T C X)) = [C=C^T] 2 det(X^T C X) × (C X (X^T C X)^{-1}):^T dX: [2.11]
  - = [C=C^T, CX: nonsingular] 2 det(X^T C X) × (X^{-T}):^T dX:
- d(ln(det(X^T C X))) = [C=C^T] 2 (C X (X^T C X)^{-1}):^T dX:
  - = [C=C^T, CX: nonsingular] 2 (X^{-T}):^T dX:
- d(det(X^H C X)) = det(X^H C X) × ((C^T X^C (X^T C^T X^C)^{-1}):^T dX: + (C X (X^H C X)^{-1}):^T dX^C:) [2.12]
- d(ln(det(X^H C X))) = (C^T X^C (X^T C^T X^C)^{-1}):^T dX: + (C X (X^H C X)^{-1}):^T dX^C: [2.13]
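As a rough numerical check of the log-determinant results for real matrices, the sketch below perturbs a tall matrix X and compares the change in ln(det(X^T C X)) with 2 (C X (X^T C X)^{-1}):^T dX:. It assumes C is symmetric positive definite so that the determinant is positive; that choice, and the random data, are assumptions for the illustration only.

```python
import numpy as np

rng = np.random.default_rng(6)
p, q = 5, 3
M = rng.standard_normal((p, p))
C = M @ M.T + np.eye(p)            # symmetric positive definite (assumption)
X = rng.standard_normal((p, q))
dX = 1e-6 * rng.standard_normal((p, q))

def logdet(X):
    # ln(det(X^T C X)); slogdet avoids overflow in the determinant itself.
    sign, value = np.linalg.slogdet(X.T @ C @ X)
    return value

# Closed form from the list above, valid for C = C^T.
G = 2 * C @ X @ np.linalg.inv(X.T @ C @ X)

d_num = (logdet(X + dX) - logdet(X - dX)) / 2   # central difference along dX
d_ref = np.sum(G * dX)                          # G:^T dX:

print(d_num, d_ref)
```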
Jacobian
dY/dX is called the Jacobian Matrix of Y: with respect to X: and J_X(Y) = det(dY/dX) is the corresponding Jacobian. The Jacobian occurs when changing variables in an integration: Integral(f(Y) dY:) = Integral(f(Y(X)) det(dY/dX) dX:).
- J_X(X[n#n]^{-1}) = (-1)^n det(X)^{-2n}
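The Jacobian of the matrix inverse follows from the derivative d(X^{-1}): = -(X^{-T} ¤ X^{-1}) dX:, whose determinant can be evaluated numerically. The 3#3 matrix below is an arbitrary non-singular example chosen for this sketch.

```python
import numpy as np

# Arbitrary non-symmetric, non-singular test matrix (det = 49).
X = np.array([[3.0, 1.0, 0.0],
              [1.0, 4.0, 2.0],
              [0.0, 1.0, 5.0]])
n = X.shape[0]
X_inv = np.linalg.inv(X)

# Jacobian matrix of Y = X^{-1} with respect to X: an n^2 # n^2 matrix.
dY_dX = -np.kron(X_inv.T, X_inv)

jacobian = np.linalg.det(dY_dX)
expected = (-1) ** n * np.linalg.det(X) ** (-2 * n)

print(jacobian, expected)
```

The sign comes from the factor of -1 applied to an n^2 # n^2 matrix: (-1)^(n^2) equals (-1)^n because n^2 and n have the same parity.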
Hessian matrix
If f is a real function of x then the Hermitian matrix H_x f = d/dx (df/dx)^H is the Hessian matrix of f(x). A value of x for which grad f(x) = 0 corresponds to a minimum, maximum or saddle point according to whether H_x f is positive definite, negative definite or indefinite.
- H_x (a^T x) = 0
- H_x ((Ax+b)^T C (Dx+e)) = A^T C D + D^T C^T A
- H_x (x^T C x) = C + C^T
- H_x ((Ax+b)^T (Dx+e)) = A^T D + D^T A
- H_x ((Ax+b)^T C (Ax+b)) = [C=C^T] 2 A^T C A
- H_x (x^H C x) = [C=C^H] 2C
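For real x, the Hessian entries above can be compared with second-order finite differences. The following sketch does this for (Ax+b)^T C (Dx+e); the helper numerical_hessian, the dimensions and the random data are hypothetical choices made for the example, and only the real case is covered.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-3):
    # H[i, j] approximates the second partial derivative of f with respect
    # to x[i] and x[j], using a four-point central difference.
    n = x.size
    H = np.zeros((n, n))
    I = np.eye(n)
    for i in range(n):
        for j in range(n):
            ei, ej = h * I[i], h * I[j]
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

rng = np.random.default_rng(8)
m, k, n = 4, 3, 5
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
D, e = rng.standard_normal((k, n)), rng.standard_normal(k)
C = rng.standard_normal((m, k))
x = rng.standard_normal(n)

f = lambda x: (A @ x + b) @ C @ (D @ x + e)

H_num = numerical_hessian(f, x)
H_ref = A.T @ C @ D + D.T @ C.T @ A     # closed form from the list above

# The signs of the eigenvalues classify a stationary point of f.
eigenvalues = np.linalg.eigvalsh(H_ref)
print(np.allclose(H_num, H_ref, atol=1e-5), eigenvalues)
```

All eigenvalues positive corresponds to a positive definite Hessian (minimum), all negative to a negative definite one (maximum), and mixed signs to an indefinite one (saddle point).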
This page is part of The Matrix Reference
Manual. Copyright © 1998-2005 Mike Brookes, Imperial
College, London, UK. See the file gfl.html for copying
instructions. Please send any comments or suggestions to "mike.brookes" at
"imperial.ac.uk".
Updated: $Id: calculus.html,v 1.14 2005/08/17 10:42:09 dmb
Exp $