ROUGH NOTES (!)
Updated: 12/7/24
Second derivative; Third derivative; Higher derivatives; Taylor’s theorem; Inverse function theorem; Implicit function theorem
\[{ \underline{\textbf{Second derivative}} }\]Def [${ C ^2 }$ maps]: Let ${ E, F }$ be complete normed spaces, and ${ f : U (\subseteq E \text{ open}) \to F .}$
We say ${ f }$ is ${ C ^2 }$ if it is ${ C ^1 }$ and the derivative ${ Df : U \to L(E, F) }$ is ${ C ^1 , }$ that is if the derivatives
\[{ Df : U \to L(E, F), \quad D ^2 f := D(Df) : U \to L(E, L(E, F)) }\]exist and are continuous.
The codomain ${ L (E, L(E, F)) }$ above is in fact isomorphic to the space of continuous bilinear maps ${ L ^2 (E; F) .}$
Obs [${ L(E, L(E, F)) \cong L ^2 (E; F) }$ as normed spaces]:
Let ${ E, F }$ be complete normed spaces. Recall ${ L ^k (E; F) }$ is the space of continuous multilinear maps ${ \underbrace{E \times \ldots \times E} _{k \text{ many}} \to F . }$ Now
\[{ \Phi : L(E, L(E, F)) \longrightarrow L ^2 (E; F) , \quad \lambda \mapsto ((x _1, x _2) \mapsto \lambda (x _1) (x _2)) }\]is an isomorphism of normed spaces.
Here ${ \lambda }$ in ${ L(E, L(E, F)) }$ takes inputs as ${ \lambda (\underline{\,}) (\underline{\,}) ,}$ and ${ \Phi (\lambda) }$ in ${ L ^2 (E; F) }$ takes inputs as ${ \Phi (\lambda) (\underline{\,}, \underline{\,}) .}$
Writing (informally) ${ \Phi (\lambda) }$ as ${ \lambda (\underline{\,} , \underline{\,}) ,}$ the normed space isomorphism is
\[{ \begin{align*} &\, L(E, L(E, F)) \overset{\cong}{\longleftrightarrow} L ^2 (E; F) , \\ &\, \quad \lambda (\_) (\_) \quad \longleftrightarrow \quad \lambda (\_ \, , \, \_) . \end{align*} }\]
Pf: Firstly, if ${ \lambda \in L(E, L(E, F)) }$ then
\[{ \Phi(\lambda) : (x _1, x _2) \mapsto \lambda (x _1) (x _2) }\]is in ${ L ^2 (E; F) }$: Bilinearity of ${ \Phi (\lambda) }$ is clear, and continuity of ${ \Phi(\lambda) }$ is because
\[{ \begin{align*} \lVert \lambda (x _1) (x _2) \rVert \leq &\, \lVert \lambda (x _1) \rVert \lVert x _2 \rVert \quad (\text{as } \lambda(x _1) \in L(E, F)) \\ \leq &\, \lVert \lambda \rVert \lVert x _1 \rVert \lVert x _2 \rVert \quad (\text{as } \lambda \in L(E, L(E, F))). \end{align*} }\]So ${ \Phi }$ is well-defined.
Linearity of ${ \Phi }$ is clear.
The above inequality gives
\[{ \lVert \Phi (\lambda) \rVert \leq \lVert \lambda \rVert , }\]and conversely
\[{ \begin{align*} \lVert \lambda \rVert = &\, \sup _{x _1 \neq 0} \frac{\lVert \lambda (x _1) \rVert _{L(E, F)} }{\lVert x _1 \rVert} \\ = &\, \sup _{x _1\neq 0} \left( \sup _{x _2 \neq 0} \frac{\lVert \lambda(x _1)(x _2)\rVert}{\lVert x _1 \rVert \lVert x _2 \rVert} \right) \\ \leq &\, \sup _{x _1 \neq 0} \lVert \Phi (\lambda) \rVert = \lVert \Phi (\lambda) \rVert , \end{align*} }\]so
\[{ \lVert \Phi (\lambda) \rVert = \lVert \lambda \rVert }\]that is ${ \Phi }$ preserves norms.
It is left to show that ${ \Phi }$ is a bijection.
Surjectivity of ${ \Phi }$: Say ${ f \in L ^2 (E; F) . }$ We want an ${ f _0 \in L(E, L(E, F)) }$ such that ${ \Phi (f _0) = f .}$
Informally, defining ${ f _0 := f(\underline{\,}) (\underline{\,}) }$ works. Formally, consider
\[{ f _0 : E \to \lbrace \text{maps } E \to F \rbrace , \quad f _0 (x _1) (x _2) := f(x _1, x _2) . }\]
Each ${ f _0 (x _1 ) }$ is in ${ L(E, F) , }$ because linearity of ${ f _0 (x _1) : E \to F }$ is clear and
\[{ \sup _{x _2\neq 0} \frac{\lVert f _0 (x _1) (x _2) \rVert}{\lVert x _2 \rVert} = \sup _{x _2 \neq 0} \frac{\lVert f(x _1, x _2) \rVert}{\lVert x _2 \rVert} \leq \lVert f \rVert \lVert x _1 \rVert < \infty . }\]Now ${ f _0 \in L(E, L(E, F)) , }$ because linearity of ${ f _0 }$ is clear and
\[{ \begin{align*} \sup _{x _1 \neq 0} \frac{\lVert f _0 (x _1) \rVert _{L(E, F)}}{\lVert x _1 \rVert} = &\, \sup _{x _1 \neq 0} \left( \sup _{x _2 \neq 0} \frac{\lVert f _0 (x _1) (x _2) \rVert }{\lVert x _1 \rVert \lVert x _2 \rVert} \right) \\ \leq &\sup _{x _1 \neq 0} \lVert f \rVert = \lVert f \rVert < \infty . \end{align*} }\]Finally ${ \Phi(f _0) = f }$ because
\[{ \begin{align*} \Phi(f _0) \, (x _1, x _2) = &\, f _0 (x _1) (x _2) \\ = &\, f \, (x _1, x _2) \end{align*} }\]for all ${ x _1, x _2 \in E . }$
Injectivity of ${ \Phi }$: Say ${ \Phi(\lambda _1) = \Phi (\lambda _2) }$ for some ${ \lambda _1, \lambda _2 \in L(E, L(E, F) ) .}$ Now
\[{ \begin{align*} \lVert \lambda _1 - \lambda _2 \rVert = &\, \lVert \Phi (\lambda _1 - \lambda _2) \rVert \\ = &\, \lVert \Phi(\lambda _1) - \Phi (\lambda _2) \rVert \\ = &\, 0, \end{align*} }\]so ${ \lambda _1 = \lambda _2 . }$
Therefore ${ \Phi }$ is an isomorphism of normed spaces, as needed. ${ \blacksquare }$
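The isomorphism ${ \Phi }$ is just currying. As an illustrative sketch in coordinates (not from the notes; the coefficient matrix ${ A }$ below is made up), a bilinear map ${ (x _1, x _2) \mapsto x _1 ^T A x _2 }$ and its curried form agree pointwise:

```python
import numpy as np

# A bilinear map lam2 : R^2 x R^2 -> R given by a coefficient matrix A,
# lam2(x1, x2) = x1^T A x2.  Its curried form lam1(x1) is the linear
# functional x2 |-> (A^T x1)^T x2, an element of L(R^2, R).
A = np.array([[1.0, 2.0],
              [3.0, -1.0]])

def lam2(x1, x2):          # element of L^2(R^2; R)
    return x1 @ A @ x2

def lam1(x1):              # element of L(R^2, L(R^2, R)), curried
    return lambda x2: (A.T @ x1) @ x2

# Phi(lam1) = lam2: curried and uncurried forms agree on arbitrary inputs.
rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(2), rng.standard_normal(2)
assert np.isclose(lam1(x1)(x2), lam2(x1, x2))
```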
Obs: Any bilinear map
\[{ \lambda : \mathbb{R} ^n \times \mathbb{R} ^n \to \mathbb{R} ^m , \quad \lambda = (\lambda _1, \ldots, \lambda _m) ^T }\]is continuous because each component
\[{ \lambda _i (x _1, x _2) = \sum _{j _1, j _2 \in [n]} (x _1) _{j _1} (x _2) _{j _2} \lambda _i (e _{j _1}, e _{j _2}) }\]is continuous.
Obs [${ C ^2 }$ maps ${ \mathbb{R} ^n \to \mathbb{R} ^m }$]:
Consider ${ f : U (\subseteq \mathbb{R} ^n \text{ open}) \to \mathbb{R} ^m .}$
From the previous result on ${ C ^1 }$ maps ${ \mathbb{R} ^n \to \mathbb{R} ^m ,}$ the map ${ f }$ is ${ C ^2 }$ if and only if all partials ${ D _{j _1} f _i : U \to \mathbb{R} }$ and ${ D _{j _1} D _{j _2} f _i : U \to \mathbb{R} }$ exist and are continuous.
Thm [${ D ^2 f }$ for ${ C ^2 }$ maps ${\mathbb{R} ^n \to \mathbb{R} ^m }$]:
Let ${ f : U (\subseteq \mathbb{R} ^n \text{ open}) \to \mathbb{R} ^m }$ be a ${ C ^2 }$ map and ${ x \in U . }$ Then the second derivative
\[{ D ^2 f (x) : \mathbb{R} ^n \times \mathbb{R} ^n \to \mathbb{R} ^m \text{ bilinear,} }\]is given by its action on basis vectors
\[{ \boxed{\begin{align*} D ^2 f (x) \quad \underbrace{(e _{j _1})} _{\text{in } \mathbb{R} ^n} \quad \underbrace{(e _{j _2})} _{\text{in } \mathbb{R} ^n} = &\, \underbrace{\begin{pmatrix} D _{j _1} D _{j _2} f _1 (x) \\ \vdots \\ D _{j _1} D _{j _2} f _m (x) \end{pmatrix}} _{\text{in } \mathbb{R} ^m} \end{align*} } }\]that is
\[{ D ^2 f (x) (x _1) (x _2) = \sum _{j _1, j _2 \in [n]} (x _1) _{j _1} (x _2) _{j _2} \begin{pmatrix} D _{j _1} D _{j _2} f _1 (x) \\ \vdots \\ D _{j _1} D _{j _2} f _m (x) \end{pmatrix} . }\]For ${ m = 1 , }$ this second derivative looks like
\[{ D ^2 f (p) : \mathbb{R} ^n \times \mathbb{R} ^n \to \mathbb{R} \text{ bilinear,} }\] \[{ \begin{align*} D ^2 f (p) \, (x) (y) = &\, \sum _{i, j} x _i y _j D ^2 f (p) (e _i) (e _j) \\ = &\, \sum _{i, j} x _i y _j D _i D _j f(p) \\ = &\, x ^T \begin{pmatrix} D _1 D _1 f(p) &\cdots &D _1 D _n f(p) \\ \vdots &\ddots &\vdots \\ D _n D _1 f(p) &\cdots &D _n D _n f(p) \end{pmatrix} y \\ = &\, x ^T H f (p) y \end{align*} }\]and ${ H f (p) = [D _i D _j f(p)] }$ is called the Hessian.
Pf: Let ${ f : U (\subseteq \mathbb{R} ^n \text{ open}) \to \mathbb{R} ^m }$ be ${ C ^2 }$ and ${ x \in U . }$ By the definition of derivative, for ${ h }$ in a neighbourhood of ${ 0, }$
\[{ \begin{align*} &Df(x+h) = Df(x) + D ^2 f(x) \, h + \lVert h \rVert \varphi (h) \text{ in } L(\mathbb{R} ^n, \mathbb{R} ^m) \\ &\, \text{with } \varphi(0) = 0 \text{ and } \varphi (h) \text{ continuous at } 0 . \end{align*} }\]So (for ${ t _1 }$ in a neighbourhood of ${ 0 }$)
\[{ \begin{align*} Df(x + t _1 e _{j _1}) \, e _{j _2} = &\, Df (x) \, e _{j _2} + D ^2 f (x) (t _1 e _{j _1}) (e _{j _2}) \\ &\, + \lVert t _1 e _{j _1} \rVert \varphi (t _1 e _{j _1}) \, e _{j _2} \end{align*} }\]that is
\[{ \begin{align*} \begin{pmatrix} D _{j _2} f _1 (x + t _1 e _{j _1}) \\ \vdots \\ D _{j _2} f _m (x + t _1 e _{j _1}) \end{pmatrix} = &\, \begin{pmatrix} D _{j _2} f _1 (x) \\ \vdots \\ D _{j _2} f _m (x) \end{pmatrix} + t _1 D ^2 f (x) (e _{j _1}) (e _{j _2}) \\ &\, + \vert t _1 \vert \varphi (t _1 e _{j _1}) \, e _{j _2} . \end{align*} }\]Dividing by ${ t _1 \neq 0 }$ and letting ${ t _1 \to 0 }$ gives
\[{ D ^2 f (x) (e _{j _1}) (e _{j _2}) = \begin{pmatrix} D _{j _1} D _{j _2} f _1 (x) \\ \vdots \\ D _{j _1} D _{j _2} f _m (x) \end{pmatrix} }\]as needed. ${ \blacksquare }$
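The coordinate formula can be probed numerically. A small numpy sketch (illustrative, not from the notes; the function ${ f }$ below is made up): approximate the Hessian of ${ f(x, y) = x ^2 y + y ^3 }$ by second differences and compare with the analytic partials.

```python
import numpy as np

# f : R^2 -> R, f(x, y) = x^2 y + y^3, a smooth (in particular C^2) map.
def f(p):
    x, y = p
    return x**2 * y + y**3

def hessian_fd(g, p, h=1e-4):
    """Finite-difference approximation of Hg(p): entry [i, j] ~ D_i D_j g(p)."""
    n = len(p)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i], np.eye(n)[j]
            H[i, j] = (g(p + h*ei + h*ej) - g(p + h*ei) - g(p + h*ej) + g(p)) / h**2
    return H

p = np.array([1.0, 2.0])
H = hessian_fd(f, p)
# Analytic Hessian [[2y, 2x], [2x, 6y]] = [[4, 2], [2, 12]] at p = (1, 2).
assert np.allclose(H, [[4.0, 2.0], [2.0, 12.0]], atol=1e-3)
# D^2 f(p)(u)(v) = u^T Hf(p) v; exact value for u = (1, -1), v = (2, 1) is -6.
u, v = np.array([1.0, -1.0]), np.array([2.0, 1.0])
assert np.isclose(u @ H @ v, -6.0, atol=1e-2)
```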
Thm [Second derivatives are symmetric]:
Let ${ E, F }$ be complete normed spaces and ${ f : U (\subseteq E \text{ open}) \to F }$ a ${ C ^2 }$ map.
Then for every ${ x \in U, }$ the bilinear map ${ D ^2 f(x) }$ is symmetric, that is
\[{ D ^2 f (x) (v) (w) = D ^2 f (x) (w) (v) \, \text{ for all } v, w \in E . }\]
So especially if ${ f : U (\subseteq \mathbb{R} ^n \text{ open}) \to \mathbb{R} }$ is ${ C ^2 ,}$ its Hessian at every point is symmetric.
Pf: Let ${ x \in U . }$ There is an ${ r > 0 }$ such that ${ B(x, r) \subseteq U .}$
Pick any ${ v, w \in E }$ with lengths ${ \lVert v \rVert, \lVert w \rVert < \frac{r}{2} .}$ The length constraint is so that the points ${ x, x + v, x + w, x + v+w }$ are all in the ball.
As ${ D ^2 f (x) }$ scales as ${ D ^2 f (x) (\lambda v) (\mu w) = \lambda \mu D ^2 f (x) (v) (w) ,}$ it suffices to show
\[{ \text{To show: } D ^2 f (x) (v) (w) = D ^2 f (x) (w) (v) . }\]Heuristic: We have two approximations
\[{ \begin{align*} f(x + v + w) \approx &\, f(x + v) + Df(x+v) \, w \\ \approx &\, f(x) + Df(x) \, v \\ &\, + [ Df(x) + D ^2 f(x) (v) ] w \end{align*} }\]and
\[{ \begin{align*} f(x + v + w) \approx &\, f(x + w) + Df(x+w) \, v \\ \approx &\, f(x) + Df(x) \, w \\ &\, + [ Df(x) + D ^2 f(x) (w) ] v . \end{align*} }\]Comparing both, one expects
\[{ D ^2 f (x) (v) (w) = D ^2 f (x) (w) (v) }\]to hold.
Consider the difference maps
\[{ G _v, G _w : B (x, r/2) \longrightarrow F, }\] \[{ G _v (t) := f(t+v) - f(t), }\] \[{ G _w (t) := f(t+w) - f(t) . }\]Applying ${ C ^1 }$ mean value theorem twice,
\[{ \begin{align*} &\, G _v (x+w) - G _v (x) \\ = &\, \left( \int _0 ^1 DG _v (x+tw) \, dt \right) \cdot w \\ = &\, \int _0 ^1 [ Df (x + tw + v) - Df (x + tw ) ] \, dt \cdot w \\ = &\, \int _0 ^1 \left[ \int _0 ^1 D ^2 f (x + tw + sv) \cdot v \, ds \right] \, dt \cdot w \\ = &\, \int _0 ^1 \left[ \int _0 ^1 D ^2 f (x + tw + sv) \, ds \right] \, dt \, (v) (w) \end{align*} }\]and similarly
\[{ \begin{align*} &G _w (x + v) - G _w (x) \\ = &\, \int _0 ^1 \left[ \int _0 ^1 D ^2 f (x + tv + sw) \, ds \right] \, dt \, (w) (v). \end{align*} }\]But note the differences
\[{ G _v (x + w) - G _v (x) = f(x+w+v) - f(x+w) - (f(x+v) - f(x)) }\] \[{ G _w (x + v) - G _w (x ) = f(x + v + w) - f(x+v) - (f(x+w) - f(x)) }\]are equal, so
\[{ \begin{align*} &\int _0 ^1 \left[ \int _0 ^1 D ^2 f (x + tw + sv) \, ds \right] \, dt \, (v) (w) \\ = &\, \int _0 ^1 \left[ \int _0 ^1 D ^2 f (x + tv + sw) \, ds \right] \, dt \, (w) (v) \end{align*} }\]that is
\[{ \begin{align*} &\, D ^2 f (x) (v) (w) + \underbrace{\int _0 ^1 \left[ \int _0 ^1 D ^2 f (x + tw + sv) - D ^2 f (x) \, ds \right] \, dt \, (v) (w)} _{=: \, \Phi (v, w)} \\ = &\, D ^2 f (x) (w) (v) + \underbrace{\int _0 ^1 \left[ \int _0 ^1 D ^2 f (x + tv + sw) - D ^2 f (x) \, ds \right] \, dt \, (w) (v)} _{=: \, \Psi (v, w)} . \end{align*} }\]The remainder terms ${ \Phi (v, w) , \Psi (v, w) }$ are bounded by
\[{ \begin{align*} &\lVert \Phi (v, w) \rVert \\ \leq &\, \left\lVert \int _0 ^1 \left[ \int _0 ^1 D ^2 f (x + tw + sv) - D ^2 f (x) \, ds \right] \, dt \right\rVert \lVert v \rVert \lVert w \rVert \\ \leq &\, \sup _{0 \leq t \leq 1} \left\lVert \int _0 ^1 D ^2 f (x + tw + sv) - D ^2 f (x) \, ds\right\rVert \lVert v \rVert \lVert w \rVert \\ \leq &\, \sup _{0 \leq t \leq 1} \left( \sup _{0 \leq s \leq 1} \left\lVert D ^2 f (x + tw + sv) - D ^2 f (x) \right\rVert \right) \lVert v \rVert \lVert w \rVert \\ \leq &\, \sup _{0 \leq t \leq 1} \left( \sup _{0 \leq s, t \leq 1} \left\lVert D ^2 f (x + tw + sv) - D ^2 f (x) \right\rVert \right) \lVert v \rVert \lVert w \rVert \\ = &\, \sup _{0 \leq s, t \leq 1} \left\lVert D ^2 f (x + tw + sv) - D ^2 f (x) \right\rVert \lVert v \rVert \lVert w \rVert \end{align*} }\]and
\[{ \lVert \Psi (v, w) \rVert \leq \sup _{0 \leq s, t \leq 1} \left\lVert D ^2 f (x + tv + sw) - D ^2 f (x) \right\rVert \lVert w \rVert \lVert v \rVert . }\]So
\[{ \begin{align*} &\lVert D ^2 f(x) (v)(w) - D ^2 f (x) (w) (v) \rVert \\ = &\, \lVert \Phi (v, w) - \Psi (v, w) \rVert \\ \leq &\, 2 \sup _{0 \leq s, t \leq 1} \left\lVert D ^2 f (x + tw + sv) - D ^2 f (x) \right\rVert \lVert v \rVert \lVert w \rVert . \end{align*} }\]Replacing ${ v, w }$ with ${ \tau v, \tau w }$ (where ${ \tau \in (0, 1) }$) in the above inequality and using continuity of ${ D ^2 f , }$
\[{ \begin{align*} &\lVert D ^2 f(x) (v)(w) - D ^2 f (x) (w) (v) \rVert \\ \leq &\, 2 \sup _{0 \leq s, t \leq 1} \left\lVert D ^2 f (x + \tau(tw + sv)) - D ^2 f (x) \right\rVert \lVert v \rVert \lVert w \rVert \\ &\, \to 0 \text{ as } \tau \to 0 . \end{align*} }\]Therefore
\[{ D ^2 f (x) (v) (w) = D ^2 f (x) (w) (v) }\]as needed. ${ \blacksquare }$
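On ${ \mathbb{R} ^n }$ the theorem specialises to commuting mixed partials, ${ D _i D _j f = D _j D _i f . }$ A quick finite-difference probe (an illustrative sketch, with a made-up ${ C ^2 }$ function, not from the notes):

```python
import numpy as np

# Check numerically that mixed partials commute for
# f(x, y, z) = sin(xy) + z x^2, a smooth map R^3 -> R.
def f(p):
    x, y, z = p
    return np.sin(x * y) + z * x**2

def partial(g, x, j, h=1e-5):
    """Central-difference approximation of D_j g(x)."""
    e = np.zeros_like(x); e[j] = 1.0
    return (g(x + h*e) - g(x - h*e)) / (2*h)

def mixed(g, x, i, j, h=1e-4):
    """Approximation of D_i D_j g(x), differentiating D_j g in direction e_i."""
    e = np.zeros_like(x); e[i] = 1.0
    return (partial(g, x + h*e, j) - partial(g, x - h*e, j)) / (2*h)

x0 = np.array([0.3, -1.2, 0.7])
for i in range(3):
    for j in range(3):
        # symmetry of the second derivative, in coordinates
        assert abs(mixed(f, x0, i, j) - mixed(f, x0, j, i)) < 1e-4
```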
\[{ \underline{\textbf{Third derivative}} }\]Def [${ C ^3 }$ maps]: Let ${ E, F }$ be complete normed spaces, and ${ f : U (\subseteq E \text{ open}) \to F .}$
We say ${ f }$ is ${ C ^3 }$ if it is ${ C ^2 }$ and the second derivative is ${ C ^1 , }$ that is if the derivatives
\[{ Df : U \to L(E, F), \quad D ^2 f : U \to L(E, L(E, F)), \quad D ^3 f := D(D ^2 f) : U \to L(E, L(E, L(E, F))) }\]exist and are continuous.
We saw ${ L(E, L(E, F)) \cong L ^2 (E; F) }$ as normed spaces. This can be generalised.
Obs [${ L (E, L ^k (E; F)) \cong L ^{k+1} (E; F) }$ as normed spaces]:
Let ${ E, F }$ be complete normed spaces. Then
\[{ \Phi : L (E, L ^k (E; F)) \longrightarrow L ^{k+1} (E; F) , \quad \lambda \mapsto ((x _1, \ldots, x _{k+1}) \mapsto \lambda (x _1) (x _2, \ldots, x _{k+1})) }\]is an isomorphism of normed spaces.
Here ${ \lambda }$ in ${ L(E, L ^k (E; F)) }$ takes inputs as ${ \lambda (\underline{\,}) (\underline{\,}, \ldots, \underline{\,}) , }$ and ${ \Phi (\lambda) }$ in ${ L ^{k+1} (E; F) }$ takes inputs as ${ \Phi (\lambda) (\underline{\,} , \underline{\,} , \ldots , \underline{\,} ) . }$
Writing (informally) ${ \Phi (\lambda) }$ as ${ \lambda (\underline{\,} , \underline{\,}, \ldots , \underline{\,}) , }$ the normed space isomorphism is
\[{ \begin{align*} &L (E, L ^k (E; F)) \overset{\cong}{\longleftrightarrow} L ^{k+1} (E; F) , \\ &\lambda (\underline{\,}) (\underline{\,}, \ldots, \underline{\,}) \longleftrightarrow \lambda (\underline{\,} , \underline{\,}, \ldots , \underline{\,}) . \end{align*} }\]
Pf: Similar to the previous proof/verification that ${ L (E, L(E, F)) \cong L ^2 (E; F) }$ as normed spaces.
Obs [${ L(E, \ldots, L(E, L(E, F)) \ldots ) \cong L ^k (E; F) }$ as normed spaces]:
Let ${ E, F }$ be complete normed spaces. Repeatedly using the above result gives
\[{ \underbrace{L(E, \ldots, L(E, L(E, F)) \ldots )} _{k \text{ many } Es} \cong L ^k (E; F) }\]as normed spaces. The composed isomorphism is
\[{\Phi : \underbrace{L(E, \ldots, L(E, F) \ldots)} _{k \text{ many } Es} \longrightarrow L ^k (E; F) , }\] \[{ \lambda \mapsto ((x _1, \ldots, x _k) \mapsto \lambda (x _1) \ldots (x _k)) . }\]For example when ${ k = 3, }$
\[{ \begin{align*} &L(E, L(E, L(E, F))) \overset{\cong}{\longleftrightarrow} L(E, L ^2 (E; F)) \overset{\cong}{\longleftrightarrow} L ^3 (E; F), \\ &\, \quad \lambda (\_) (\_) (\_) \quad \longleftrightarrow \quad \lambda (\_) ( \_ \, , \, \_ ) \quad \longleftrightarrow \quad \lambda (\_ \, , \, \_ \, , \_ ) . \end{align*} }\]Obs: Any multilinear map
\[{ \lambda : \underbrace{\mathbb{R} ^n \times \ldots \times \mathbb{R} ^n} _{k \text{ many}} \to \mathbb{R} ^m, \quad \lambda = (\lambda _1, \ldots, \lambda _m) ^T }\]is continuous because each component
\[{ \lambda _i (x _1, \ldots, x _k ) = \sum _{j _1 , \ldots, j _k \in [n]} (x _1) _{j _1} \ldots (x _k) _{j _k} \lambda _i (e _{j _1}, \ldots, e _{j _k}) }\]is continuous.
Obs: Any ${ \lambda \in L ^k (\mathbb{R} ^n; \mathbb{R} ^m) }$ expands as above. Note
\[{ \Phi : L ^k (\mathbb{R} ^n ; \mathbb{R} ^m) \longrightarrow \lbrace \text{maps } [n] ^k \times [m] \to \mathbb{R} \rbrace, }\] \[{ \lambda \longmapsto \left( (j _1, \ldots, j _k; i) \mapsto \lambda _i (e _{j _1}, \ldots, e _{j _k}) \right) }\]is linear and bijective, making it an isomorphism of vector spaces.
Especially ${ L ^k (\mathbb{R} ^n ; \mathbb{R} ^m) }$ is ${ n ^k m }$ dimensional and all norms on it are equivalent.
The right hand side has the Frobenius norm
\[{ \lVert g \rVert _F := \sqrt{ \sum _{j _1, \ldots, j _k ; \, i} g(j _1, \ldots, j _k ; i) ^2 } , }\]therefore
\[{ \lVert \lambda \rVert _{\text{Mult } F} := \lVert \Phi (\lambda) \rVert _F = \sqrt{\sum \lambda _i (e _{j _1}, \ldots, e _{j _k}) ^2 } }\]is a valid norm on ${ L ^k (\mathbb{R} ^n ; \mathbb{R} ^m ) }$ and makes ${ \Phi }$ an isomorphism of normed spaces.
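The coefficient description can be sketched concretely (illustrative numpy, not from the notes; the tensor ${ T }$ below is arbitrary random data): a bilinear ${ \lambda \in L ^2 (\mathbb{R} ^2 ; \mathbb{R} ^3) }$ is stored as the tensor of values ${ \lambda _i (e _{j _1}, e _{j _2}) , }$ and ${ \lVert \lambda \rVert _{\text{Mult } F} }$ is that tensor's Frobenius norm.

```python
import numpy as np

# A bilinear map lam in L^2(R^2; R^3), determined by its coefficient
# tensor T[i, j1, j2] = lam_i(e_{j1}, e_{j2}).
rng = np.random.default_rng(1)
T = rng.standard_normal((3, 2, 2))   # m = 3 components, n = 2, k = 2 slots

def lam(x1, x2):
    # multilinear expansion: lam_i(x1, x2) = sum_{j1,j2} (x1)_{j1} (x2)_{j2} T[i,j1,j2]
    return np.einsum('ijk,j,k->i', T, x1, x2)

# The "Mult F" norm is the Frobenius norm of the coefficient tensor.
mult_f_norm = np.sqrt((T**2).sum())
assert mult_f_norm > 0

# Sanity: on basis vectors the expansion reproduces the coefficients.
e = np.eye(2)
assert np.allclose(lam(e[0], e[1]), T[:, 0, 1])
```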
Obs [${ C ^3 }$ maps ${ \mathbb{R} ^n \to \mathbb{R} ^m }$]:
Let ${ f : U (\subseteq \mathbb{R} ^n \text{ open}) \to \mathbb{R} ^m .}$ Say ${ f }$ is already ${ C ^2 , }$ that is all partials ${ D _{j _1 } f _i : U \to \mathbb{R} }$ and ${ D _{j _1} D _{j _2} f _i : U \to \mathbb{R} }$ exist and are continuous.
Now ${ f }$ is ${ C ^3 }$ if and only if ${ D ^2 f : U \to L ^2 (\mathbb{R} ^n ; \mathbb{R} ^m) }$ is ${ C ^1 , }$ if and only if ${ D ^2 f : U \to (L ^2 (\mathbb{R} ^n ; \mathbb{R} ^m) , \lVert \ldots \rVert _{\text{Mult } F} ) }$ is ${ C ^1 , }$ if and only if
\[{ U \to \mathbb{R} ^{n ^2 m} , \quad x \mapsto (D _{j _1} D _{j _2} f _i (x)) _{j _1, j _2 \in [n]; \, i \in [m]} }\]is ${ C ^ 1 , }$ if and only if all partials ${ D _{j _1} D _{j _2} D _{j _3} f _i : U \to \mathbb{R} }$ exist and are continuous.
Thm [${ D ^3 f }$ for ${ C ^3 }$ maps ${ \mathbb{R} ^n \to \mathbb{R} ^m }$]:
Let ${ f : U (\subseteq \mathbb{R} ^n \text{ open}) \to \mathbb{R} ^m }$ be a ${ C ^3 }$ map and ${ x \in U . }$
Then the third derivative
\[{ D ^3 f (x) : \mathbb{R} ^n \times \mathbb{R} ^n \times \mathbb{R} ^n \to \mathbb{R} ^m \quad \text{multilinear}, }\]is given by its action on basis vectors
\[{ \boxed{D ^3 f (x) (e _{j _1}) (e _{j _2}) (e _{j _3}) = \begin{pmatrix} D _{j _1} D _{j _2} D _{j _3} f _1 (x) \\ \vdots \\ D _{j _1} D _{j _2} D _{j _3} f _m (x) \end{pmatrix} } }\]that is
\[{ \begin{align*} &D ^3 f (x) (x _1) (x _2) (x _3) \\ = &\, \sum _{j _1, j _2, j _3 \in [n]} (x _1) _{j _1} (x _2) _{j _2} (x _3) _{j _3} \begin{pmatrix} D _{j _1} D _{j _2} D _{j _3} f _1 (x) \\ \vdots \\ D _{j _1} D _{j _2} D _{j _3} f _m (x) \end{pmatrix}. \end{align*} }\]Pf: Let ${ f : U (\subseteq \mathbb{R} ^n \text{ open}) \to \mathbb{R} ^m }$ be ${ C ^3 }$ and ${ x \in U . }$
By the definition of derivative, for ${ h }$ in a neighbourhood of ${ 0 , }$
\[{ \begin{align*} &D ^2 f(x+h) = D ^2 f(x) + D ^3 f(x) \, h + \lVert h \rVert \varphi (h) \text{ in } L ^2 (\mathbb{R} ^n ; \mathbb{R} ^m) \\ &\, \text{with } \varphi(0) = 0 \text{ and } \varphi \text{ continuous at } 0 . \end{align*} }\]
So (for ${ t _1 }$ in a neighbourhood of ${ 0 }$)
\[{ \begin{align*} &D ^2 f(x + t _1 e _{j _1}) \, (e _{j _2}) (e _{j _3}) \\ = &\, D ^2 f (x) \, (e _{j _2}) (e _{j _3}) + D ^3 f (x) (t _1 e _{j _1}) \, (e _{j _2}) (e _{j _3}) \\ &+ \lVert t _1 e _{j _1} \rVert \varphi (t _1 e _{j _1}) \, (e _{j _2}) (e _{j _3}) \end{align*} }\]that is
\[{ \begin{align*} &\begin{pmatrix} D _{j _2} D _{j _3} f _1 (x + t _1 e _{j _1}) \\ \vdots \\ D _{j _2} D _{j _3} f _m (x + t _1 e _{j _1}) \end{pmatrix} \\ = &\, \begin{pmatrix} D _{j _2} D _{j _3} f _1 (x) \\ \vdots \\ D _{j _2} D _{j _3} f _m (x) \end{pmatrix} + t _1 D ^3 f (x) (e _{j _1}) (e _{j _2}) (e _{j _3}) \\ &\, + \vert t _1 \vert \varphi (t _1 e _{j _1}) \, (e _{j _2}) (e _{j _3}) . \end{align*} }\]Dividing by ${ t _1 \neq 0 }$ and letting ${ t _1 \to 0 }$ gives
\[{ D ^3 f (x) (e _{j _1}) (e _{j _2}) (e _{j _3}) = \begin{pmatrix} D _{j _1} D _{j _2} D _{j _3} f _1 (x) \\ \vdots \\ D _{j _1} D _{j _2} D _{j _3} f _m (x) \end{pmatrix} }\]as needed. ${ \blacksquare }$
\[{ \underline{\textbf{Higher derivatives}} }\]Def [${ C ^k }$ maps]: Let ${ E, F }$ be complete normed spaces, and ${ f : U (\subseteq E \text{ open}) \to F .}$
We say ${ f }$ is ${ C ^k }$ if the derivatives
\[{ Df, \quad D ^2 f := D(Df), \quad \ldots, \quad D ^k f := D(D ^{k-1} f) }\]exist and are continuous.
Obs [Composition of ${ C ^k }$ maps is ${ C ^k }$]:
Let ${ E, F, G }$ be complete normed spaces. If
\[{ f : U (\subseteq E \text{ open}) \to V (\subseteq F \text{ open}) , \quad g : V \to G }\]are ${ C ^k }$ maps, then so is the composition ${ g \circ f . }$
Pf: Induction on ${ k . }$
For ${ k= 1 }$ case: Say ${ f, g }$ are ${ C ^1 . }$ By chain rule
\[{ D(g \circ f) (x) = (Dg)(f(x)) \circ (Df)(x) . }\]
We are to show the map ${ x \mapsto D(g \circ f) (x) }$ is continuous. The compositions
\[{ \underbrace{ x} _{\in \, U} \mapsto \underbrace{f(x)} _{\in \, V} \mapsto \underbrace{(Dg)(f(x))} _{\in \, L(F, G)} \mapsto \underbrace{((Dg)(f(x)), 0)} _{\in L(F, G) \times L(E, F) } }\]and
\[{ \underbrace{x} _{\in \, U} \mapsto \underbrace{(Df)(x)} _{\in \, L(E, F)} \mapsto \underbrace{(0, Df(x))} _{\in \, L(F, G) \times L(E, F)} }\]are continuous, hence their sum
\[{ \underbrace{x} _{\in \, U} \mapsto \underbrace{((Dg )(f(x)), Df(x))} _{\in \, L(F, G) \times L(E, F)} }\]is continuous. The map
\[{ L(F, G) \times L(E, F) \longrightarrow L(E, G) , \quad (\alpha, \beta) \mapsto \alpha \circ \beta }\]is continuous bilinear, hence the composition
\[{ \underbrace{x} _{\in U} \mapsto \underbrace{((Dg )(f(x)), Df(x))} _{\in \, L(F, G) \times L(E, F)} \mapsto \underbrace{(Dg) (f(x)) \circ (Df)(x)} _{\in \, L(E, G)} }\]is continuous, as needed.
For induction step: Say the theorem statement ${ P(k) }$ is true for some ${ k. }$ We are to show ${ P(k+1) }$ is true.
Let
\[{ f : U (\subseteq E \text{ open}) \to V (\subseteq F \text{ open}) , \quad g : V \to G }\]
be ${ C ^{k+1} }$ maps, with the induction hypothesis that composition of (any two compatible) ${ C ^k }$ maps is ${ C ^k . }$ We are to show ${ g \circ f }$ is ${ C ^{k+1}, }$ that is ${ x \mapsto D(g \circ f) (x) }$ is ${ C ^k . }$
The compositions
\[{ \underbrace{ x} _{\in \, U} \mapsto \underbrace{f(x)} _{\in \, V} \mapsto \underbrace{(Dg)(f(x))} _{\in \, L(F, G)} \mapsto \underbrace{((Dg)(f(x)), 0)} _{\in L(F, G) \times L(E, F) } }\]
and
\[{ \underbrace{x} _{\in \, U} \mapsto \underbrace{(Df)(x)} _{\in \, L(E, F)} \mapsto \underbrace{(0, Df(x))} _{\in \, L(F, G) \times L(E, F)} }\]are ${ C ^k , }$ hence their sum
\[{ \underbrace{x} _{\in \, U} \mapsto \underbrace{((Dg )(f(x)), Df(x))} _{\in \, L(F, G) \times L(E, F)} }\]is ${ C ^k . }$ The map
\[{ L(F, G) \times L(E, F) \longrightarrow L(E, G) , \quad (\alpha, \beta) \mapsto \alpha \circ \beta }\]is continuous bilinear, hence the composition
\[{ \underbrace{x} _{\in U} \mapsto \underbrace{((Dg )(f(x)), Df(x))} _{\in \, L(F, G) \times L(E, F)} \mapsto \underbrace{(Dg) (f(x)) \circ (Df)(x)} _{\in \, L(E, G)} }\]is ${ C ^k , }$ as needed. ${ \blacksquare }$
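In coordinates the ${ k = 1 }$ step is the familiar Jacobian chain rule: the Jacobian of ${ g \circ f }$ at ${ x }$ is the matrix product of the Jacobians. A finite-difference check on two made-up ${ C ^1 }$ maps (illustrative sketch, not from the notes):

```python
import numpy as np

def f(x):                       # R^2 -> R^3
    return np.array([x[0]*x[1], np.sin(x[0]), x[1]**2])

def g(y):                       # R^3 -> R^2
    return np.array([y[0] + y[1]*y[2], np.exp(y[0])])

def jacobian_fd(F, x, h=1e-6):
    """Central-difference Jacobian of F at x, one column per basis direction."""
    cols = [(F(x + h*e) - F(x - h*e)) / (2*h) for e in np.eye(len(x))]
    return np.column_stack(cols)

x = np.array([0.4, -0.8])
# D(g o f)(x) = (Dg)(f(x)) o (Df)(x): matrix product of Jacobians.
lhs = jacobian_fd(lambda t: g(f(t)), x)
rhs = jacobian_fd(g, f(x)) @ jacobian_fd(f, x)
assert np.allclose(lhs, rhs, atol=1e-5)
```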
Obs [${ C ^k }$ maps ${ \mathbb{R} ^n \to \mathbb{R} ^m }$]:
Let ${ f : U (\subseteq \mathbb{R} ^n \text{ open}) \to \mathbb{R} ^m .}$ Then ${ f }$ is a ${ C ^k }$ map if and only if all partials ${ D _{j _1} f _i : U \to \mathbb{R}, }$ ${ D _{j _1} D _{j _2} f _i : U \to \mathbb{R}, }$ ${ \ldots, }$ ${ D _{j _1} \ldots D _{j _k} f _i : U \to \mathbb{R} }$ exist and are continuous.
Pf: Proof by induction. Similar to the proof for ${ C ^2 }$ and ${ C ^3 }$ maps.
Obs [${ D ^k f }$ for ${ C ^k }$ maps ${ \mathbb{R} ^n \to \mathbb{R} ^m }$]:
Let ${ f : U (\subseteq \mathbb{R} ^n \text{ open}) \to \mathbb{R} ^m }$ be a ${ C ^k }$ map and ${ x \in U . }$
Then the ${ k ^{\text{th}} }$ derivative
is given by its action on basis vectors
\[{ D ^k f (x) : \underbrace{\mathbb{R} ^n \times \ldots \times \mathbb{R} ^n} _{k \text{ many}} \longrightarrow \mathbb{R} ^m \quad \text{ multilinear}, }\] \[{ \boxed{D ^k f(x) (e _{j _1}) \ldots (e _{j _k}) = \begin{pmatrix} D _{j _1} \ldots D _{j _k} f _1 (x) \\ \vdots \\ D _{j _1} \ldots D _{j _k} f _m (x) \end{pmatrix} } }\]that is
\[{ \begin{align*} D ^k f (x) (x _1) \ldots (x _k) = &\, \sum _{j _1, \ldots, j _k \in [n]} (x _1) _{j _1} \ldots (x _k) _{j _k} \begin{pmatrix} D _{j _1} \ldots D _{j _k} f _1 (x) \\ \vdots \\ D _{j _1} \ldots D _{j _k} f _m (x) \end{pmatrix} . \end{align*} }\]Pf: Similar to the proof for ${ C ^2 }$ and ${ C ^3 }$ maps.
\[{ \underline{\textbf{Taylor’s Theorem}} }\]Recall the proof of Taylor’s theorem for real functions (using integration by parts). It generalises as follows.
Thm [Taylor’s theorem]:
Consider complete normed spaces ${ E, F, }$ and a ${ C ^p }$ map ${ f : U (\subseteq E \text{ open}) \to F .}$ Fix an ${ x \in U . }$
Let ${ y \in E }$ be such that the segment ${ [[x, x + y]] = \lbrace x + ty : t \in [0, 1] \rbrace }$ is contained in ${ U . }$ Then, denoting by ${ y ^{(k)} }$ the ${ k }$-tuple ${ (y, \ldots, y), }$ we have
\[{ \begin{align*} &f(x+y) \\ = &\, f(x) + \frac{Df(x) y ^{(1)}}{1!} + \ldots + \frac{D ^p f(x) y ^{(p)} }{p!} + R _p (y) \end{align*} }\]where
\[{ R _p (y) = \int _0 ^1 \frac{(1-t) ^{p-1}}{(p-1)!} [ D ^p f(x + ty) - D ^p f (x)] y ^{(p)} \, dt . }\]Lem: If ${ F }$ is a complete normed space and ${ f : [a, b] \to F }$ is regulated, then ${ \lVert f \rVert }$ is regulated and ${ \lVert \int _a ^b f \rVert \leq \int _a ^b \lVert f \rVert .}$
Pf: There are step maps ${ s _n : [a, b] \to F }$ converging uniformly to ${ f .}$ Now ${ \sup _{t \in [a, b]} \vert \lVert s _n (t) \rVert - \lVert f (t) \rVert \vert \leq \lVert s _n - f \rVert \to 0 }$ so the step maps ${ \lVert s _n \rVert }$ converge uniformly to ${ \lVert f \rVert . }$ Also each term ${ \lVert \int _a ^b s _n \rVert \leq \int _a ^b \lVert s _n \rVert, }$ so letting ${ n \to \infty }$ gives ${ \lVert \int _a ^b f \rVert \leq \int _a ^b \lVert f \rVert }$ as needed. ${ \blacksquare }$
Therefore the remainder ${ R _p (y) }$ can be bounded as
\[{ \begin{align*} \lVert R _p (y) \rVert \leq &\, \int _0 ^1 \frac{(1-t) ^{p-1} }{(p-1)!} \lVert [D ^p f(x + ty) - D ^p f(x)] y ^{(p)} \rVert \, dt \\ \leq &\, \sup _{0 \leq t \leq 1} \lVert D ^p f (x + ty) - D ^p f(x) \rVert \lVert y \rVert ^p \int _0 ^1 \frac{(1-t) ^{p-1}}{(p-1)!} \, dt \\ = &\, \sup _{0 \leq t \leq 1} \lVert D ^p f (x + ty) - D ^p f (x) \rVert \frac{\lVert y \rVert ^p}{p!} . \end{align*} }\]Especially,
\[{ \lim _{y \to 0 } \frac{\lVert R _p (y) \rVert}{\lVert y \rVert ^p} = 0 }\]that is ${ R _p (y) }$ is ${ o(\lVert y \rVert ^p) . }$
Pf: Consider the continuous bilinear map
\[{ \bullet : F \times \mathbb{R} \to F, \quad v \bullet a = a v . }\]Using this, we can integrate by parts any product ${ \varphi _1 (t) \bullet D\varphi _2 (t) }$ where ${ \varphi _1 : [0, 1] \to F }$ and ${ \varphi _2 : [0, 1] \to \mathbb{R} }$ are ${ C ^1 }$ maps. So, mimicking the proof for real functions,
\[{ \begin{align*} &f(x + y) \\ &\quad \\ = &\, f(x) + \int _0 ^1 \underbrace{Df(x + ty)y} _{\begin{aligned} &\varphi _1 (t) = (Df \circ \sigma )(t)y ; \\ &\sigma(t) = x + ty \end{aligned} } \bullet \underbrace{1} _{ \begin{aligned} &D\varphi _2 (t); \\ &\varphi _2 (t) = t-1 \end{aligned} } \, dt \\ &\quad \\ = &\, f(x) + Df(x + ty)y \bullet (t - 1) \Bigg\vert _0 ^1 \\ &- \int _0 ^1 D ^2 f (x + ty) y y \bullet (t - 1) \, dt \\ &\quad \\ = &\, f(x) + Df(x) y \\ &- \left( D ^2 f(x + ty) y ^{(2)} \bullet \frac{(t-1) ^2}{2} \Bigg\vert _0 ^1 - \int _0 ^1 D ^3 f(x + ty) y ^{(3)} \bullet \frac{(t-1) ^2}{2} \, dt \right) \\ &\quad \\ = &\, f(x) + Df(x) y + \frac{D ^2 f (x) y ^{(2)}}{2} \\ &+ \int _0 ^1 D ^3 f (x + ty) y ^{(3)} \bullet \frac{(t-1) ^2}{2} \, dt \\ &\quad \\ = &\, f(x) + Df(x) y + \frac{D ^2 f (x) y ^{(2)}}{2} \\ &+ \left( D ^3 f(x + ty) y ^{(3)} \bullet \frac{(t-1) ^3}{2 \cdot 3 } \Bigg\vert _0 ^1 - \int _0 ^1 D ^4 f (x + ty) y ^{(4)} \bullet \frac{(t-1) ^3}{2 \cdot 3 } \, dt \right) \\ &\quad \\ = &\, f(x) + Df(x)y + \frac{D ^2 f (x) y ^{(2)}}{2} + \frac{D ^3 f (x) y ^{(3)} }{2 \cdot 3} \\ &- \int _0 ^1 D ^4 f (x + ty) y ^{(4)} \bullet \frac{(t-1) ^3}{2 \cdot 3 } \, dt \\ &\quad \\ &\quad \vdots \\ &\quad \\ = &\, f(x) + Df(x) y + \frac{D ^2 f (x) y ^{(2)}}{2} + \ldots + \frac{D ^{p-1} f (x) y ^{(p-1)}}{(p-1)!} \\ &+ (-1) ^{p-1} \int _0 ^1 D ^p f(x + ty) y ^{(p)} \bullet \frac{(t-1) ^{p-1}}{(p-1)!} \, dt \\ &\quad \\ = &\, f(x) + Df(x) y + \frac{D ^2 f (x) y ^{(2)}}{2} + \ldots + \frac{D ^{p-1} f (x) y ^{(p-1)}}{(p-1)!} \\ &+ \int _0 ^1 \frac{(1-t) ^{p-1}}{(p-1)!} D ^p f(x + ty ) y ^{(p)} \, dt \\ &\quad \\ = &\, f(x) + Df(x) y + \ldots + \frac{D ^{p-1} f (x) y ^{(p-1)}}{(p-1)!} + \frac{D ^p f(x) y ^{(p)}}{p!} \\ &+ \int _0 ^1 \frac{(1-t) ^{p-1}}{(p-1)!} [ D ^p f(x + ty) - D ^p f(x)] y ^{(p)} \, dt \end{align*} }\]as needed. ${ \blacksquare }$
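A one-dimensional sanity check of the remainder estimate (illustrative, not from the notes; taking ${ f = \exp , }$ ${ x = 0 , }$ ${ p = 2 }$): the remainder ${ R _2 (y) = e ^y - (1 + y + y ^2 / 2) }$ should be ${ o(y ^2) . }$

```python
import numpy as np

# Taylor with p = 2 for f = exp at x = 0; the remainder is ~ y^3 / 6.
def R2(y):
    return np.exp(y) - (1 + y + y**2 / 2)

ys = np.array([1e-1, 1e-2, 1e-3])
ratios = np.abs(R2(ys)) / ys**2
# o(y^2) means the ratios |R_2(y)| / y^2 shrink to 0 (here like |y| / 6).
assert ratios[0] > ratios[1] > ratios[2]
assert ratios[2] < 1e-3
```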
Def [Local extrema]:
Let ${ E }$ be a complete normed space, ${ f : U (\subseteq E \text{ open}) \to \mathbb{R} , }$ and ${ p \in U . }$
We say ${ f }$ has a local minimum at ${ p }$ if there is an ${ r > 0 }$ such that
\[{ f(p) \leq f(x) \, \, \text{ for all } x \in B(p, r) . }\]
We say ${ f }$ has a strict local minimum at ${ p }$ if there is an ${ r > 0 }$ such that
\[{ f(p) < f(x) \, \, \text{ for all } x \in B(p, r), x \neq p. }\]Local maximum and strict local maximum are defined similarly.
We see ${ f }$ has a local maximum at ${ p }$ if and only if ${ - f }$ has a local minimum at ${ p ,}$ and ${ f }$ has a strict local maximum at ${ p }$ if and only if ${ - f }$ has a strict local minimum at ${ p . }$
Obs [Necessary condition for local min]:
Let ${ E }$ be a complete normed space, ${ f : U (\subseteq E \text{ open}) \to \mathbb{R} , }$ and ${ p \in U .}$ Let ${ f }$ be differentiable at ${ p .}$ Now
\[{ f \text{ has a local minimum at } p \implies Df(p) = 0 . }\]
Pf: Say ${ f }$ has a local minimum at ${ p . }$ Let ${ h \in E, h \neq 0 . }$ It suffices to show ${ Df(p) \, h = 0 . }$
For some ${ \varepsilon > 0 , }$ the map
\[{ g : (-\varepsilon, \varepsilon) \to \mathbb{R} , \quad g(t) := f(p + th) }\]is well-defined.
Note ${ g }$ is differentiable at ${ 0 }$ (with ${ g ' (0) = Df(p) \, h }$ by chain rule) and has a local minimum at ${ 0 .}$ So ${ g ' (0) = 0 }$ that is ${ Df(p) \, h = 0 , }$ as needed.
Thm [Necessary condition for local min]:
Let ${ f : U(\subseteq \mathbb{R} ^n \text{ open}) \to \mathbb{R} }$ be a ${ C ^2 }$ map, and ${ p \in U .}$ Now
\[{ f \text{ has a local minimum at } p \implies Df(p) = 0 \text{ and } Hf(p) \geq 0 , }\]that is the Hessian at ${ p }$ is positive semidefinite.
Pf: By Taylor’s theorem, (for ${ h }$ in a neighbourhood of ${ 0 }$)
\[{ \begin{align*} &f(p+h) = f(p) + Df(p) \, h + \frac{1}{2} h ^T Hf(p) \, h + \lVert h \rVert ^2 \varphi (h) \\ &\text{with } \varphi(0) = 0 \text{ and } \varphi \text{ continuous at } 0 . \end{align*} }\]Say ${ f }$ has a local minimum at ${ p . }$ By previous observation, ${ Df(p) = 0 . }$ Let ${ e }$ be a unit vector in ${ \mathbb{R} ^n . }$ It suffices to show
\[{ \text{To show: } \quad e ^T Hf(p) \, e \geq 0 . }\]Putting ${ h = te }$ above and using local minimality, (for ${ t }$ in a neighbourhood of ${ 0 }$)
\[{ \begin{align*} f(p+te) - f(p) = &\, \frac{t ^2}{2} e ^T Hf(p) \, e + t ^2 \varphi(te) \geq 0 . \end{align*} }\]Let ${ \varepsilon > 0 . }$ By continuity of ${ \varphi }$ at ${ 0 , }$ we have for ${ t }$ in a neighbourhood of ${ 0 }$
\[{ \begin{align*} &\frac{t ^2}{2} e ^T Hf(p) \, e + t ^2 \varphi(te) \geq 0 \\ &\text{and } \vert \varphi (t e) \vert < \varepsilon. \end{align*} }\]For ${ t \neq 0 }$ in this neighbourhood
\[{ \begin{align*} e ^T Hf(p) \, e &\geq -2 \varphi(t e) \\ &> - 2 \varepsilon. \end{align*} }\]So ${ e ^T Hf(p) \, e > - 2 \varepsilon }$ for every ${ \varepsilon > 0 , }$ that is ${ e ^T Hf(p) \, e \geq 0 }$ as needed. ${ \blacksquare }$
Thm [Sufficient condition for local min]:
Let ${ f : U(\subseteq \mathbb{R} ^n \text{ open}) \to \mathbb{R} }$ be a ${ C ^2 }$ map, and ${ p \in U .}$ Now
\[{ Df(p) = 0 \text{ and } Hf(p) > 0 \implies f \text{ has a strict local minimum at } p , }\]where ${ Hf(p) > 0 }$ means the Hessian at ${ p }$ is positive definite.
Pf: Say ${ Df(p) = 0 }$ and ${ Hf(p) > 0 . }$ We are to show ${ f }$ has a strict local minimum at ${ p . }$
By Taylor’s theorem and ${ Df(p) = 0 , }$ (for ${ h }$ in a neighbourhood of ${ 0 }$)
\[{ \begin{align*} &f(p+h) = f(p) + \frac{1}{2} h ^T Hf(p) \, h + \lVert h \rVert ^2 \varphi (h) \\ &\text{with } \varphi(0) = 0 \text{ and } \varphi \text{ continuous at } 0 . \end{align*} }\]
The quadratic form ${ e \mapsto e ^T Hf(p) \, e }$ is continuous and positive on the compact unit sphere, so it attains a positive minimum there: there is a unit vector ${ e _0 }$ such that
\[{ \min _{\lVert e \rVert = 1} e ^T Hf(p) \, e = e _0 ^T Hf(p) \, e _0 > 0 . }\]Hence (for ${ h }$ in a neighbourhood of ${ 0 }$)
\[{ \begin{align*} &f(p+h) - f(p) \\ = &\, \frac{1}{2} h ^T Hf(p) \, h + \lVert h \rVert ^2 \varphi (h) \\ \geq &\, \frac{1}{2} \lVert h \rVert ^2 (\underbrace{e _0 ^T Hf(p) \, e _0} _{ = \, \alpha > 0 } ) + \lVert h \rVert ^2 \varphi (h ) \\ = &\, \lVert h \rVert ^2 \left( \frac{\alpha}{2} + \varphi(h) \right) . \end{align*} }\]By continuity of ${ \varphi }$ at ${ 0 ,}$ we have for ${ h }$ in a neighbourhood of ${ 0 }$
\[{ \begin{align*} &f(p+h) - f(p) \geq \lVert h \rVert ^2 \left( \frac{\alpha}{2} + \varphi(h) \right) \\ &\text{and } \vert \varphi (h) \vert < \frac{\alpha}{4}. \end{align*} }\]For ${ h \neq 0 }$ in this neighbourhood
\[{ \begin{align*} f(p+h) - f(p) \geq &\, \lVert h \rVert ^2 \left( \frac{\alpha}{2} + \varphi (h) \right) \\ > &\, \lVert h \rVert ^2 \left( \frac{\alpha}{2} - \frac{\alpha}{4} \right) \\ > &\, 0 \end{align*} }\]as needed. ${ \blacksquare }$
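In practice ${ Hf(p) > 0 }$ is checked via eigenvalues, and the constant ${ \alpha = \min _{\lVert e \rVert = 1} e ^T Hf(p) \, e }$ from the proof is the smallest eigenvalue. A small numpy sketch (illustrative, not from the notes) on the made-up quadratic ${ f(x, y) = x ^2 + 3 y ^2 + xy , }$ whose only critical point is the origin:

```python
import numpy as np

# Hessian of f(x, y) = x^2 + 3y^2 + x*y (constant, since f is quadratic).
H = np.array([[2.0, 1.0],
              [1.0, 6.0]])
eigvals = np.linalg.eigvalsh(H)
# All eigenvalues positive <=> Hf(p) > 0 <=> strict local minimum at p = (0, 0).
assert np.all(eigvals > 0)

# alpha = min_{|e| = 1} e^T H e is the smallest eigenvalue (Rayleigh quotient);
# it controls the bound f(p+h) - f(p) >= |h|^2 (alpha/2 + phi(h)).
alpha = eigvals.min()
rng = np.random.default_rng(2)
for _ in range(100):
    e = rng.standard_normal(2)
    e /= np.linalg.norm(e)
    assert e @ H @ e >= alpha - 1e-12
```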
\[{ \underline{\textbf{Inverse function theorem}} }\]Thm [Contraction maps have a unique fixed point]:
Let ${ (X, d) }$ be a complete metric space and ${ T : X \to X .}$ Suppose there is a ${ K \in (0, 1) }$ such that
\[{ d(T(x), T(y)) \leq K \, d(x, y) \, \text{ for all } x, y \in X . }\]
Then ${ T }$ has a unique fixed point (i.e. there is a unique ${ p \in X }$ such that ${ T(p) = p }$). Also, for any ${ x \in X }$ the sequence of iterates ${ (T ^n (x)) }$ converges to this fixed point.
Pf: Firstly, if at all a fixed point exists it is unique: Say ${ p _1, p _2 \in X }$ are two fixed points. Now
\[{ \underbrace{d(T(p _1), T(p _2))} _{= d(p _1, p _2)} \leq \underbrace{K} _{\text{in } (0, 1)} d(p _1, p _2) , }\]so ${ d(p _1, p _2) = 0 }$ that is ${ p _1 = p _2 . }$
Let ${ x \in X . }$ Note the sequence of iterates ${ (T ^n (x)) }$ is Cauchy: For all integers ${ m > n > 0 , }$
\[{ \begin{align*} &d(T ^{m} (x), T ^{n} (x)) \\ \leq &\, K ^n d(T ^{m-n} (x), x) \\ \leq &\, K ^n \left[ d(T ^{m-n} (x), T ^{m-n-1} (x)) + \ldots + d(T(x), x) \right] \\ \leq &\, K ^n (K ^{m-n-1} + \ldots + 1) \, d(T(x), x) \\ = &\, K ^n \frac{1-K ^{m-n} }{1-K} d(T(x), x) \\ = &\, \frac{K ^n - K ^m}{1-K} d(T(x), x) . \end{align*} }\]By completeness, there is a ${ p \in X }$ such that
\[{ p = \lim _{n \to \infty} T ^n (x) . }\]Since ${ T }$ is continuous,
\[{ \begin{align*} T(p) = &\, T\, \left( \lim _{n \to \infty} T ^n (x) \right) \\ = &\, \lim _{n \to \infty} T(T ^n (x)) \\ = &\, p \end{align*} }\]as needed. ${ \blacksquare }$
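A classic instance of the contraction principle (an illustrative sketch, not from the notes): ${ T(x) = \cos x }$ maps ${ [0.5, 1] }$ into itself and contracts it, since ${ \vert T ' \vert = \vert \sin x \vert \leq \sin 1 < 1 }$ there, so iterating from any starting point converges to the unique fixed point.

```python
import math

# T(x) = cos(x) is a contraction on [0.5, 1]; its unique fixed point
# solves cos(p) = p (the "Dottie number", approximately 0.739085).
def T(x):
    return math.cos(x)

x = 1.0
for _ in range(100):     # the iterates T^n(x) converge to the fixed point
    x = T(x)

assert abs(T(x) - x) < 1e-12          # numerically a fixed point
assert abs(x - 0.7390851332151607) < 1e-12
```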
Basic recall: Invertibility/bijectivity of set maps.
Def [${ C ^p }$ isomorphisms]:
Consider complete normed spaces ${ E, F, }$ and a ${ C ^p }$ map ${ f : U (\subseteq E \text{ open}) \to F . }$
We say ${ f }$ is a ${ C ^p }$ isomorphism if ${ V := f(U) }$ is open and there is a ${ C ^p }$ map ${ g : V \to U }$ with ${ g \circ f = \text{id} _{U} }$ and ${ f \circ g = \text{id} _{V} . }$
(That is, if ${ f }$ maps ${ U }$ bijectively to an open set ${ f(U) , }$ and the inverse ${ f ^{-1} : f(U) \to U }$ is also ${ C ^p }$).
Let ${ x \in U . }$ We say ${ f }$ is a local ${ C ^p }$ isomorphism at ${ x }$ if there is an open neighbourhood ${ x \in U _1 }$ over which ${ f }$ is a ${ C ^p }$ isomorphism.
Obs [${ C ^p }$ isomorphisms]:
Consider complete normed spaces ${ E, F, }$ and a ${ C ^p }$ map
\[{ f : U (\subseteq E \text{ open}) \longrightarrow V (\subseteq F \text{ open}) . }\]
Further consider complete normed spaces ${ E _1, F _1, }$ and ${ C ^p }$ isomorphisms
\[{ \lambda : U _1 (\subseteq E _1 \text{ open}) \longrightarrow U (\subseteq E \text{ open}), }\] \[{ \mu : V _1 (\subseteq F _1 \text{ open}) \longrightarrow V (\subseteq F \text{ open}). }\]In this setup, since composition of ${ C ^p }$ isomorphisms is a ${ C ^p }$ isomorphism,
\[{ \begin{align*} &f : U \to V \text{ is a } C ^p \text{ isomorphism} \\ \iff &\, \mu ^{-1} \circ f \circ \lambda : U _1 \to V _1 \text{ is a } C ^p \text{ isomorphism}. \end{align*} }\]So when trying to show a map is a ${ C ^p }$ isomorphism, we can consider the problem upto composing by known ${ C ^p }$ isomorphisms.
Def [Toplinear isomorphisms]:
Let ${ E, F }$ be normed spaces, and ${ f : E \to F .}$
We say ${ f }$ is a toplinear isomorphism if it is an isomorphism of vector spaces and an isomorphism of topological spaces.
That is, ${ f }$ is a toplinear isomorphism if ${ f : E \to F }$ is continuous linear and there is a ${ g : F \to E }$ continuous linear with ${ g \circ f = \text{id} _E }$ and ${ f \circ g = \text{id} _F . }$
The set of toplinear isomorphisms ${ E \to F }$ is written
\[{ L _{\text{is}} (E, F) . }\]
When ${ E, F }$ are complete, ${ L _{\text{is}} (E, F) }$ is an open subset of ${ L(E, F) . }$
Obs [${ \sum _{k=0} ^{\infty} A ^k }$ as a toplinear isomorphism]:
Let ${ E }$ be a complete normed space and ${ A \in L(E, E) . }$ Now
\[{ \lVert A \rVert < 1 \implies I - A \in L _{\text{is}} (E, E) . }\]
For ${ \lVert A \rVert < 1 }$ the inverse ${ (I-A) ^{-1} }$ ${ = \textstyle \sum _{k=0} ^{\infty} A ^k . }$
Especially ${ \lVert (I - A) ^{-1} \rVert = \lVert \sum _{k=0} ^{\infty} A ^k \rVert }$ ${ \leq \sum _{k = 0} ^{\infty} \lVert A \rVert ^k }$ ${ = (1 - \lVert A \rVert) ^{-1} . }$
Pf: Say ${ \lVert A \rVert < 1 . }$ The sequence of partial sums ${ \sum _{k=0} ^n A ^k }$ is Cauchy, because
\[{ \lVert \textstyle \sum _{k=n} ^{m} A ^k \rVert \leq \textstyle \sum _{k=n} ^{m} \lVert A \rVert ^k \quad (\text{for } m > n) }\]and the sequence ${ \sum _{k=0} ^{n} \lVert A \rVert ^k }$ is Cauchy.
By completeness, ${ \sum _{k=0} ^{\infty} A ^k }$ converges in ${ L(E, E) . }$
Finally ${ A (\sum _{k = 0} ^{\infty} A ^k) = (\sum _{k = 0} ^{\infty} A ^k) A = (\sum _{k = 0} ^{\infty} A ^k) - I }$ that is
\[{ (I-A) (\textstyle \sum _{k = 0} ^{\infty} A ^k) = (\sum _{k = 0} ^{\infty} A ^k) (I-A) = I , }\]as needed. ${ \blacksquare }$
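A numerical sketch of the Neumann series, taking ${ E = \mathbb{R} ^4 }$ with matrices standing in for operators and the operator ${ 2 }$-norm for ${ \lVert \cdot \rVert }$; the particular random matrix and the number of terms are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A *= 0.9 / np.linalg.norm(A, 2)     # rescale so the operator norm is ||A|| = 0.9 < 1

# Partial sums of the Neumann series I + A + A^2 + ...
S = np.zeros((4, 4))
term = np.eye(4)
for _ in range(500):
    S += term
    term = term @ A

inv_direct = np.linalg.inv(np.eye(4) - A)
err = np.linalg.norm(S - inv_direct, 2)
bound = 1.0 / (1.0 - 0.9)           # the estimate ||(I - A)^{-1}|| <= (1 - ||A||)^{-1}
print(err, np.linalg.norm(inv_direct, 2), bound)
```

The partial sums agree with the directly computed inverse, and the norm bound ${ \lVert (I - A) ^{-1} \rVert \leq (1 - \lVert A \rVert) ^{-1} }$ holds.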
Obs [${ L _ {\text{is}} (E, F) }$ is open in ${ L(E, F) }$]:
Let ${ E, F }$ be complete normed spaces. Now ${ L _{\text{is}} (E, F) }$ is an open subset of ${ L(E, F) . }$
For ${ E = F = \mathbb{R} ^n }$ it is clear: The set of toplinear isomorphisms is ${ L _{\text{is}} (\mathbb{R} ^n , \mathbb{R} ^n) = GL _n (\mathbb{R}), }$ which is the preimage of the open set ${ \mathbb{R} \setminus \lbrace 0 \rbrace }$ under the continuous determinant map ${ \text{det} : M _n (\mathbb{R}) \to \mathbb{R} , }$ hence open.
Pf: Let ${ A _0 \in L _{\text{is}} (E, F) . }$ We want a ${ \delta > 0 }$ such that
\[{ \text{Want: } \quad \begin{aligned} &\, A \in L(E, F), \, \lVert A - A _0 \rVert < \delta \\ \implies &\, A \in L _{\text{is}} (E, F) \end{aligned} }\]Note
\[{ \begin{align*} A = &\, A _0 + A - A _0 \\ = &\, A _0 [ \, \text{id} _{E} + A _0 ^{-1} (A - A _0) \, ] \\ \end{align*} }\]with ${ \text{id} _E + A _0 ^{-1} (A - A _0) }$ in ${ L (E, E) . }$ It suffices to find a ${ \delta > 0 }$ such that
\[{ \text{Want: } \quad \begin{aligned} &\, A \in L(E, F), \, \lVert A - A _0 \rVert < \delta \\ \implies &\, \text{id} _E + A _0 ^{-1} (A - A _0) \in L _{\text{is}} (E, E) \end{aligned} }\]Setting ${ \delta := \frac{1}{ 2 \lVert A _0 ^{-1} \rVert} }$ works, because if ${ \lVert A - A _0 \rVert < \delta }$
\[{ \lVert - A _0 ^{-1} (A - A _0) \rVert \leq \lVert A _0 ^{-1} \rVert \lVert A - A _0 \rVert < \frac{1}{2} }\]ensures ${ \text{id} _E - (- A _0 ^{-1} (A - A _0)) }$ is a toplinear isomorphism. ${ \blacksquare }$
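The estimate in the proof can be checked numerically in ${ L(\mathbb{R} ^3, \mathbb{R} ^3) }$: perturbing an invertible ${ A _0 }$ by any ${ H }$ of operator norm below ${ \delta = \frac{1}{2 \lVert A _0 ^{-1} \rVert} }$ keeps it invertible. The matrices below are arbitrary illustrative choices.

```python
import numpy as np

A0 = np.array([[2.0, 1.0, 0.0],
               [0.0, 3.0, 1.0],
               [1.0, 0.0, 2.0]])    # invertible: det(A0) = 13
A0_inv = np.linalg.inv(A0)
delta = 1.0 / (2.0 * np.linalg.norm(A0_inv, 2))

# Perturb A0 by an H of operator norm just under delta
H = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
H *= 0.99 * delta / np.linalg.norm(H, 2)
A = A0 + H

# A = A0 [ id - B ] with B = -A0^{-1}(A - A0) of norm < 1/2,
# so id - B is invertible by the Neumann series, hence so is A
B = -A0_inv @ (A - A0)
print(np.linalg.norm(B, 2), np.linalg.det(A))
```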
Thm [Derivative of ${ \text{inv} }$]:
Let ${ E, F }$ be complete normed spaces. Then the map
\[{ \text{inv} : L _{\text{is}} (E, F) \longrightarrow L(F, E), \quad A \longmapsto A ^{-1} }\]is infinitely differentiable, with derivative
\[{ D \, \text{inv} : L _{\text{is}} (E, F) \longrightarrow L(L(E, F), L(F, E)) }\]given by
\[{ (D \, \text{inv})(A) \, H = - A ^{-1} H A ^{-1} }\]for ${ A \in L _{\text{is}} (E, F) }$ and ${ H \in L(E, F) . }$
For ${ E = F = \mathbb{R} }$ it is clear: ${ (D \, \text{inv}) (a) \, h = - a ^{-2} \, h . }$
Pf: We first show ${ \text{inv} }$ is continuous.
- Obs-1: ${ \text{inv} }$ is continuous.
Let ${ A _0 \in L _{\text{is}} (E, F) , }$ and ${ A \in L(E, F) }$ such that ${ \lVert A - A _0 \rVert < \delta = \frac{1}{2\lVert A _0 ^{-1} \rVert} . }$
From the previous observation, ${ A = A _0 \, [\, \text{id} _E + A _0 ^{-1} (A - A _0) \, ] }$ with ${ \text{id} _E + A _0 ^{-1} (A - A _0) \in L _{\text{is}} (E, E) . }$
Letting ${ \psi (A) := \text{id} _E + A _0 ^{-1} (A - A _0) , }$ so that ${ \text{inv}(A) = \psi (A) ^{-1} A _0 ^{-1} , }$ we have
\[{ \begin{align*} &\, \text{inv}(A) - \text{inv}(A _0) \\ = &\, \psi(A) ^{-1} A _0 ^{-1} - A _0 ^{-1} \\ = &\, [\psi (A) ^{-1} - \text{id} _E \, ] A _0 ^{-1} \\ = &\, \psi (A) ^{-1} [\, \text{id} _E - \psi (A) ] A _0 ^{-1} \\ = &\, - \psi (A) ^{-1} A _0 ^{-1} (A - A _0) A _0 ^{-1} . \end{align*} }\]Note ${ \lVert \psi (A) ^{-1} \rVert }$ ${ \leq (1- \lVert - A _0 ^{-1} (A - A _0) \rVert ) ^{-1}. }$ Hence
\[{ \begin{align*} &\, \lVert \text{inv}(A) - \text{inv}(A _0) \rVert \\ \leq &\, \frac{\lVert A _0 ^{-1} \rVert ^2 \lVert A - A _0 \rVert }{1 - \lVert A _0 ^{-1} (A - A _0) \rVert } \\ \leq &\, \frac{\lVert A _0 ^{-1} \rVert ^2 \lVert A - A _0 \rVert}{1 - 1/2} , \end{align*} }\]giving continuity of ${ \text{inv} }$ at ${ A _0 . }$
- Obs-2: ${ \text{inv} }$ is differentiable.
Let ${ A _0 \in L _{\text{is}} (E, F) , }$ and ${ H \in L(E, F) }$ such that ${ \lVert H \rVert < \delta = \frac{1}{2\lVert A _0 ^{-1} \rVert} . }$
Rewriting the previous observation with ${ A = A _0 + H , }$
\[{ \psi (A _0 + H) = \text{id} _E + A _0 ^{-1} H \in L _{\text{is}} (E, E) }\]and
\[{ \begin{align*} &\, \text{inv}(A _0 + H) - \text{inv}(A _0) \\ = &\, - \psi (A _0 + H) ^{-1} A _0 ^{-1} H A _0 ^{-1} . \end{align*} }\]Therefore
\[{ \begin{align*} &\, \text{inv}(A _0 + H) - \text{inv}(A _0) + A _0 ^{-1} H A _0 ^{-1} \\ = &\, [ \, \text{id} _E - \psi (A _0 + H) ^{-1} ] A _0 ^{-1} H A _0 ^{-1} \\ = &\, \underbrace{[\psi(A _0 + H) - \text{id} _E \, ]} _{= \, A _0 ^{-1} H } \, \underbrace{\psi (A _0 + H) ^{-1} A _0 ^{-1} H A _0 ^{-1}} _{= \, \text{inv}(A _0) - \text{inv}(A _0 + H) } . \end{align*} }\]Now
\[{ H \mapsto - A _0 ^{-1} H A _0 ^{-1} }\]gives a continuous linear map ${ L(E, F) \to L(F, E), }$ and the error
\[{ \varepsilon (H) := A _0 ^{-1} H \, [\text{inv}(A _0) - \text{inv}(A _0 + H) ] }\]satisfies ${ \frac{\lVert \varepsilon(H) \rVert}{\lVert H \rVert} \to 0 }$ as ${ H \to 0 , }$ by continuity of ${ \text{inv} }$ at ${ A _0 . }$
Hence ${ \text{inv} }$ is differentiable at ${ A _0, }$ with
\[{ (D \, \text{inv}) (A _0) \, H = - A _0 ^{-1} H A _0 ^{-1} . }\]- Obs-3: ${ \text{inv} }$ is infinitely differentiable.
Consider the continuous bilinear maps
\[{ g : L(F, E) ^2 \longrightarrow L(L(E, F), L(F, E)) }\] \[{ g(T _1, T _2) \, S := - T _1 S T _2 }\]and
\[{ h : L(F, E) \longrightarrow L(F, E) ^2 }\] \[{ h(T) := (T, T). }\]From previous observation,
\[{ D \, \text{inv} = g \, \circ \, h \, \circ \, \text{inv} . }\]It is of the form
\[{ D \, \text{inv} = (\text{A } C ^{\infty} \text{ map}) \, \circ \, (\text{A } C ^{\infty} \text{ map}) \, \circ \, \text{inv} }\]where ${ C ^{\infty} }$ means infinite differentiability.
So for ${ k \geq 0 ,}$ we have
\[{ \text{inv} \text{ is } C ^k \implies D \, \text{inv} \text{ is } C ^k }\]that is
\[{ \text{inv} \text{ is } C ^k \implies \text{inv} \text{ is } C ^{k+1} . }\]Since ${ \text{inv} }$ is ${ C ^0, }$ by above induction it is ${ C ^{\infty} , }$ as needed. ${ \blacksquare }$
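A finite-difference sketch of the formula ${ (D \, \text{inv})(A) \, H = - A ^{-1} H A ^{-1} }$ for ${ 2 \times 2 }$ matrices: the remainder is quadratic in the step, so shrinking ${ t }$ tenfold should shrink the error roughly a hundredfold. The matrices are arbitrary illustrative choices.

```python
import numpy as np

A0 = np.array([[2.0, 1.0], [0.0, 3.0]])
A0_inv = np.linalg.inv(A0)
H = np.array([[0.3, -0.1], [0.2, 0.4]])

def deriv_error(t):
    # || inv(A0 + tH) - inv(A0) - ( -A0^{-1} (tH) A0^{-1} ) ||
    actual = np.linalg.inv(A0 + t * H) - A0_inv
    linear = -A0_inv @ (t * H) @ A0_inv
    return np.linalg.norm(actual - linear, 2)

e1, e2 = deriv_error(1e-3), deriv_error(1e-4)
print(e1, e2, e1 / e2)   # ratio near 100: the remainder is o(||tH||)
```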
With these observations, one can study local invertibility of functions.
Thm [Inverse function theorem]:
Consider complete normed spaces ${ E, F, }$ and a ${ C ^p }$ map ${ f : U (\subseteq E \text{ open}) \to F . }$ Let ${ x _0 \in U }$ be such that ${ Df(x _0) : E \to F }$ is a toplinear isomorphism.
Then ${ f }$ is a local ${ C ^p }$ isomorphism at ${ x _0 . }$
Pf: We see ${ Df(x _0) ^{-1} f : U \to E }$ is a ${ C ^p }$ map with derivative at ${ x _0 }$ being ${ \text{id} _E . }$ It suffices to show this map ${ f _1 := Df(x _0) ^{-1} f }$ is a local ${ C ^p }$ isomorphism at ${ x _0 . }$
Let ${ r > 0 }$ be such that ${ B(x _0, r) \subseteq U . }$ It suffices to show ${ f _2 : B(0, r) \to E, }$ ${ f _2 (x) := f _1 (x + x _0) }$ is a local ${ C ^p }$ isomorphism at ${ 0 . }$
It suffices to show the translation ${ \hat{f} : B(0, r) \to E, }$ ${ \hat{f}(x) := f _2 (x) - f _2 (0) }$ is a local ${ C ^p }$ isomorphism at ${ 0 . }$
Here
\[{ \hat{f} : B(0, r) \longrightarrow E, }\] \[{ \hat{f}(x) = Df(x _0) ^{-1} [ f(x + x _0) - f(x _0) ] . }\]It is a ${ C ^p }$ map, ${ D \hat{f}(0) = \text{id} _E }$ and ${ \hat{f}(0) = 0 . }$ We are to show
\[{ \text{To show: } \quad \hat{f} \text{ is a local } C ^p \text{ isomorphism at } 0 }\]that is
\[{ \text{To show: } \quad \begin{aligned} &\text{There are open neighbourhoods } U, V \\ &\, \text{of } 0 \text{ such that } \hat{f} \vert _U : U \to V \text{ is a bijection} \\ &\, \text{and } (\hat{f} \vert _U) ^{-1} \text{ is a } C ^p \text{ map.} \end{aligned} }\]We can proceed by first showing a local bijection.
- Obs-1: There exist ${ R _1, R _2 > 0 }$ (less than ${ r }$) such that ${ \hat{f} }$ gives a bijection ${ B[0, R _1] \to B[0, R _2] . }$
We want ${ R _1, R _2 > 0 }$ (less than ${ r }$) such that for every ${ y \in B[0, R _2 ] , }$ the map
\[{ g _y : B(0, r) \longrightarrow E, \quad g _y (x) := x - \hat{f}(x) + y }\]when viewed as a map ${ B[0, R _1] \to E }$ has a unique fixed point.
The maps ${ \hat{f}, g _y }$ (where ${ y \in E }$) are by default defined on ${ B(0, r) . }$
Note ${ g _y (x) = g _0 (x) + y . }$
As ${ Dg _0 (x) = I - D\hat{f}(x) }$ is continuous and ${ Dg _0 (0) = 0, }$ there is an ${ R > 0 }$ (less than ${ r }$) such that
\[{ \lVert x \rVert \leq R \implies \lVert Dg _0 (x) \rVert \leq \frac{1}{2} . }\]
Now for ${ \lVert x \rVert \leq R }$ the term ${ g _0 (x) }$ is bounded as
\[{ \require{cancel} \begin{align*} \lVert g _0 (x) \rVert = &\, \left\lVert \cancel{g _0 (0)} + \int _0 ^1 Dg _0 (tx) \, x \, dt \right\rVert \\ \leq &\, \frac{1}{2} \lVert x \rVert \end{align*} }\]that is
\[{ \lVert x \rVert \leq R \implies \lVert Dg _0 (x) \rVert \leq \frac{1}{2}, \, \lVert g _0 (x) \rVert \leq \frac{1}{2} \lVert x \rVert . }\]Hence for ${ \lVert x \rVert \leq R }$ and ${ \lVert y \rVert \leq R/2 , }$
\[{ \begin{align*} \lVert g _y (x) \rVert \leq &\, \lVert g _0 (x) \rVert + \lVert y \rVert \\ \leq &\, \frac{R}{2} + \frac{R}{2} = R . \end{align*} }\]For every ${ y \in B[0, R/2] , }$ ${ g _y }$ gives a map ${ B[0, R] \to B[0, R]. }$
In fact for every ${ y \in B[0, R/2] , }$ ${ g _y }$ gives a contraction map ${ B[0, R] \to B[0, R]. }$
Let ${ y \in B[0, R/2]. }$ For ${ x _1, x _2 \in B[0, R], }$
\[{ \begin{align*} &\, \lVert g _y (x _2) - g _y (x _1) \rVert \\ = &\, \lVert g _0 (x _2) - g _0 (x _1) \rVert \\ \leq &\, \sup _{x \in [[x _1, x _2]]} \lVert Dg _0 (x) \rVert \, \, \lVert x _1 - x _2 \rVert \\ \leq &\, \frac{1}{2} \lVert x _1 - x _2 \rVert . \end{align*} }\]
So for every ${ y \in B[0, R/2] , }$ there is a unique ${ x \in B[0, R] }$ with ${ x - \hat{f}(x) + y = x , }$ that is with ${ \hat{f}(x) = y . }$
We observed
\[{ \text{Observed: } \quad \begin{aligned} &\, \text{There is an } R > 0 \text{ such that } \hat{f} \\ &\, \text{ gives a bijection } B[0, R] \to B[0, R/2] \end{aligned} }\]and Obs-1 is true.
Further, note for any ${ 0 < R ^{’} \leq R, }$ the argument which gave the bijection
\[{ B[0, R] \to B[0, R/2], \quad x \mapsto \hat{f}(x) }\]gives the bijection
\[{ B[0, R ^{’}] \to B[0, R ^{’} /2], \quad x \mapsto \hat{f}(x) . }\]For the main goal, we want open neighbourhoods ${ U, V }$ of ${ 0 }$ such that ${ \hat{f} }$ maps bijectively ${ U \to V . }$
Fix any ${ 0 < R ^{’} < R . }$
The above bijection ${ B[0, R ^{’}] \to B[0, R ^{’} / 2] }$ gives a bijection
\[{ \lbrace x \in B[0, R ^{’}] : \hat{f}(x) \in B(0, R ^{’} /2) \rbrace \to B(0, R ^{’} / 2) }\]that is
\[{ B[0, R ^{’}] \cap \hat{f} ^{-1} (B(0, R ^{’} /2)) \to B(0, R ^{’} /2) . }\]The left hand side is an open neighbourhood of ${ 0 }$:
- Obs-2: In the bijection ${ B[0, R ^{’}] \to B[0, R ^{’} /2], }$ we have ${ \lVert x \rVert = R ^{’} \implies \lVert \hat{f}(x) \rVert = R ^{’} /2 . }$
Especially
\[{ B[0, R ^{’}] \cap \hat{f} ^{-1} (B(0, R ^{’} /2)) = B(0, R ^{’}) \cap \hat{f} ^{-1} (B(0, R ^{’} /2)) , }\]which is open.
Suppose not: say there is a point ${ x }$ with ${ \lVert x \rVert = R ^{’} }$ and ${ \lVert \hat{f}(x) \rVert < R ^{’} / 2 . }$
As ${ \hat{f} }$ is continuous, there is a ${ \delta > 0 }$ such that ${ B(x, \delta) \subseteq B(0, R) }$ and
\[{ \lVert \hat{f}(x ^{\ast}) \rVert < R ^{’} / 2 \quad \text{ for all } x ^{\ast} \in B(x, \delta) . }\]
Pick a point ${ x ^{\ast} \in B(x, \delta) }$ of norm ${ \lVert x ^{\ast} \rVert > \lVert x \rVert . }$ (For example, ${ x + \frac{x}{\lVert x \rVert} \frac{\delta}{2} .}$ ) Now
\[{ R ^{’} = \lVert x \rVert < \lVert x ^{\ast} \rVert < R \quad \text{ and } \quad \lVert \hat{f}(x ^{\ast}) \rVert, \lVert \hat{f}(x) \rVert < R ^{’} / 2 . }\]The ${ B[0, R] \to B[0, R/2] }$ preimage of ${ \hat{f}(x ^{\ast}) }$ is ${ x ^{\ast} . }$ But the bijection ${ B[0, R ^{’} ] \to B[0, R ^{’} / 2] }$ says this preimage must lie in ${ B[0, R ^{’}] , }$ a contradiction.
Hence Obs-2 is true.
Finally
\[{ U := B(0, R ^{’}) \cap \hat{f} ^{-1} (B(0, R ^{’} /2)), }\] \[{ V := B(0, R ^{’} / 2) }\]are open neighbourhoods of ${ 0 }$ such that
\[{ U \to V, \quad x \mapsto \hat{f}(x) }\]is a bijection.
- Obs-3: The map ${ V \to U, y \mapsto \hat{f} ^{-1} (y) }$ is continuous.
It suffices to show the inverse of the bijection ${ B[0, R] \to B[0, R/2] }$ is continuous.
Any ${ x \in B[0, R] }$ can be written as
\[{ \begin{align*} x = &\, g _0 (x) + (x - g _0 (x)) \\ = &\, g _0 (x) + \hat{f}(x) . \end{align*} }\]So for all ${ x _1, x _2 \in B[0, R] , }$ we see
\[{ \begin{align*} &\, \lVert x _1 - x _2 \rVert \\ \leq &\, \lVert g _0 (x _1) - g _0 (x _2) \rVert + \lVert \hat{f}(x _1) - \hat{f}(x _2) \rVert \\ \leq &\, \frac{1}{2} \lVert x _1 - x _2 \rVert + \lVert \hat{f}(x _1) - \hat{f}(x _2) \rVert \end{align*} }\]that is
\[{ \frac{1}{2} \lVert x _1 - x _2 \rVert \leq \lVert \hat{f} (x _1) - \hat{f} (x _2) \rVert . }\]Rewriting this, for all ${ y _1, y _2 \in B[0, R/2], }$
\[{ \frac{1}{2} \lVert \hat{f} ^{-1} (y _1) - \hat{f} ^{-1} (y _2) \rVert \leq \lVert y _1 - y _2 \rVert . }\]Hence Obs-3 is true.
- Obs-4: The map ${ \mathfrak{F} : V \to U, y \mapsto \hat{f} ^{-1} (y) }$ is differentiable.
Firstly, for every ${ x \in U }$ the derivative ${ D \hat{f}(x) }$ is a toplinear isomorphism (${ \lVert Dg _0 (x) \rVert }$ ${ = \lVert I - D\hat{f}(x) \rVert }$ ${ \leq \frac{1}{2}}$ so ${ I - (I - D\hat{f}(x)) }$ is invertible).
Let ${ y _0 \in V }$ and ${ x _0 := \hat{f} ^{-1} (y _0) . }$ We will show
\[{ \text{Will show: } \quad (D\mathfrak{F})(y _0) = (D \hat{f} )(x _0) ^{-1} . }\]For variable ${ y \in V , }$ write ${ x := \hat{f} ^{-1} (y) . }$ We have (for ${ x }$ in a neighbourhood of ${ x _0 }$)
\[{ \begin{align*} &\, \hat{f}(x) = \hat{f}(x _0) + D\hat{f}(x _0) \, (x - x _0) + \lVert x - x _0 \rVert \varphi (x - x _0) \\ &\, \text{with } \varphi(0) = 0 \text{ and } \varphi \text{ continuous at } 0. \end{align*} }\]Rewriting this,
\[{ x = x _0 + D \hat{f} (x _0) ^{-1} \, [\hat{f}(x) - \hat{f} (x _0) - \lVert x - x _0 \rVert \varphi (x - x _0) ] }\]that is
\[{ \begin{align*} \hat{f} ^{-1} (y) = &\, \hat{f} ^{-1} (y _0) + D \hat{f} (x _0) ^{-1} \, (y - y _0) \\ &\, - \lVert x - x _0 \rVert D \hat{f}(x _0) ^{-1} \varphi (x - x _0) . \end{align*} }\]Note
\[{ \frac{\lVert x - x _0 \rVert D \hat{f}(x _0) ^{-1} \varphi (x - x _0)}{\lVert y - y _0 \rVert} \to 0 \text{ as } y \to y _0 }\]because ${ \lVert x - x _0 \rVert \leq 2 \lVert y - y _0 \rVert }$ from the previous observation, and ${ \varphi (x - x _0) \to 0 }$ as ${ y \to y _0 }$ by continuity of ${ \hat{f} ^{-1} . }$
Finally
\[{ (D\mathfrak{F}) (y _0) = D\hat{f}(x _0) ^{-1} }\]and Obs-4 is true.
- Obs-5: The map ${ \mathfrak{F} : V \to U, y \mapsto \hat{f} ^{-1} (y) }$ is ${ C ^p . }$
From previous observation,
\[{ D \mathfrak{F} = \text{inv} \, \circ \, D \hat{f} \, \circ \, \mathfrak{F} . }\]It is of the form
\[{ D \mathfrak{F} = (\text{A } C ^{\infty} \text{ map}) \, \circ \, (\text{A } C ^{p-1} \text{ map}) \, \circ \, \mathfrak{F} . }\]So for ${ 0 \leq k \leq p-1 , }$ we have
\[{ \mathfrak{F} \text{ is } C ^k \implies D \mathfrak{F} \text{ is } C ^k }\]that is
\[{ \mathfrak{F} \text{ is } C ^k \implies \mathfrak{F} \text{ is } C ^{k+1} . }\]Since ${ \mathfrak{F} }$ is ${ C ^0 , }$ by above induction it is ${ C ^p , }$ as needed. ${ \blacksquare }$
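The contraction iteration ${ g _y (x) = x - \hat{f}(x) + y }$ from the proof can be run numerically. A sketch with ${ E = \mathbb{R} ^2 }$ and a map already normalized so that ${ \hat{f}(0) = 0 }$ and ${ D\hat{f}(0) = \text{id} }$; the particular ${ \hat{f} , }$ tolerances, and the point ${ y }$ are illustrative choices.

```python
import numpy as np

def f_hat(p):
    # f_hat(0) = 0 and D f_hat(0) = id, as in the normalized setup
    u, v = p
    return np.array([u + v**2, v + u**2])

def local_inverse(y, tol=1e-13, max_iter=200):
    # the contraction from the proof: g_y(x) = x - f_hat(x) + y
    x = np.zeros(2)
    for _ in range(max_iter):
        x_next = x - f_hat(x) + y
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    return x

y = np.array([0.05, -0.02])
x = local_inverse(y)
print(x, f_hat(x))   # f_hat(x) recovers y
```

For small ${ y }$ the iterates stay in a small ball where ${ \lVert I - D\hat{f} \rVert }$ is small, so the iteration converges, mirroring Obs-1.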
Back to top.
\[{ \underline{\textbf{Implicit function theorem}} }\]Consider complete normed spaces ${ E, F, G , }$ and a ${ C ^p }$ map
\[{ f : U (\subseteq E \times F \text{ open}) \longrightarrow G . }\]It has a zero set
\[{ f ^{-1} (0) = Z _f = \lbrace (x, y) : (x, y) \in U, f(x, y) = 0 \rbrace . }\]Let ${ (a, b) \in Z _f . }$ The goal is to study the structure of ${ Z _f }$ near ${ (a, b) .}$
We can ask ourselves: When does ${ Z _f }$ look like the graph of a function near ${ (a, b) }$?
Obs [Open subsets of ${ E \times F }$]:
Let ${ E, F }$ be complete normed spaces. Unless otherwise mentioned, ${ E \times F }$ is equipped with the sup norm ${ \lVert (x, y) \rVert = \max \lbrace \lVert x \rVert, \lVert y \rVert \rbrace . }$ Now for ${ (a, b) \in E \times F }$ and ${ \delta > 0 , }$
\[{ B((a, b), \delta) = B(a, \delta) \times B(b, \delta) . }\]
Unions of such open balls make up the open subsets of ${ E \times F . }$
Obs: Let ${ E, F }$ be complete normed spaces, and ${ U, V }$ be open subsets of ${ E, F }$ respectively. Then ${ U \times V }$ is an open subset of ${ E \times F . }$
Pf: Let ${ (a, b) \in U \times V . }$ There are ${ \delta _1, \delta _2 > 0 }$ such that ${ B(a, \delta _1) \subseteq U }$ and ${ B(b, \delta _2) \subseteq V . }$ Considering ${ \delta = \min \lbrace \delta _1, \delta _2 \rbrace > 0 , }$ the ball
\[{ B((a, b), \delta) = B(a, \delta) \times B(b, \delta) \subseteq U \times V , }\]as needed. ${ \blacksquare }$
Def [Partial derivatives]:
Consider complete normed spaces ${ E, F, G, }$ open subsets ${ U \subseteq E, }$ ${ V \subseteq F , }$ and a map
\[{ f : U \times V \longrightarrow G . }\]Let ${ (a, b) \in U \times V . }$ If the section
\[{ f ^{b} = f(\cdot, b) : U \longrightarrow G, \quad x \mapsto f(x, b) }\]is differentiable at ${ a , }$ the derivative ${ (Df ^{b} )(a) \in L(E, G) }$ is written as ${ \partial _1 f(a, b) . }$
Similarly if the section
\[{ f ^{a} = f(a, \cdot) : V \longrightarrow G, \quad y \mapsto f(a, y) }\]
is differentiable at ${ b, }$ the derivative ${ (D f ^{a}) (b) \in L(F, G) }$ is written as ${ \partial _2 f (a, b) . }$
Thm [${ C ^p }$ maps ${ E \times F \to G }$]:
Consider complete normed spaces ${ E, F, G, }$ open subsets ${ U \subseteq E, }$ ${ V \subseteq F , }$ and a map
\[{ f : U \times V \longrightarrow G . }\]Now ${ f }$ is a ${ C ^p }$ map if and only if the partials
\[{ \partial _1 f : U \times V \longrightarrow L(E, G), }\] \[{ \partial _2 f : U \times V \longrightarrow L(F, G) }\]exist and are ${ C ^{p-1} . }$
In this case, for ${ (a, b) \in U \times V }$ the derivative ${ (Df)(a, b) \in L(E \times F, G) }$ is given by
\[{ Df(a, b) \, (h, k) = \partial _1 f(a, b) \, h + \partial _2 f(a, b) \, k . }\]
Pf: ${ \underline{\Rightarrow} }$ Say ${ f }$ is ${ C ^p . }$ We are to show the partials ${ \partial _1 f, }$ ${ \partial _2 f }$ exist and are ${ C ^{p-1} . }$
Let ${ (a, b) \in U \times V . }$ By definition of derivative, for ${ (h, k) }$ in a neighbourhood of ${ (0,0), }$
\[{ \begin{aligned} &\, f(a+h, b+k) = f(a, b) + Df(a, b) \, (h, k) + \lVert (h, k) \rVert \varphi(h, k) \\ &\, \text{with } \varphi(0, 0) = 0 \text{ and } \varphi \text{ continuous at } (0, 0) . \end{aligned} }\]Especially for ${ h }$ in a neighbourhood of ${ 0 , }$
\[{ \begin{aligned} &\, f(a+h, b) = f(a, b) + Df(a, b) \, (h, 0) + \lVert h \rVert \varphi (h, 0) \end{aligned} }\]so the partial derivative ${ \partial _1 f(a, b) \in L(E, G) }$ is given by ${ h \mapsto Df(a, b) \, (h, 0). }$
Similarly the partial derivative ${ \partial _2 f(a, b) \in L(F, G) }$ is given by ${ k \mapsto Df (a, b) \, (0, k) . }$
With this, ${ \partial _1 f(a, b) }$ and ${ \partial _2 f(a, b) }$ exist, and
\[{ Df (a, b) \, (h, k) = \partial _1 f (a, b) \, h + \partial _2 f (a, b) \, k . }\]It is left to show that
\[{ \partial _1 f : \underbrace{(x, y)} _{\in \, U \times V} \mapsto \underbrace{Df(x, y) \, (\cdot \, , \, 0)} _{\in \, L(E, G)}, }\] \[{ \partial _2 f : \underbrace{(x, y)} _{\in \, U \times V} \mapsto \underbrace{Df(x, y) \, (0 \, , \, \cdot)} _{\in \, L(F, G)} }\]are ${ C ^{p-1} }$ maps. Since
\[{ \alpha _1 : L(E \times F, G) \longrightarrow L(E, G), \quad \alpha _1 (\varphi) = \varphi(\cdot \, , \, 0) }\] \[{ \alpha _2 : L(E \times F, G) \longrightarrow L(F, G), \quad \alpha _2 (\varphi) = \varphi(0 , \, \cdot) }\]are continuous linear, the compositions
\[{ \partial _1 f = \alpha _1 \, \circ \, Df , }\] \[{ \partial _2 f = \alpha _2 \, \circ Df }\]are ${ C ^{p-1} }$ maps, as needed.
${ \underline{\Leftarrow} }$ Say the partials
\[{ \partial _1 f : U \times V \longrightarrow L(E, G), }\] \[{ \partial _2 f : U \times V \longrightarrow L(F, G) }\]exist and are ${ C ^{p-1} . }$ We are to show ${ f }$ is a ${ C ^p }$ map.
Let ${ (a, b) \in U \times V . }$ For ${ (h, k) }$ in a neighbourhood of ${ 0 , }$
\[{ \begin{aligned} &\, f(a+h, b+k) - f(a, b) - \partial _1 f(a, b) \, h - \partial _2 f(a, b) \, k \\ = &\, f(a+h, b+k) - f(a, b+k) \\ &\, + f(a, b+k) - f(a, b) \\ &\, - \partial _1 f(a, b) \, h - \partial _2 f(a, b) \, k \\ = &\, \int _0 ^1 \partial _1 f(a+sh, b+k) \, h \, ds \\ &\, + \int _0 ^1 \partial _2 f(a, b + tk) \, k \, dt \\ &\, - \partial _1 f(a, b) \, h - \partial _2 f(a, b) \, k \\ = &\, \int _0 ^1 [\partial _1 f(a + sh, b+k) - \partial _1 f(a, b)] \, h \, ds \\ &\, + \int _0 ^1 [\partial _2 f(a, b+tk) - \partial _2 f(a, b)] \, k \, dt . \end{aligned} }\]Calling this error ${ \varepsilon (h, k) , }$ we have
\[{ \begin{aligned} &\, \lVert \varepsilon(h, k) \rVert \\ \leq &\, \max _{0 \leq s \leq 1} \lVert \partial _1 f (a + sh, b + k) - \partial _1 f(a, b) \rVert \lVert h \rVert \\ &\, + \max _{0 \leq t \leq 1} \lVert \partial _2 f (a, b+tk) - \partial _2 f (a, b) \rVert \lVert k \rVert \\ \leq &\, \left( {\begin{aligned} &\, \max _{0 \leq s \leq 1} \lVert \partial _1 f (a + sh, b + k) - \partial _1 f(a, b) \rVert \\ &\, + \max _{0 \leq t \leq 1} \lVert \partial _2 f (a, b+tk) - \partial _2 f (a, b) \rVert \end{aligned}} \right) \max \lbrace \lVert h \rVert , \lVert k \rVert \rbrace . \end{aligned} }\]Now
\[{ (h, k) \mapsto \partial _1 f(a, b) \, h + \partial _2 f (a, b) \, k }\]gives a continuous linear map ${ E \times F \to G , }$ and the error ${ \varepsilon(h, k) }$ satisfies ${ \varepsilon(0, 0) = 0 }$ and
\[{ \frac{\lVert \varepsilon(h, k) \rVert}{\max \lbrace \lVert h \rVert, \lVert k \rVert \rbrace} \to 0 \quad \text{ as } \, \, \max \lbrace \lVert h \rVert, \lVert k \rVert \rbrace \to 0 . }\]Hence ${ f }$ is differentiable at ${ (a, b) , }$ with
\[{ Df(a, b) \, (h, k) = \partial _1 f(a, b) \, h + \partial _2 f(a, b) \, k . }\]It is left to show that ${ f }$ is a ${ C ^p }$ map, that is ${ (x, y) \mapsto Df(x, y) }$ is ${ C ^{p-1} . }$ The compositions
\[{ \underbrace{(x, y)} _{\in \, U \times V} \mapsto \underbrace{\partial _1 f(x, y)} _{\in \, L(E, G)} \mapsto \underbrace{(\partial _1 f (x, y), 0) } _{\in L(E, G) \times L(F, G)} }\] \[{ \underbrace{(x, y)} _{\in \, U \times V} \mapsto \underbrace{\partial _2 f(x, y)} _{\in \, L(F, G)} \mapsto \underbrace{(0, \partial _2 f (x, y)) } _{\in L(E, G) \times L(F, G)} }\]are ${ C ^{p-1}, }$ hence their sum
\[{ \underbrace{(x, y)} _{\in \, U \times V} \mapsto \underbrace{(\partial _1 f (x, y), \partial _2 f (x, y))} _{\in \, L(E, G) \times L(F, G)} }\]is ${ C ^{p-1} . }$ The map
\[{ \alpha : L(E, G) \times L(F, G) \longrightarrow L(E \times F, G), }\] \[{ \alpha (\varphi _1, \varphi _2) \, (h, k) = \varphi _1 (h) + \varphi _2 (k) }\]is continuous linear, hence the composition
\[{ \underbrace{(x, y)} _{\in \, U \times V} \mapsto \underbrace{(\partial _1 f (x, y), \partial _2 f (x , y))} _{\in \, L(E, G) \times L(F, G) } \mapsto \underbrace{\alpha(\partial _1 f(x, y), \partial _2 f(x, y))} _{\in \, L(E \times F, G)} }\]is ${ C ^{p-1}, }$ as needed. ${ \blacksquare }$
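A scalar sketch of the formula ${ Df(a, b) \, (h, k) = \partial _1 f(a, b) \, h + \partial _2 f(a, b) \, k }$ with ${ E = F = G = \mathbb{R} }$ and the sup norm ${ \max \lbrace \vert h \vert, \vert k \vert \rbrace }$; the function and evaluation point are arbitrary choices.

```python
import math

def f(x, y):
    return x * y + math.sin(x)

def d1f(x, y):
    # partial in the first slot: y + cos(x)
    return y + math.cos(x)

def d2f(x, y):
    # partial in the second slot: x
    return x

a, b = 0.7, -0.3

def expansion_error(t):
    # | f(a+h, b+k) - f(a,b) - d1f(a,b) h - d2f(a,b) k | at (h, k) = t (0.4, -0.9)
    h, k = 0.4 * t, -0.9 * t
    actual = f(a + h, b + k) - f(a, b)
    linear = d1f(a, b) * h + d2f(a, b) * k
    return abs(actual - linear)

e1, e2 = expansion_error(1e-3), expansion_error(1e-4)
print(e1, e2)   # quadratic decay: the error is o(max(|h|, |k|))
```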
Returning to the earlier setup: complete normed spaces ${ E, F, G , }$ a ${ C ^p }$ map
\[{ f : U (\subseteq E \times F \text{ open}) \longrightarrow G , }\]its zero set
\[{ Z _f = \lbrace (x, y) : (x, y) \in U, f(x, y) = 0 \rbrace , }\]and a point ${ (a, b) \in Z _f . }$ The goal is to study the structure of ${ Z _f }$ near ${ (a, b) .}$
Informally, for small ${ \delta > 0 , }$ we have ${ B((a, b), \delta) \subseteq U }$ and
\[{ \require{cancel} \begin{aligned} &\, Z _f \cap B((a, b), \delta) \\ = &\, \lbrace (a+h, b+k) : \lVert h \rVert < \delta, \lVert k \rVert < \delta, f(a+h, b+k) = 0 \rbrace \\ \approx &\, \lbrace (a+h, b+k) : \lVert h \rVert < \delta, \lVert k \rVert < \delta, \cancel{f(a, b)} + \partial _1 f (a, b) \, h + \partial _2 f (a, b) \, k = 0 \rbrace \\ = &\, \lbrace (x, y) : \lVert x - a \rVert < \delta, \lVert y - b \rVert < \delta, \, \partial _1 f (a, b) \, (x - a) + \partial _2 f (a, b) \, (y-b) = 0 \rbrace . \end{aligned} }\]Now if ${ \partial _2 f(a, b) }$ is invertible, the approximating set looks like the graph of
\[{ E \longrightarrow F , }\] \[{ x \mapsto y = b - [\partial _2 f(a, b) ] ^{-1} \partial _1 f (a, b) (x - a) }\]within ${ B(a, \delta) \times B(b, \delta) . }$ This suggests the following.
Thm [Implicit function theorem]:
Consider complete normed spaces ${ E, F, G , }$ and a ${ C ^p }$ map
\[{ f : U (\subseteq E \times F \text{ open}) \longrightarrow G . }\]It has a zero set
\[{ Z _f = \lbrace (x, y) : (x, y) \in U, f(x, y) = 0 \rbrace . }\]Let ${ (a, b) \in Z _f }$ be such that ${ \partial _2 f(a, b) : F \to G }$ is a toplinear isomorphism.
Then there exist open neighbourhoods
\[{ (a, b) \in \mathscr{U} \subseteq U, \quad a \in \mathscr{V} \subseteq E }\]and a ${ C ^p }$ map
\[{ g : \mathscr{V} \longrightarrow F }\]such that
\[{ Z _f \cap \mathscr{U} = (\text{graph of } g) }\]that is
\[{ \lbrace (x, y) \in \mathscr{U} : f(x, y) = 0 \rbrace = \lbrace (x, g(x)) : x \in \mathscr{V} \rbrace . }\]Further,
\[{ Dg(x) = - [\partial _2 f \, (x, g(x))] ^{-1} \partial _1 f \, (x, g(x)) }\]over a neighbourhood of ${ a . }$
\[{ \boxed{\begin{aligned} &\, \textbf{Heuristic:} \text{ If } f : U (\subseteq E \times F) \to G \text{ is a } C ^p \text{ map, } \\ &\, \text{near any point of } Z _f \text{ where } \partial _2 f \text{ is nonsingular, } Z _f \\ &\, \text{looks like the graph of a } C ^p \text{ map } V (\subseteq E) \to F . \end{aligned} } }\]Pf: Since ${ \partial _2 f (a, b) : F \to G }$ is a toplinear isomorphism, consider the map
\[{ \hat{f} = [\partial _2 f (a, b) ] ^{-1} f : U (\subseteq E \times F ) \longrightarrow F . }\]It is a ${ C ^p }$ map with ${ Z _{\hat{f}} = Z _f }$ and ${ \partial _2 \hat{f} (a, b) = \text{id} _F . }$
Rewriting the goal, we want open neighbourhoods
\[{ (a, b) \in \mathscr{U} \subseteq U, \quad a \in \mathscr{V} \subseteq E }\]and a ${ C ^p }$ map
\[{ g : \mathscr{V} \longrightarrow F }\]such that
\[{ Z _{\hat{f}} \cap \mathscr{U} = (\text{graph of } g). }\] \[{ }\]In an attempt to “complete” the map
\[{ \hat{f} : U (\subseteq E \times F) \longrightarrow F }\]to a map
\[{ U (\subseteq E \times F) \longrightarrow E \times F }\]which is locally invertible at ${ (a, b) }$ i.e. has nonsingular derivative at ${ (a, b), }$ we can consider
\[{ \varphi : U (\subseteq E \times F \text{ open}) \longrightarrow E \times F, }\] \[{ \varphi (x, y) = (x, \hat{f}(x, y)) . }\]By a usual composition argument, it is a ${ C ^p }$ map. The derivative
\[{ (D \varphi) (a, b) \in L(E \times F, E \times F) }\]is the derivative of the sum of ${ (x, y) \mapsto (x, 0) }$ and ${ (x, y) \mapsto (0, \hat{f}(x, y)) }$ at the point ${ (a, b), }$ and so is given by
\[{ \begin{aligned} &\, (D \varphi) (a, b) \, (h, k) \\ = &\, (h, D \hat{f} (a, b) \, (h, k) ) \\ = &\, (h, \partial _1 \hat{f} (a, b) \, h + \partial _2 \hat{f} (a, b) \, k ) \\ = &\, (h, \partial _1 \hat{f}(a, b) \, h + k ) \end{aligned} }\]that is
\[{ (D\varphi) (a, b) \, \begin{pmatrix} h \\ k \end{pmatrix} = \begin{pmatrix} \text{id} _E &0 \\ \partial _1 \hat{f}(a, b) &\text{id} _F \end{pmatrix} \begin{pmatrix} h \\ k \end{pmatrix} . }\]The continuous linear map
\[{ (D\varphi) (a, b) = \begin{pmatrix} \text{id} _E &0 \\ \partial _1 \hat{f}(a, b) &\text{id} _F \end{pmatrix} \in L(E \times F, E \times F) }\]has a continuous linear inverse
\[{ \begin{pmatrix} \text{id} _E &0 \\ -\partial _1 \hat{f} (a, b) &\text{id} _F \end{pmatrix} \in L(E \times F, E \times F) , }\]hence ${ D\varphi (a, b) }$ is nonsingular as needed.
By inverse function theorem, ${ \varphi : U (\subseteq E \times F) \longrightarrow E \times F }$ is a local ${ C ^p }$ isomorphism at ${ (a, b) . }$ There exist open neighbourhoods
\[{ (a, b) \in W \subseteq U, \quad (a, 0) \in W ^{’} \subseteq E \times F }\]such that ${ \varphi \big\vert _{W } : W \longrightarrow W ^{’} }$ is a ${ C ^p }$ isomorphism.
Let ${ \psi = \left(\varphi \big\vert _{W } \right) ^{-1} : W ^{’} \longrightarrow W . }$ It is of the form
\[{ \psi (\mathbf{x}, \mathbf{y} ) = (\psi _1 (\mathbf{x}, \mathbf{y}), \psi _2 (\mathbf{x} , \mathbf{y})) \quad \text{ for } (\mathbf{x}, \mathbf{y}) \in W ^{’} }\]where ${ \psi _1 : W ^{’} \to E , }$ ${ \psi _2 : W ^{’} \to F }$ are ${ C ^p }$ maps. It has a more specific structure. We have
\[{ \begin{aligned} &\, (\mathbf{x}, \mathbf{y}) \\ = &\, (\varphi \circ \psi)(\mathbf{x}, \mathbf{y}) \\ = &\, (\psi _1 (\mathbf{x}, \mathbf{y}), \, \, \hat{f}(\psi _1 (\mathbf{x}, \mathbf{y}), \psi _2 (\mathbf{x}, \mathbf{y}))) \end{aligned} }\]that is
\[{ \mathbf{x} = \psi _1 (\mathbf{x}, \mathbf{y}), \quad \mathbf{y} = \hat{f}(\mathbf{x}, \psi _2 (\mathbf{x}, \mathbf{y})) }\]for all ${ (\mathbf{x}, \mathbf{y}) \in W ^{’} . }$
Especially
\[{ (\mathbf{x}, 0) \in W ^{’} \implies 0 = \hat{f}(\underbrace{\mathbf{x}, \psi _2 (\mathbf{x}, 0)} _{\psi(\mathbf{x}, 0) \in W } ), }\]that is points ${ (\mathbf{x}, 0) \in W ^{’} }$ give points ${ \psi (\mathbf{x}, 0) = (\mathbf{x}, \psi _2 (\mathbf{x}, 0)) \in Z _{\hat{f}} \cap W . }$
Since
\[{ \lbrace x \in E : (x, 0) \in W ^{’} \rbrace }\]is an open neighbourhood of ${ a, }$ this suggests that setting
\[{ \mathscr{U} := W, \quad \mathscr{V} := \lbrace x \in E : (x, 0) \in W ^{’} \rbrace }\]and
\[{ g : \mathscr{V} \longrightarrow F, \quad g(x) = \psi _2 (x, 0) }\]works.
Indeed, the graph of ${ g }$ looks like
\[{ \begin{aligned} &\, (\text{graph of } g) \\ = &\, \lbrace (x, \psi _2 (x, 0)) : (x, 0) \in W ^{’} \rbrace \\ \subseteq &\, Z _{\hat{f}} \, \cap W , \end{aligned} }\]and it is left to show that every point of ${ Z _{\hat{f}} \cap W }$ lies in the graph of ${ g . }$
We have
\[{ \begin{aligned} &\, (x, y) \\ = &\, (\psi \circ \varphi)(x, y) \\ = &\, (x, \, \psi _2 (x, \hat{f}(x, y))) \end{aligned} }\]for all ${ (x, y) \in W , }$ that is
\[{ y = \psi _2 (x, \hat{f}(x, y)) \quad \text{ for all } (x, y) \in W . }\]Especially
\[{ y = \psi _2 (x, 0) \quad \text{ for all } (x, y) \in Z _{\hat{f}} \cap W . }\]Finally
\[{ \begin{aligned} &\, (x, y) \in Z _{\hat{f}} \cap W \\ \implies &\, y = \psi _2 (x, 0), \, \, \underbrace{\varphi(x, y)} _{ (x, 0)} \in W ^{’} \end{aligned} }\]that is
\[{ Z _{\hat{f}} \cap W \subseteq (\text{graph of } g) , }\]as needed.
The second part, on the derivative of the implicit function ${ g , }$ follows from the chain rule. Let ${ A(x) = \hat{f} (x, g(x)) }$ be the composition
\[{ \underbrace{x} _{\in \, \mathscr{V}} \overset{\alpha}{\longmapsto} \underbrace{(x, g(x))} _{\in \, U } \overset{\hat{f}}{\longmapsto} \underbrace{\hat{f}(x, g(x))} _{\in \, F} . }\]It is identically zero. Hence
\[{ \begin{aligned} &\, DA(x) \, h \\ = &\, (D\hat{f})(\alpha(x)) \circ (D \alpha)(x) \, h \\ = &\, (D\hat{f}) (\alpha(x)) \, (h, Dg(x) \, h) \\ = &\, \partial _1 \hat{f} (\alpha(x)) \, h + \partial _2 \hat{f} (\alpha(x)) \, Dg(x) \, h \\ = &\, \partial _1 \hat{f} (x, g(x)) \, h + \partial _2 \hat{f} (x, g(x)) \, Dg(x) \, h \\ = &\, 0 , \end{aligned} }\]that is
\[{ \begin{aligned} &\, \partial _1 \hat{f} (x, g(x)) \, \text{id} _E + \partial _2 \hat{f} (x, g(x)) \, Dg(x) = 0 \\ &\, \text{for all } x \in \mathscr{V} . \end{aligned} }\]The ${ C ^{p-1} }$ map
\[{ \mathscr{V} \longrightarrow L(F, F) , }\] \[{ x \longmapsto \partial _2 \hat{f} (x, g(x)) }\]sends ${ a }$ to ${ \text{id} _F, }$ an element of the open set ${ L _{\text{is}} (F, F) . }$ So there is a neighbourhood of ${ a }$ (contained in ${ \mathscr{V} }$) over which ${ \partial _2 \hat{f} (x, g(x)) \in L _{\text{is}} (F, F) . }$
For ${ x }$ in this neighbourhood,
\[{ \begin{aligned} Dg(x) = &\, - [\partial _2\hat{f} (x, g(x))] ^{-1} \partial _1 \hat{f} (x, g(x)) \\ = &\, - [\partial _2 f \, (x, g(x))] ^{-1} \partial _1 f \, (x, g(x)) \end{aligned} }\]as needed. ${ \blacksquare }$
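A concrete check of the implicit-derivative formula, with ${ f(x, y) = x ^2 + y ^2 - 1 }$: its zero set is the unit circle, which near ${ (a, b) = (0.6, 0.8) }$ (where ${ \partial _2 f (a,b) = 2b \neq 0 }$) is the graph of ${ g(x) = \sqrt{1 - x ^2} . }$ The formula gives ${ Dg(x) = - (2 g(x)) ^{-1} (2x) = -x / g(x) ; }$ the evaluation point below is arbitrary.

```python
import math

def f(x, y):
    # zero set Z_f is the unit circle
    return x**2 + y**2 - 1.0

def g(x):
    # near (a, b) = (0.6, 0.8), Z_f is the graph of g
    return math.sqrt(1.0 - x**2)

x = 0.55
# implicit-function formula: Dg(x) = -[d2 f(x, g(x))]^{-1} d1 f(x, g(x))
d1f = 2.0 * x
d2f = 2.0 * g(x)
dg_formula = -d1f / d2f

# compare with a centred finite difference of g
h = 1e-6
dg_numeric = (g(x + h) - g(x - h)) / (2.0 * h)
print(dg_formula, dg_numeric)
```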