\documentclass{book}
\usepackage{amsthm,amssymb,amsmath,graphicx,geometry,mathtools,comment,hyperref}
\usepackage[margin=2cm,font=scriptsize]{caption}
\usepackage{geometry}
\geometry{
a4paper,
total={170mm,257mm},
left=20mm,
top=20mm,
}
\title{Multivariable Analysis}
\author{Roland van der Veen}
\date{2019}
\newcommand{\A}{\mathcal{A}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\HH}{\mathbb{H}}
\newcommand{\D}{\mathbb{D}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\T}{\mathbb{T}}
\newcommand{\I}{\mathcal{I}}
\newcommand{\p}{\partial}
\newcommand{\im}{\mathrm{im}}
%\newcommand{\ker}{\mathrm{ker}}
\newcommand{\ddiv}{\mathrm{div}}
\newcommand{\grad}{\mathrm{grad}}
\newcommand{\curl}{\mathrm{curl}}
\newcommand{\la}{\langle}
\newcommand{\ra}{\rangle}
\newtheorem{definition}{Definition}
\numberwithin{definition}{subsection}
\newtheorem{theorem}{Theorem}
\numberwithin{theorem}{subsection}
\newtheorem{corollary}{Corollary}
\numberwithin{corollary}{subsection}
\newtheorem{lemma}{Lemma}
\numberwithin{lemma}{subsection}
\begin{document}
\begin{center}
\vspace{3cm}
{\Huge Multivariable analysis}\\
\vspace{2cm}
\includegraphics[width=8cm]{Pictures/Tennis.png}\\
\vspace{2cm}
{\Large Roland van der Veen}\\
\vspace{2cm}
{\large Groningen, 16-12-2019}\\
\vspace{4cm}
\end{center}
\tableofcontents
\chapter{Introduction}
The goal of these notes is to explore the notions of differentiation and integration in arbitrarily many variables.
The material is focused on answering two basic questions:
\begin{enumerate}
\item How to solve an equation? How many solutions can one expect?
\item Is there a higher dimensional analogue of the fundamental theorem of calculus? Can one find a primitive?
\end{enumerate}
The equations we will address are systems of non-linear equations in finitely many variables and also ordinary differential equations.
The approach will be mostly theoretical, sketching a framework in which one can predict how many solutions there will be without necessarily solving the equation.
The key assumption is that everything we do can locally be approximated by linear functions. In other words, everything will be differentiable.
One of the main results is that the linearization of the equation predicts the number of solutions and approximates them well locally. This is known as the implicit function theorem. For ordinary differential equations we will prove a similar result on the existence and uniqueness of solutions.
To introduce the second question, recall what the fundamental theorem of calculus says.
\[\int_{a}^b f'(x)dx = f(b)-f(a)\]
What if $f$ is now a function depending on two or more variables? In two and three dimensions, vector calculus gives some partial answers involving div, grad, curl and the theorems of Gauss, Green and Stokes. How can one make sense of these and are there any more such theorems perhaps in higher dimensions?
The key to understanding this question is to pass from functions to differential forms. In the example above this means passing from $f(x)$ to the differential form $f(x)dx$. Taking the $dx$ part of our integrands seriously clarifies all formulas and shows the way to a general fundamental theorem of calculus that works in any dimension, known as the (generalized) Stokes theorem:
\[
\int_\Omega d\omega = \int_{\p \Omega} \omega
\]
All the results mentioned in this paragraph are special cases of this powerful theorem.
{\bf This is not calculus. }
We made an attempt to prove everything we say so that no black boxes have to be accepted on faith.
This self-sufficiency is one of the great strengths of mathematics.
The reader is asked to at least try the exercises.
Doing exercises (and necessarily failing some!) is an integral part of mathematics.
\section{Basic notions and notation}
Most of the material is standard and can be found in references such as \emph{Calculus on manifolds} by M. Spivak.
We will mostly work in $\R^n$ whose elements are vectors $\R^n\ni x = (x^1,\dots, x^n) = \sum_{i=1}^n x^i e_i$ where $e_i$ is the $i$-th standard basis vector
whose coordinates are all $0$ except a $1$ at the $i$-th place.
Throughout the text I try to write functions as $A\ni a \xmapsto{f} a^2+a+1 \in B$, instead of $f:A\to B$ defined by $f(a) = a^2+a+1$.
An important function is the (Euclidean) norm $\R^n \ni x \xmapsto{|\cdot|} |x| \in \R$ where $|(x_1,x_2,\dots,x_n)| = \sqrt{x_1^2+x_2^2+\dots+x_n^2}$.
As the name suggests it satisfies the triangle inequality $|x+y|\leq |x|+|y|$.
Another piece of notation is that of open and closed subsets of $\R^n$. A set $S\subset \R^n$ is called \emph{open} if it is the union of (possibly infinitely many) open balls $B_r(p) = \{x\in \R^n:|x-p|<r\}$. A set is called \emph{closed} if its complement is open. A function $f$ defined on $U\subset \R^n$ is called \emph{uniformly continuous} if for every $\epsilon>0$ there is a $\delta>0$ such that for all $u,v\in U$: $|u-v|<\delta \Rightarrow |f(u)-f(v)|<\epsilon$.
A continuous function on a closed bounded $U\subset \R^n$ is uniformly continuous. Moreover, continuous functions $\R^m\to\R^n$ send closed bounded subsets to closed bounded subsets.
For more on open sets and continuity we refer to the course on metric and topological spaces next block. The above results and definitions should all be familiar at least in the case $n=1$. When formulated in terms of the norm all proofs are identical in the case of $\R^n$.
\subsection*{Exercises}
\noindent{\bf Exercise 0.}\\
True or false:
\begin{enumerate}
\item $B_1(1)\cup B_\pi(5)$ is open.
\item $[0,1]^{10}$ is closed.
\item $[0,1)$ is open.
\item $\{(x_1,\dots x_n) \in \R^n| x_1>0 \}$ is open.
\item $\{x\in \R^7: x_1^2+\dots+ x_7^2=1\}$ is open.
\end{enumerate}
\chapter{How to solve equations}
\label{ch.eqns}
Under what conditions can a system of $n$ real equations in $k+n$ variables be solved?
Naively one may hope that each equation can be used to determine a variable so that in the end $k$ variables are left undetermined
and all others are functions of those. For example consider the two systems of two equations on the left and on the right ($k=1,n=2$):
\begin{align}
x+y+z&=0 \qquad &\sin(x+y)-\log(1-z) &= 0 \\
-x+y+z&=0 \qquad &e^{y}-\frac{1}{1+x-z}&= 0
\end{align}
\begin{figure}[htp!]
\begin{center}
\includegraphics[width=6cm]{Pictures/Implicit1Recht.png}
\includegraphics[width=6cm]{Pictures/Implicit1Krom.png}
\caption{Solutions to the two systems. The yellow surface is the solution to the first equation, blue the second.
The positive $x,y,z$ axes are drawn in red, green, blue respectively.}
\end{center}
\end{figure}
The system on the left is linear and easy to solve: we get $x=0$ and $y=-z$.
The system on the right is hard to solve explicitly but looks very similar near $(0,0,0)$ since $\sin(x+y)\approx x+y$ and $\log(1-z)\approx -z$ near zero.
We will be able to show that just like in the linear situation, a curve of solutions passes through the origin.
The key point is that the derivative of the complicated looking functions at the origin is precisely the
linear function shown on the left.
We will look at equations involving only differentiable functions. This means that locally they can
be approximated well by linear functions. The goal of the chapter is to prove the implicit function theorem.
Basically it says that the linear approximation decides whether or not a system of equations is solvable locally and if so how many solutions it has.
This is illustrated in the figures above.
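To make the role of the linearization concrete, here is a small numerical sketch (NumPy and the function names below are our assumptions, not part of the text): fix $x$ and solve the right-hand system for $(y,z)$ by Newton iteration. Each Newton step solves a linearized system, which is exactly the philosophy of this chapter.

```python
import numpy as np

# Sketch: fix x and solve the nonlinear system on the right for (y, z)
# with Newton's method.  Every step solves the *linearized* system.
def newton(F, J, v0, steps=20):
    v = np.array(v0, dtype=float)
    for _ in range(steps):
        v = v - np.linalg.solve(J(v), F(v))   # linearize, solve, repeat
    return v

x = 0.1  # fixed; the linearization predicts a solution near (y, z) = (-x, 0)

def F(v):
    y, z = v
    return np.array([np.sin(x + y) - np.log(1 - z),
                     np.exp(y) - 1 / (1 + x - z)])

def J(v):  # Jacobian of F with respect to (y, z)
    y, z = v
    return np.array([[np.cos(x + y), 1 / (1 - z)],
                     [np.exp(y), -1 / (1 + x - z)**2]])

y, z = newton(F, J, (0.0, 0.0))
```

Starting from $(0,0)$ the iteration converges rapidly, and the solution found is close to the prediction of the linear approximation.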
\begin{comment}Even the solution set to single equation in three unknowns can take many forms. See for example figure \ref{fig.randomLevSets} where we generated random polynomial equations and plotted the solution set.
\begin{figure}
\includegraphics[width=\textwidth]{Pictures/Collage.png}
\caption{Some random level sets.}
\label{fig.randomLevSets}
\end{figure}
\end{comment}
\subsection*{Exercises}
\noindent{\bf Exercise 0.}\\
The non-linear equation $\sin(x-y) = 0$ has a solution $(x,y) = (0,0)$. Find a linear equation that approximates the non-linear equation well near $(0,0)$.\\
\noindent{\bf Exercise 1.} (Linear case)\\
Is it always true that a system of two linear equations in three unknowns has a line of solutions? Prove or provide a counterexample.\\
\noindent{\bf Exercise 2.} (Non-linear equation)\\
Solve for $z$ and $y$ as functions of $x$ subject to the equations:
\begin{align*}
x^2+y^2+z^2&=1 \\
x+y^2 + z^3&=0
\end{align*}
\noindent{\bf Exercise 3.} (Linear equation)\\
Write the system below as a matrix equation $Ax=b$ for matrix $A$ and vectors $x,b$.
\begin{align*}
x+3y+2z&=1 \\
x+y+ z&=0 \\
-x+y+4z&=0
\end{align*}
\noindent{\bf Exercise 4.} (Three holes)\\
Give a single equation in three unknowns such that the solution set is a bounded subset of $\R^3$, looks smooth and two-dimensional everywhere and
has a hole. Harder: Can you increase the number of holes to three?
\section{Linear algebra}
The basis for our investigation of equations is the linear case. Linear equations can neatly be summarized in terms of a single matrix equation
$Av = b$. Here $v$ is a vector in $\R^{k+n}$, and $b\in \R^n$ and $A$ is an $n\times (k+n)$ matrix.
In case $b=0$ we call the equation homogeneous and the solution set is some linear subspace $\ker A = \{v\in \R^{k+n}|Av = 0\}$, the kernel of the map defined by $A$.
In general, given a single solution $p\in \R^{k+n}$ such that $Ap = b$ the entire solution set $\{v\in \R^{k+n}|Av=b\}$ is the affine linear subspace
$(\ker A)+p = \{s+p|s\in \ker A\}$.
In discussing the qualitative properties of linear equations it is more convenient to think in terms of linear maps. Most of this material should be familiar from
linear algebra courses but we give a few pointers here to establish notation and emphasize the important points. With some irony, the first and second rules of linear algebra are:
\begin{enumerate}
\item YOU DO NOT PICK A BASIS
\item IF YOU PICK A BASIS BE READY TO CHANGE IT
\end{enumerate}
In this section $W,V$ will always be real vector spaces of finite dimensions $m$ and $n$. A \emph{basis} for $V$ is an ordered set of linearly independent vectors $b_1,\dots,b_n$ that span the whole space. The dimension $\dim V$ is the number of basis elements one needs and does not depend on the basis chosen. The standard basis of $\R^n$ is denoted by $e_1,\dots, e_n$. Once we choose a basis $b_1,\dots, b_n$ for $V$ we get a linear bijection between $V$ and $\R^n$ sending $b_i$ to $e_i$. Vectors $v\in V$ are written $v = \sum_i v^i b_i$ for some coefficients $v^i\in \R$. Notice we use upper indices (not powers!)
for the coefficients $v^i$ of a vector $v\in V$. While powerful and concrete, choosing a basis is dangerous because it tends to destroy a lot of symmetry and structure. One basis may be natural for some purpose while another may be more appropriate for another task.
The set of all linear maps from $V$ to $W$ is denoted $L(V,W)$.
If we set $V = \R^n$ and $W = \R^m$ then $\varphi\in L(V,W)$ could be described by a matrix $(\varphi^i_j)$ defined by $\varphi e_j = \sum_i \varphi^i_j e_i$. In our notation, upper indices indicate rows and lower indices columns, and the columns of the matrix are the images of the basis vectors.
As we said, the matrix of $\varphi$ might look easier with respect to another basis so we prefer to keep the $V,W$ abstract and express the linear map $\varphi$
with respect to bases $b_1,\dots b_m$ of $W$ and $c_1,\dots c_n$ of $V$ as $(\varphi^i_j)$ given by $\varphi c_j = \sum_i \varphi^i_j b_i$.
The dual space $V^* = L(V,\R)$ becomes a vector space in its own right when we define addition and scalar multiplication pointwise:
$(af+g)(v) = af(v)+g(v)$ for any $f,g\in V^*$ and $v\in V$ and $a\in \R$. A basis $b_1,\dots b_n$ of $V$ yields a dual basis $b^1,\dots b^n$ of $V^*$ called the \emph{dual basis} by requiring $b^i(b_j) = \delta^i_j$. Here $\delta^i_j = \begin{cases} 1 \text{\ if $i=j$}\\ 0 \text{\ if $i\neq j$}\end{cases}$ is the Kronecker delta. Elements $f\in V^*$ are thought of as row vectors and expressed in the dual basis as $f = \sum_i f_ib^i$.
A useful feature of the dual space is the \emph{pull-back} (also known as transpose). Given $f\in L(V,W)$ the pull-back is $f^*\in L(W^*,V^*)$ defined
by $f^*\varphi = \varphi\circ f$.
Finally the (Euclidean) \emph{norm} of $\varphi \in L(\R^n,\R^m)$ is defined as $|\varphi| = \max_{v\in S^n} |\varphi(v)|$, where $S^n = \{v\in \R^n: |v|=1\}$. It satisfies
$|\varphi(v)|\leq |\varphi||v|$.
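The operator norm can be approximated by sampling unit vectors, which also makes the inequality $|\varphi(v)|\leq |\varphi||v|$ easy to test. A sketch (NumPy is an assumption of this illustration; for a matrix the operator norm equals the largest singular value):

```python
import numpy as np

def operator_norm_sampled(A, samples=20000, seed=0):
    """Estimate |A| = max over unit vectors v of |Av| by random sampling."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=(samples, A.shape[1]))
    v /= np.linalg.norm(v, axis=1, keepdims=True)   # unit vectors in R^n
    return np.linalg.norm(A @ v.T, axis=0).max()

A = np.array([[3.0, 0.0],
              [0.0, 1.0]])
est = operator_norm_sampled(A)   # sampled maximum of |Av| over the unit circle
exact = np.linalg.norm(A, 2)     # operator norm = largest singular value, here 3
```

The sampled value approaches the exact norm from below, since every sampled $|Av|$ is at most $|A|$.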
\begin{comment}
To better understand determinants we use Gaussian elimination to compute them. This can be implemented by the elementary operations
$R^c_{ij}:\R^n\to \R^n$ defined by $R^c_{ij} = I+ce_ie^j$ for $i\neq j$. We have $\det R^c_{ij} = 1$ and for square matrix $A$ the matrix
$AR^c_{ij}$ is the result of adding $c$ times column $i$ to column $j$. Likewise $R^c_{ij}A$ is the result of adding $c$ times row $i$ to row $j$.
Using these operations it is possible to interchange two rows or columns (exercise!).
\begin{lemma}(Gaussian elimination)\\
\label{lem.DoubleGElim}
Any $n\times n$ matrix $A$ can be written as a product $A = ED\tilde{E}$ where $D$ is diagonal and $E,\tilde{E}$ are products of $R^c_{ij}$.
In particular $\det A = \det D$.
\end{lemma}
\begin{proof}
Induction on the size $n$. For $n=1$ this is clear. For the induction step we consider an $n\times n$ matrix $A$. Unless all entries in the final row and column
are zero we may use elementary operations to make $A^n_n \neq 0$. In either case we can use more elementary operations to make all off-diagonal entries in the final row and column equal to $0$. By induction we can do the same for the $(n-1)\times (n-1)$ block.
\end{proof}
Just like one can compute with integers modulo $n$ one can also compute with vectors modulo some subspace. Given subspace $U\subset V$ this means
that we compute with equivalence classes $v = v' \mod U$ if $v-v'\in U$. The result is again a vector space called the quotient vector space $V/U$.
\end{comment}
\subsection*{Exercises}
{\bf Exercise 0.}\\
Why is our definition of basis as a linear isomorphism $\R^n \xrightarrow{b} V$ consistent with the usual notion of a basis as an (ordered) set of vectors $b_1,b_2\dots b_n \in V$ that is both linearly independent and spans $V$? What is the linear map in $L(\R^n,\R^n)$ that describes the standard basis $e_1,e_2 \dots e_n$ of $\R^n$?\\
\noindent{\bf Exercise 1.}\\
Set $V = \R^2$ and $W=\R^3$ and define a linear map $\varphi\in L(V,W)$ by $\varphi e_1 = \varphi e_2 = e_3$. What is the matrix of $\varphi$ with respect to the standard bases of $V$ and $W$? What is the matrix of $\varphi$ with respect to the bases $b_1 = {1 \choose 0}$ and $b_2 = {1 \choose 1}$ of $V$ and $c_1 = e_1+e_3$ and $c_2 = e_2+e_3$ and $c_3 = e_3$ of $W$?
\\
\noindent{\bf Exercise 2.}\\
The vector space $\Lambda^2 \R^3$ is the vector space spanned by things of the form $v\wedge w$ where $\wedge$ satisfies the following rules:
$v\wedge w = -w\wedge v$ and $(\alpha u+\beta v)\wedge w = \alpha u\wedge w+\beta v\wedge w$. A basis for $\Lambda^2\R^3$ is given by the vectors $f_1 = e_1\wedge e_2$, $f_2= e_2\wedge e_3$, $f_3 = e_3\wedge e_1$. Show that if $v\times w = \sum_i c^i e_i$ then $v\wedge w = \sum_i c^i f_i$.\\
\noindent{\bf Exercise 3.}\\
The set $P_n$ of polynomials of degree $\leq n$ in one variable $x$ with real coefficients is a vector space with respect to the usual addition and multiplication by scalars. Give a basis of $P_3$ and express the linear map $P_3\xrightarrow{\frac{d}{dx}}P_3$ in your basis.\\
\noindent{\bf Exercise 4.}\\
Find a linear map from $V$ to $V^{**}$ without choosing a basis.\\
\noindent{\bf Exercise 5.}\\
Show that $(g\circ f)^* = f^*\circ g^*$.\\
\noindent{\bf Exercise 6.}\\
How many solutions does a system of two equations in 4 unknowns generally have? What is the dimension of the solution space?\\
\noindent{\bf Exercise 7.}\\
Prove that any linear map $\varphi\in L(\R^n,\R^m)$ is continuous. Also show that the image $\varphi(S^n)$ is closed and bounded. (Hint: use a property of continuous functions mentioned in the introduction).\\
\noindent{\bf Exercise 8.}\\
Let $b_1,b_2,\dots, b_n$ be a basis for $V$. Check that the dual basis $b^1, \dots, b^n$ is a basis for $V^*$. Show that $\dim V = \dim V^*$\\
\noindent{\bf Exercise 9.}\\
The set $L(V,W)$ is a vector space when we define addition and scalar multiplication pointwise: $(af+g)(v) = af(v)+g(v)$. Find a basis for $L(\R^2,\R^3)$ and compute $\dim L(\R^2,\R^3)$. Same questions for $L(V,W)$ where $V,W$ are arbitrary vector spaces.\\
\noindent{\bf Exercise 10.}\\
In this exercise we identify the complex numbers $\C$ with $\R^2$ via $x+iy\leftrightarrow (x,y)$. Show that the map $\C \ni z \mapsto (a+bi)z$ corresponds to an element of $L(\R^2,\R^2)$ whose matrix with respect to the standard bases is $\left(\begin{array}{cc}a & -b\\ b& a \end{array}\right)$.\\
\noindent{\bf Exercise 11.}\\
Consider the linear map $L\in L(\R^3,\R^2)$ defined by $Lu = e_1$ and $Lv=e_2$ and $Lw = 0$.\\
Suppose $u = e_1+2e_2+3e_3$ and $v = e_1-e_2-4e_3$ and $w = e_1-e_2-e_3$. Do $u,v,w$ form a basis of $\R^3$?\\
Write down the matrix of $L$ with respect to $u,v,w$ and the standard basis in $\R^2$. Also write down the matrix of $L$ with
respect to the standard bases on both sides.\\
\section{Derivative}
Now that we understand linear functions, we would like to use this to study more general functions
$\R^m \supset P \xrightarrow{f} \R^n$, where unless stated otherwise $P$ is always a non-empty open subset of $\R^m$.
The key idea is to locally approximate non-linear objects by linear ones. In this case at every point $p\in P$ we are looking
for the linear map $f'(p) \in L(\R^m,\R^n)$ best approximating $f$ close to $p$. This is just the first order Taylor approximation to $f$ at $p$.
Since we are approximating, some specialized notation is useful. For functions $\R^m \xrightarrow{f,g} \R^n$ we define
$f = o(g)$ to mean $\lim_{h\to 0}\frac{|f(h)|}{|g(h)|} =0$, intuitively $f$ goes to zero faster than $g$ as $h$ goes to $0$.
For example $h^2 = o(h)$. We often use the triangle inequality to show that $f = o(h)$ and $g=o(h)$ implies $f+g = o(h)$ (Exercise!).
Although our notation may be a little unfamiliar, the picture is just like in one variable, see figure \ref{fig.Derivative}.
\begin{figure}[htp!]
\begin{center}
\includegraphics[width=6cm]{Pictures/Derivative.png}
\end{center}
\caption{The derivative $D$ at $p$ is the linear map that best approximates $f$ at the point $p$.}
\label{fig.Derivative}
\end{figure}
\begin{definition}{\bf(Differentiability)}\\
A map $\R^m \supset P \xrightarrow{f} \R^n$ is called differentiable at $p\in P$ if there exists a linear map
$D \in L(\R^m,\R^n)$ such that $f(p+h) - f(p)-Dh = o(h)$.
When $f$ is differentiable for all $p\in P$ we say $f$ is differentiable.
\end{definition}
To see the relation with the Taylor expansion we set $\epsilon_{f,D,p}(h) = f(p+h) - f(p)-Dh$ and write
\begin{equation}
\label{fprime}
f(p+h) = f(p)+Dh + \epsilon_{f,D,p}(h)
\end{equation}
The function $\epsilon$ represents the error in the first order Taylor approximation and differentiability means that the error is $o(h)$ as $h$ goes to $0$, so $\lim_{h\to 0}\frac{|\epsilon_{f,D,p}(h)|}{|h|}= 0$.
We start with a $1$-dimensional example: $\R \ni x \xmapsto{f} x^3 \in \R$ and $p = 1$. We know we should have $f'(p) = 3p^2 = 3$ but notice our notion of derivative should be a linear map, not a number. Take $D\in L(\R,\R)$ to be multiplication by $3$, so $De_1 = 3 e_1$. This works because
$f(p+h) - f(p)-Dh = (1+h)^3-1-3h = 3h^2+h^3 = o(h)$.
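The defining ratio $|\epsilon(h)|/|h|$ can be watched go to zero numerically. A minimal sketch for this example:

```python
# Numerical sketch: for f(x) = x^3 at p = 1 with D = multiplication by 3,
# the error eps(h) = f(1+h) - f(1) - 3h equals 3h^2 + h^3, so the ratio
# |eps(h)|/|h| = 3h + h^2 shrinks with h, confirming eps(h) = o(h).
f = lambda x: x**3

ratios = []
for h in [0.1, 0.01, 0.001]:
    eps = f(1 + h) - f(1) - 3 * h      # error of the linear approximation
    ratios.append(abs(eps) / h)        # should decrease roughly like 3h
```

Each refinement of $h$ by a factor $10$ shrinks the ratio by roughly the same factor, as the formula $3h+h^2$ predicts.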
For a higher dimensional example take $\R^2 \ni (x,y) \xmapsto{f} x^2-y^2 \in \R$ and $p=(0,1)$. In this case we may take $D\in L(\R^2,\R)$ to be given by the matrix
$(0, -2)$ with respect to the standard bases. To see that this works we set $h=(k,\ell)$ and show that
the error $\epsilon_{f,D,p}(h) = f(p+h)-f(p)-f'(p)(h)$ goes to zero faster than $h$ does:
\[
\epsilon_{f,D,p}(h)= f(k,\ell+1)-f(0,1)+2\ell = k^2-(\ell+1)^2+1+2\ell = k^2-\ell^2
\]
So as promised $\frac{|\epsilon_{f,D,p}(h)|}{|h|} = \frac{|k^2-\ell^2|}{|\sqrt{k^2+\ell^2}|} \leq
\frac{|k^2+\ell^2|}{\sqrt{k^2+\ell^2}} = |h|$. Taking the limit $h \to 0$ shows $D$ satisfies equation \eqref{fprime}.
Provided it exists, the linear approximation $D$ above is actually unique. It therefore deserves a special name, the derivative of
$f$ at $p$, notation: $f'(p)$.
\begin{definition}{\bf(Derivative)}\\
If $f$ is differentiable at $p$ then the derivative of $f$ at $p$ called $f'(p)\in L(\R^m,\R^n)$ is the {\bf unique} linear map satisfying \eqref{fprime}.
\end{definition}
\begin{proof} (Of uniqueness). Suppose another $A\in L(\R^m,\R^n)$ also satisfies \eqref{fprime}. Subtracting the two equations gives
$(D-A)h = \epsilon_{f,A,p}(h)-\epsilon_{f,D,p}(h) = o(h)$. Therefore for any non-zero vector $w\in \R^m$ we have
$\frac{1}{|w|}|(D-A)w|= \lim_{n\to \infty}\frac{|(D-A)\frac{w}{n}|}{\frac{|w|}{n}} = \lim_{h\to 0}\frac{|(D-A)h|}{|h|} = 0$ so that $Dw=Aw$. Since $w$ was arbitrary $A=D$.
\end{proof}
For functions $\R\xrightarrow{f} \R$ our definition of derivative $f'(p)$ is just a complicated reformulation of the usual definition.
Actually the matrix of the derivative with respect to the standard bases is just the matrix of partial derivatives. In the above example the linear map $D$ is just
$(\frac{\p f}{\p x}(p),\frac{\p f}{\p y}(p)) =(0,-2)$. This and much more will follow from the next theorem.
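The claim that the matrix of the derivative consists of the partial derivatives can be checked numerically with finite differences; a sketch for the example above (the helper name is ours):

```python
# Finite-difference sketch: the matrix of f'(p) in the standard bases is the
# row of partial derivatives.  For f(x,y) = x^2 - y^2 at p = (0,1) this row
# should come out close to (0, -2).
def partials(f, p, h=1e-6):
    out = []
    for i in range(len(p)):
        q = list(p)
        q[i] += h
        out.append((f(*q) - f(*p)) / h)   # forward difference in direction e_i
    return out

f = lambda x, y: x**2 - y**2
D = partials(f, (0.0, 1.0))   # approximates the matrix (0, -2)
```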
\begin{theorem} {\bf(Properties of derivative)}
\label{thm.propertiesofD}
Consider a function $\R^k \supset Q\xrightarrow{f} P\subset \R^\ell$.
\begin{enumerate}
\item (Chain-rule). If $f$ is differentiable at $q\in Q$ and $\R^\ell \supset P\xrightarrow{g} \R^m$ is differentiable at $f(q)\in P$
we have $(g\circ f)'(q) = g'(f(q))f'(q)$.
\item If $f$ is constant then $f$ is differentiable and $f'(q) = 0$ for all $q\in Q$.
\item If $f\in L(\R^k,\R^\ell)$ then $f'(q) = f$ for all $q\in \R^k$.
\item The function $f = (f^1,f^2,\dots,f^\ell) = \sum_i f^i e_i$ is differentiable at $q$ if and only if the component functions
$P\xrightarrow{f^i} \R$ are. If so, $f'(q)(v) = ((f^1)'(q)(v),\dots, (f^\ell)'(q)(v)) = \sum_i (f^i)'(q)(v) e_i$.
\item The product map $\R^2 \ni (x,y) \xmapsto{\times} xy \in\R$ is a differentiable function with $\times'(x,y)(k,\ell) = yk+x\ell$.
\end{enumerate}
\end{theorem}
\begin{proof}
Part 1 (Chain rule). Set $p = f(q)$. For the chain rule it suffices to show that the linear map $g'(p)f'(q) \in L(\R^k,\R^m)$ satisfies equation \eqref{fprime}.
We know that $f(q+h) = p+f'(q)h+\epsilon_{f,q}(h)$ and $g(p+k) = g(p)+g'(p)k+\epsilon_{g,p}(k)$. Combining those we
can approximate $(g\circ f)(q+h) = $
\[g(p+f'(q)h+\epsilon_{f,q}(h)) = g(p)+g'(p)k+ \epsilon_{g,p}(k) = g(p)+ g'(p)f'(q)h + \epsilon_{(g\circ f),q}(h)\]
where we set $k = f'(q)h+\epsilon_{f,q}(h)$ and $\epsilon_{(g\circ f),q}(h) = g'(p)\epsilon_{f,q}(h)+\epsilon_{g,p}(k) = A+B$.
Now we need to show that $\epsilon_{(g\circ f),q}(h) = o(h)$ as $h\to 0$. In fact $A = o(h)$ and $B = o(h)$. For $A$ this follows from the differentiability of $f$ and the continuity of the linear map $g'(p)$. For $B$ we use differentiability of $g$ to see that for any $\alpha>0$ we have $|\epsilon_{g,p}(k)|<\alpha |k|$ whenever $|k|$ is suitably small. So $\frac{1}{|h|}|\epsilon_{g,p}(k(h))|<\alpha \frac{1}{|h|}|f'(q)h+\epsilon_{f,q}(h)|\leq \alpha(|f'(q)|+1)$ for all sufficiently small $h$. Since $\alpha>0$ was arbitrary, $B = o(h)$.

For part 1, fix $\epsilon>0$. Notice that for $m<n$ every sample point at stage $m$ is within distance $\sum_{i=1}^k\frac{|b^i-a^i|}{2^m}$ of the stage-$n$ sample points in the same cell. Since $f$ is uniformly continuous on $R$, there is a $\delta>0$ such that for all $p,q\in R$ with $|p-q|<\delta$ we have $|f(p)-f(q)|<\frac{\epsilon}{vol(R)}$.
Now for $n>m$ with $m$ large enough that $\sum_{i=1}^k\frac{|b^i-a^i|}{2^m}<\delta$ we find our Cauchy estimate:
\[
|I_{R,n}(f)-I_{R,m}(f)|\leq \frac{vol(R)}{2^{nk}}\sum_{j\in \{0,1,\dots, 2^n-1\}^k}\frac{\epsilon}{vol(R)} <\epsilon
\]
Part 2 follows from $I_{R,n}(f+\alpha g) =I_{R,n}(f)+\alpha I_{R,n}(g)$.\\
For part 3 we note that any continuous function attains its max and min on the closed and bounded set $R$. For any $n$ we have $I_{R,n}(f)\leq (\max_R f)I_{R,n}(1)$ and similarly for the minimum.
Part 4 follows from part 2: Set $R_n = p+[0,\frac{1}{n}]^k$. Then $\frac{1}{vol(R_n)}\int_{R_n} f\in [\min_{R_n}f,\max_{R_n}f]$.
Since $f$ is uniformly continuous on $R$, for any $\epsilon$ there is an $n$ such that $|f(q)-f(p)|< \epsilon$ for all $q\in R_n$, finishing the proof.
\end{proof}
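In code, the dyadic sums $I_{R,n}$ can be sketched in a few lines. The left-endpoint sampling below is our reading of the sums displayed above (as in the Cauchy estimate); it is an illustration, not the text's official definition:

```python
import itertools

# Sketch of the dyadic Riemann sum I_{R,n}(f) over a rectangle
# R = [a1,b1] x ... x [ak,bk]: sample f at the lower-left corner of each of
# the 2^(n*k) dyadic subrectangles and average, weighted by vol(R).
def I(f, R, n):
    k = len(R)
    vol = 1.0
    for a, b in R:
        vol *= (b - a)
    total = 0.0
    for j in itertools.product(range(2**n), repeat=k):
        p = [a + j_i * (b - a) / 2**n for (a, b), j_i in zip(R, j)]
        total += f(*p)
    return vol / 2**(n * k) * total

# For f(x) = x on [0,1] the sums increase towards the integral 1/2.
approx = I(lambda x: x, [(0.0, 1.0)], 10)
```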
The Fubini theorem about computing an integral by first integrating out a couple of variables is a simple matter in this framework.
\begin{lemma}{\bf(Fubini)}\\
For a continuous function $f$ defined on a rectangle $R\times S \subset \R^k\times \R^\ell$ we have
\[\int_{R\times S}f = \int_{R} F \quad\text{where}\quad F(p) = \int_{S} f(p,\cdot)\]
\end{lemma}
\begin{proof}
Assuming $R = \prod_{i=1}^k[a_i,b_i]$ and $S = \prod_{i=1}^\ell[c_i,d_i]$, note that
$F(p) = \lim_{m\to \infty} I_{S,m}(f(p,\cdot))$ defines a continuous function $F$ on $R$ (Exercise!).
\[I_{R,n}(F) = \lim_{m\to\infty}\frac{vol(R)}{2^{nk}}\sum_{j\in \{0,1,\dots, 2^n-1\}^k} I_{S,m}(f(a+\sum_{i=1}^k j^i\frac{b^i-a^i}{2^n}e_i,\cdot)) = \]
\[
\lim_{m\to\infty}\frac{vol(R\times S)}{2^{n(k+\ell)}}\sum_{j\in \{0,1,\dots, 2^n-1\}^k }\sum_{h\in \{0,1,\dots, 2^m-1\}^\ell }
f(a+\sum_{i=1}^k j^i\frac{b^i-a^i}{2^n}e_i,c+\sum_{i=1}^\ell h^i\frac{d^i-c^i}{2^m}e_i)\]
If we denote the last formula as $\lim_{m\to\infty}v_{m,n}$ then notice that $v_{n,n} = I_{R\times S,n}(f)$. It follows that
\[\int_R F = \lim_{n,m\to\infty}v_{m,n} = \lim_{n\to \infty}v_{n,n} = \int_{R\times S}f\]
\end{proof}
In one dimension the fundamental theorem of calculus is the following. One of the main aims of this course is to find a multivariable analogue.
In computations the following notation $f(b)-f(a) = f(x)|^{x=b}_{x=a}$ is often useful.
\begin{lemma}{\bf(Fundamental theorem of calculus)}\\
Suppose $f$ is $C^1$ on $[a,b]$. Then
\[ \int_{[a,b]} f' = f(b)-f(a) = f|^b_a
\]
The function $F(x) = \int_{[a,x]} f$ is then differentiable with $F'(x) = f(x)$.
\end{lemma}
\begin{proof}
\[
f(b)-f(a) = \sum_{j=0}^{2^n-1} f(a+(j+1)\frac{b-a}{2^n})-f(a+j\frac{b-a}{2^n}) = \sum_{j=0}^{2^n-1} f'(a+j\frac{b-a}{2^n})\frac{b-a}{2^n}+\epsilon_{f,a+j\frac{b-a}{2^n}}(\frac{b-a}{2^n}) = I_{[a,b],n}(f')+E
\]
where $E = \sum_{j=0}^{2^n-1} \epsilon_{f,a+j\frac{b-a}{2^n}}(\frac{b-a}{2^n})$ converges to $0$ since $\epsilon(h) = o(h)$.
For the second equality use part 2 of Lemma \ref{lem.propofint} to get
\[F(x+h)-F(x) = \int_{[x,x+h]}f \in [h\min_{t\in [0,h]} f(x+t),h\max_{t\in [0,h]} f(x+t)] \]
Continuity of $f$ means that
$\lim_{h\to 0}\min_{t\in [0,h]} f(x+t) = f(x)$ and the same for the maximum. Dividing by $h$ and taking the limit on both sides finishes the proof.
\end{proof}
Taken together Fubini's theorem and the fundamental theorem of calculus allow us to integrate many functions on rectangles. For example let us compute $\int_R f$ where $R = [0,1]\times[2,3]\times[-1,1]$ and $R\ni (x,y,z) \xmapsto{f} xy+z^2\in \R$. First set $F(x,y) = \int_{[-1,1]}f(x,y,\cdot) = (xyz+\frac{z^3}{3})|^{z=1}_{z=-1} = 2xy+\frac{2}{3}$, then Fubini says $\int_R f = \int_{[0,1]\times [2,3]}F$. Again define $G(x) = \int_{[2,3]}F(x,\cdot)= (xy^2+\frac{2}{3}y)|^{y=3}_{y=2} =
5x+\frac{2}{3}$. So finally $\int_R f = \int_{[0,1]}G = (\frac{5x^2}{2}+\frac{2}{3}x)|^1_0 = \frac{5}{2}+\frac{2}{3} = \frac{19}{6}$.
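The value of this integral can be cross-checked numerically, here with a midpoint rule (a sketch; the helper is ours):

```python
import itertools

# Numerical cross-check of the worked example: integrate
# f(x,y,z) = x*y + z^2 over R = [0,1] x [2,3] x [-1,1].
# Fubini + the fundamental theorem give the exact value 5/2 + 2/3 = 19/6.
def midpoint_integral(f, R, n):
    k = len(R)
    vol = 1.0
    for a, b in R:
        vol *= (b - a)
    total = 0.0
    for j in itertools.product(range(n), repeat=k):
        p = [a + (j_i + 0.5) * (b - a) / n for (a, b), j_i in zip(R, j)]
        total += f(*p)
    return vol / n**k * total

val = midpoint_integral(lambda x, y, z: x * y + z**2,
                        [(0.0, 1.0), (2.0, 3.0), (-1.0, 1.0)], 40)
```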
Fubini's theorem allows us to give a soft proof of the fact that mixed partial derivatives commute. This result will be very important later in discussing the exterior derivative. Recall that the partial derivative is $\p_i f(p) = f'(p)e_i$.
\begin{lemma}{\bf(Mixed partial derivatives commute)}\\
For any $C^2$ function $f$ we have $\p_i\p_j f = \p_j\p_i f$.
\label{lem.mixedpartials}
\end{lemma}
\begin{proof}
It suffices to prove the case of a function $f$ defined on an open subset of $\R^2$. This is because
$\p_i\p_j f(p) = \p_1\p_2 \tilde{f}_p(0,0)$ with $\tilde{f}_p(x,y) = f(p+xe_i+ye_j)$.
We will show that $I = \int_{[a,b]\times[c,d]} \p_1\p_2 f = \int_{[a,b]\times[c,d]} \p_2\p_1 f=J$.
Part 3 of Lemma \ref{lem.propofint} then implies $\p_2\p_1 f = \p_1\p_2 f$.
Using Fubini, $J = \int_{[a,b]} F$ where $F(p) = \int_{[c,d]}g_p'$
and $g_p(q) = \p_1f(p,q)$, so that $g_p' = \p_2\p_1f(p,\cdot)$. By the fundamental theorem of calculus
$J = \int_{[a,b]} g_p(d)-g_p(c) = \int_{[a,b]} \p_1f(\cdot,d)-\p_1f(\cdot,c) = \int_{[a,b]}h'$
with $h(p) = f(p,d)-f(p,c)$. So we conclude that
$J = h(b)-h(a) = f(b,d)-f(b,c)-f(a,d)+f(a,c)$.
Integrating in the other order and doing the same steps shows that $I$ gives the same answer.
\end{proof}
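The boundary-value identity appearing at the end of the proof can be tested numerically; a sketch with $f(x,y)=\sin(xy)$, whose mixed partial is $\cos(xy)-xy\sin(xy)$:

```python
import math

# Numeric sketch of the identity used in the proof above:
# integral of the mixed partial over [a,b]x[c,d] equals
# f(b,d) - f(b,c) - f(a,d) + f(a,c).
def double_midpoint(g, a, b, c, d, n=200):
    hx, hy = (b - a) / n, (d - c) / n
    return sum(g(a + (i + 0.5) * hx, c + (j + 0.5) * hy)
               for i in range(n) for j in range(n)) * hx * hy

# f(x,y) = sin(x*y)  =>  mixed partial = cos(x*y) - x*y*sin(x*y)
mixed = lambda x, y: math.cos(x * y) - x * y * math.sin(x * y)
lhs = double_midpoint(mixed, 0.0, 1.0, 0.0, 1.0)
rhs = math.sin(1.0)   # f(1,1) - f(1,0) - f(0,1) + f(0,0)
```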
Yet another application of Fubini is to prove that one can differentiate under the integral sign:
\begin{lemma}{\bf(Differentiation under the integral sign)}\\
For any $C^1$ function $f$ defined on rectangle $[a,b]\times R$ we have $\p_1\int_R f = \int_R \p_1 f$.
\end{lemma}
\begin{proof} By part 3 of the properties of integration lemma, it suffices to prove that for all $[c,d]\subset [a,b]$ we have $\int_{[c,d]}\p_1\int_R f = \int_{[c,d]}\int_R \p_1 f$. Using the fundamental theorem of calculus the left hand side is equal to $\int_R f(d,\cdot)-f(c,\cdot)$. Fubini says
the right hand side is $\int_R \int_{[c,d]} \p_1 f = \int_R f(d,\cdot)-f(c,\cdot)$, finishing the proof.
\end{proof}
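Differentiation under the integral sign is easy to check numerically for a concrete $f$; a sketch in which both sides are approximated by midpoint sums (the helper names are ours):

```python
import math

# Sketch: compare d/dx of F(x) = integral of sin(x*y) over y in [0,1]
# against the integral of y*cos(x*y), i.e. differentiation under the
# integral sign, both approximated by midpoint Riemann sums.
def riemann(g, a, b, n=4000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def F(x):
    return riemann(lambda y: math.sin(x * y), 0.0, 1.0)

x0, h = 0.7, 1e-5
lhs = (F(x0 + h) - F(x0 - h)) / (2 * h)                  # d/dx of the integral
rhs = riemann(lambda y: y * math.cos(x0 * y), 0.0, 1.0)  # integral of d/dx
```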
\subsection*{Exercises}
\noindent{\bf Exercise 0.}\\
Set $R = [1,2]\times [3,5]$ and $R\ni (x,y) \xmapsto{f} 2xy\in \R$. Compute the integral $\int_R f$ directly from the definition given in the text.\\
\noindent{\bf Exercise 1.}\\
Prove the change of variables theorem for a $C^1$ function $[a,b]\xrightarrow{\varphi} \R$ with $\varphi(a)<\varphi(b)$ by applying the fundamental theorem of calculus.
So given a continuous function $[\varphi(a),\varphi(b)]\xrightarrow{f}\R$ and $\forall x\in [a,b]:\ \varphi'(x)\geq 0$ show that:
\[
\int_{[a,b]}(f\circ \varphi)\varphi' = \int_{[\varphi(a),\varphi(b)]} f
\]
\noindent{\bf Exercise 2.}\\
Compute the integral $\int_R f$ for $R = [0,2]\times [0,3]$ and $R\ni (x,y) \xmapsto{f} x^2+y^2$ using Fubini's theorem and the fundamental theorem of Calculus.\\
\noindent{\bf Exercise 3.}\\
Compute the integral $\int_R f$ for $R = [0,2]\times [0,3]\times [-2,0]$ and $R\ni (x,y,z) \xmapsto{f} x^3+y^3+\cos(z)$ using Fubini's theorem and the fundamental theorem of Calculus.\\
\noindent{\bf Exercise 4.}\\
Compute the integral $\int_R f$ for $R = [0,1]^n$ and $R\ni (x^1,x^2,\dots x^n) \xmapsto{f} x^1x^2\dots x^n$ (here $x^i$ means the $i$-th coordinate of $x$) using Fubini's theorem and the fundamental theorem of Calculus.\\
\noindent{\bf Exercise 5.}\\
Prove that the $F$ from the statement of Fubini's theorem is continuous.
\\
\section{Mean value theorem and Banach contraction}
\label{sec.medBanach}
In this section we prepare a few results necessary for proving the inverse and implicit function theorems of next section.
Recall the mean value theorem that says that if $f$ is a differentiable function on $[a,b]$ then there exists $c\in (a,b)$ such that $f'(c)(b-a) = f(b)-f(a)$.
This allows us to show that differentiability of a function can be checked by looking at the partial derivatives.
\begin{lemma}{\bf ($C^1$ implies differentiable)}\\
\label{lem.C1Dif}
Suppose $\R^m\supset P\xrightarrow{f} \R^n$ is a $C^1$ function at $p\in P$, then $f'(p)$ exists and is determined by the partial derivatives:
$f'(p)e_i = \p_i f(p)$, defined as in Definition \ref{def.directionalderivative}.
\end{lemma}
\begin{proof}
According to Theorem \ref{thm.propertiesofD} it suffices to treat the case $n=1$.
Writing $h = \sum_i h^i e_i$ and applying the $1$-dimensional mean value theorem to the function $t\mapsto f(q+te_i)$,
there is a $t_i$ between $0$ and $h^i$ such that $h^i\p_i f(q+t_ie_i) = f(q+h^ie_i)-f(q)$, for any $q\in P$. We compute
$f(p+h)-f(p) = \sum_{i=1}^m f(p+\sum_{j\leq i} h^je_j) - f(p+\sum_{j< i} h^je_j) =
\sum_{i=1}^m h^i\p_i f(c_i)$ for certain points $c_i$ with $|c_i-p|\leq |h|$. Therefore the error satisfies
$\epsilon(h) = |f(p+h) -f(p)-\sum_i h^i \p_i f(p)| \leq \sum_{i=1}^m |h^i||\p_i f(c_i)-\p_i f(p)|$.
By continuity of the partial derivatives $\epsilon(h) = o(h)$.
\end{proof}
\begin{lemma}{\bf(Differential disk bound)}\\
\label{lem.mdisk}
Suppose $D \xrightarrow{F} \R^m$ is a $C^1$ map defined on a closed disk $D\subset\R^n$. The maximum $M = \max_{x\in D}|F'(x)|$ exists and is finite.
Moreover, for all $x,x+h\in D$ we have $|F(x+h)-F(x)|\leq M |h|$.
\end{lemma}
\begin{proof}
Since $|F'(x)|$ is continuous and $D$ closed and bounded it attains a maximum $M$. Fix a unit vector $u\in \R^m$.
The function $[0,1]\ni t \xmapsto{g} u\cdot F(x+t h) \in \R$ is differentiable with $g'(t) = u\cdot F'(x+th)h$.
The mean value theorem tells us there exists a $c\in (0,1)$ such that $g(1)-g(0) = g'(c)$ so
$u\cdot (F(x+h)-F(x)) = u\cdot F'(x+ch)h \leq M|h|$. Since this is true for any unit vector $u$ we must have $|F(x+h)-F(x)|\leq M|h|$.
\end{proof}
Finally to come up with solutions to equations we often use the following lemma.
\begin{lemma}{\bf(Banach contraction lemma)}
\label{lem.Banach}
\begin{enumerate}
\item Suppose $C\subset \R^n$ is a non-empty closed bounded subset and $\alpha\in [0,1)$. If $C\xrightarrow{\Phi} C$ is continuous and for all $x\neq y\in C$ we have $|\Phi(x)-\Phi(y)|<\alpha|x-y|$ then there exists a unique fixed point $p\in C$ with $\Phi(p) = p$.
\item Suppose $\mathcal{C}$ is the set of continuous functions $\mathcal{C} = \{\gamma:[-\tau,\tau]\to D\}$, where $D\subset \R^n$ is a closed disk and $\alpha\in [0,1)$.
If $\Phi:\mathcal{C}\to\mathcal{C}$ is such that
$\sup_{|t|\leq \tau}|\Phi(\gamma)(t)-\Phi(\delta)(t)|<\alpha\sup_{|t|\leq \tau}|\gamma(t)-\delta(t)|$ for all $\gamma\neq \delta\in\mathcal{C}$, then again $\Phi(\pi) = \pi$ for a unique $\pi\in \mathcal{C}$.
\end{enumerate}
\end{lemma}
Maps that satisfy the condition of the lemma are known as contractions. Each time we apply a contraction, the distance between any two points shrinks by at least a factor $\alpha<1$.
The proof of this important lemma and its generalizations will be treated in the class on metric and topological spaces.
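Although the proof is deferred, the statement is easy to experiment with. The following sketch (the map $\Phi(x)=\cos x$ on $C=[0,1]$ is an assumed example; there $|\Phi'(x)|=|\sin x|\leq \sin 1<1$) iterates a contraction and watches the unique fixed point appear:

```python
import math

# Assumed example (not from the notes): Phi(x) = cos(x) maps the
# closed set C = [0, 1] into itself and is a contraction there,
# so the Banach lemma guarantees a unique fixed point.

def phi(x):
    return math.cos(x)

x = 0.0                      # any starting point in C works
for n in range(60):
    x = phi(x)
print(f"fixed point ~ {x:.6f}")          # solves cos(x) = x
print(f"residual    {abs(phi(x) - x):.1e}")
```

Exercise 4 below asks you to prove that this iteration converges to the fixed point from any starting point.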
\subsection*{Exercises}
{\bf Exercise 0.} (Multi-dimensional mean value theorem)\\
Prove the following generalization of the mean value theorem to multiple variables.
Suppose $Q \xrightarrow{F} \R$ is a differentiable map defined on an open subset $Q\subset\R^n$. If $Q$ contains the line segment between two points $a,b$ then
there exists a point $c$ on this segment such that: $F(b)-F(a) = F'(c)(b-a)$. Hint: use the one-dimensional mean value theorem on $F\circ \gamma$ for $\gamma$ a suitable curve.\\
\noindent{\bf Exercise 1.} (Mean failure)\\
Why is there no version of the mean value theorem for $\R^2\xrightarrow{F} \R^2$?
Give an example of a $C^2$ function $\R^2\xrightarrow{F}\R^2$ and $a\neq b\in \R^2$ such that there is no $c$ on the line segment between $a,b$ with the property that $F(b)-F(a) = F'(c)(b-a)$.\\
\noindent {\bf Exercise 2.} (Constant?)\\
Suppose $\R^m\supset R \xrightarrow{F} \R^n$ is a $C^1$ function defined on a rectangle $R$ that satisfies $F'(p)= 0$ for all $p\in R$.
Is it true that $F$ must be constant?\\
\noindent {\bf Exercise 3.} (Contractions)\\
We say $D\xrightarrow{f}D$ is a contraction if for all $x\neq y\in D$ we have $|f(x)-f(y)| < \alpha|x-y|$ for some $\alpha\in [0,1)$, as in the Banach lemma.
Taking $D = [0,1]$, which of the following functions is a contraction? $f(x) = x^2$, $f(x) = x$, $f(x) = \frac{x}{2}$, $f(x) = \sin(\frac{\pi}{2} x)$, $f(x) = \frac{1-x}{1+x}$.\\
\noindent {\bf Exercise 4.} (Contractions 2)\\
Prove that if $p$ is a fixed point of the function $C\xrightarrow{\Phi} C$ satisfying the hypotheses of the Banach lemma, then for any $x\in C$
we must have the sequence $x,\Phi(x),(\Phi\circ \Phi)(x), (\Phi\circ \Phi \circ \Phi)(x),\dots$ converging to $p$.\\
\newpage
\section{Inverse and Implicit function theorems}
In this section we provide some answers to the first of the two main questions we posed at the beginning of these notes:
{\bf How many solutions does a system of equations have?} Intuitively a system of $n$ equations in $n$ unknowns should have only one or at most finitely many solutions. At least for linear equations $Ax =y$ this is true provided that the linear map $A$ is \emph{invertible} because the unique solution is then $x = A^{-1}y$.
Invertibility of a linear map is easy to check: it is equivalent to $\det A \neq 0$. Gaussian elimination determines the determinant efficiently by bringing a matrix for $A$ into upper triangular form and multiplying the diagonal entries.
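The elimination procedure just described can be sketched in a few lines (a minimal illustration with partial pivoting, not optimized code):

```python
# Determinant via Gaussian elimination: reduce to upper triangular
# form, multiply the diagonal entries, and flip the sign for every
# row swap along the way.

def det(a):
    a = [row[:] for row in a]        # work on a copy
    n, sign = len(a), 1.0
    for i in range(n):
        # partial pivoting: move the largest entry of column i up
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[p][i]) == 0.0:
            return 0.0               # singular matrix
        if p != i:
            a[i], a[p] = a[p], a[i]
            sign = -sign
        for r in range(i + 1, n):
            m = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= m * a[i][c]
    prod = sign
    for i in range(n):
        prod *= a[i][i]
    return prod

print(det([[2, 2], [2, -2]]))        # -> -8.0
```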
What about a system $f(x) = y$ of $n$ equations given by differentiable functions of $n$ unknowns?
We investigate the situation near a solution $f(x_0) = y_0$. The inverse function theorem says we can find a (local) inverse of $f$ provided $f'(x_0)$ is invertible.
Applying the inverse gives the unique solution in the form $x = f^{-1}(y)$, just as in the linear case.
Intuitively what is going on here is that we approximate $f$ near $x_0$ as $f(x_0+h) \approx f(x_0)+f'(x_0)h$. If we set $x = x_0+h$ we can solve for $x$ in $f(x) = y$ to get $x \approx x_0 + f'(x_0)^{-1}(y-y_0)$.
For example take $\R \ni x \xmapsto{f} x^2 \in \R$. Close to any point $p \neq 0$ we have $f'(p) \neq 0$ so the theorem says there is a \emph{unique} solution $x$ to $x^2 = y$ with the property that $x$ is close to $p$.
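A numerical sketch of this approximation (the numbers $x_0=3$ and $y=9.4$ are arbitrary choices, not from the notes): near $x_0=3$ we have $y_0=9$ and $f'(x_0)=6$, so the unique nearby solution of $x^2=y$ is approximately $x_0 + (y-y_0)/f'(x_0)$.

```python
import math

# Local inverse of f(x) = x**2 near x0 = 3, to first order:
# x ~ x0 + (y - y0) / f'(x0)   with y0 = f(x0) = 9, f'(x0) = 6.

x0, y0 = 3.0, 9.0
y = 9.4
approx = x0 + (y - y0) / (2 * x0)     # linear approximation of f^{-1}
exact = math.sqrt(y)                  # the branch of f^{-1} near x0 = 3
print(approx, exact)                  # ~3.0667 vs ~3.0659
# near p = -3 the theorem instead picks out the other root, -sqrt(y)
```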
\begin{figure}[htpb!]
\begin{center}
\includegraphics[width=8cm]{Pictures/InverseFunctionTheorem.png}
\end{center}
\caption{The inverse function theorem says that if $f'(x_0)$ is invertible, then so is $f$ close to $x_0$. Close means there exists a neighborhood $X\times Y \ni (x_0,y_0)$ (green) on which the graph of $f$ (blue) coincides with the graph of a function $Y\xrightarrow{g} X$, shown in red.}
\label{fig.Inverse}
\end{figure}
\begin{theorem} {\bf (Inverse function theorem)}\\
\label{thm.ift}
Imagine a $C^1$ function $\R^n \supset U \xrightarrow{f} \R^n$ defined on an open set $U$, with $f(x_0)=y_0$. If $f'(x_0)$ is invertible, then there are open sets
$x_0\in X\subset U$ and $y_0\in Y\subset \R^n$ and a $C^1$ function $Y \xrightarrow{g} X$ such that $f\circ g = id_Y$ and $g\circ f = id_{X}$.
Also $g'(y) = f'(g(y))^{-1}$.
\end{theorem}
\begin{proof}
Without loss of generality we may assume that $x_0=y_0=0$ and $f'(0) = Id_{\R^n}$.
Since $f'$ is continuous we may choose $\delta>0$ such that the closed disk $D=\overline{B}_\delta(0)\subset U$ and for all $x\in D$ we have $|f'(x)-Id_{\R^n}|\leq\frac{1}{2}$.
Set $w(x) = f(x)-x$ so $w'(x) = f'(x)-Id_{\R^n}$ and by Lemma \ref{lem.mdisk} $|w(x+h)-w(x)| \leq \frac{1}{2}|h|$. In other words
\begin{equation}
\label{eq.ift1}|f(x+h)-f(x)-h|\leq \frac{|h|}{2} \qquad \forall x,x+h\in D\end{equation}
Take $Y= B_{\frac{\delta}{2}}(0)$. Using Banach's contraction Lemma \ref{lem.Banach} we will show that for any $y\in Y$ there exists a unique $x\in D$ such that $f(x) = y$. This defines a function $Y\xrightarrow{g} D$ by setting $x = g(y)$. To this end define for any fixed $y\in Y$ the function
$D\ni z \xmapsto{\Phi} z+y-f(z)\in \R^n$. If there is an $x\in D$ with $\Phi(x) = x$ then this will be a solution to $f(x) = y$ and vice versa. To get this fixed point $x$ we check the hypotheses of Banach's lemma.
First the image of $\Phi$ is contained in $D$ because
setting $x=0$ in \eqref{eq.ift1} gives $|f(h)-h|\leq \frac{1}{2}|h|\leq \frac{1}{2}\delta$ for any $h\in D$ and replacing $h$ by $z$ in combination with the triangle inequality yields $|\Phi(z)| = |y-f(z)+z| \leq |y|+|f(z)-z|< \delta$.
Second we show that $\Phi$ is a contraction. Using \eqref{eq.ift1} we find $|\Phi(z+h)-\Phi(z)| = |-f(z+h)+f(z)+h| \leq \frac{|h|}{2}$ as required. The resulting fixed point $x = \Phi(x)$ in fact lies in the open ball $B_\delta(0)$ as we saw above. Setting $X = B_\delta(0)\cap f^{-1}(Y)$, which is open by continuity of $f$, we thus found a function $Y\xrightarrow{g} X$ such that $f(g(y)) = y$ for all $y\in Y$ and $g(f(x)) = x$ for all $x\in X$ (both by uniqueness of the fixed point).
To see that $g$ is continuous take any two points $y,y+k \in Y$ and set $x = g(y), x+h = g(y+k)$ for some $h$. Equation \eqref{eq.ift1} tells us $|k-h|\leq\frac{1}{2}|h|$ and the triangle inequality gives $|h|\leq |h-k|+|k|\leq \frac{1}{2}|h|+|k|$, so $|h|\leq 2|k|$. This shows continuity of $g$ at $y$ (why?).
Finally to show that $g$ is differentiable recall that $f$ is, so
$f(x+h) = f(x)+f'(x)h+\epsilon_{f,x}(h)$ with $\epsilon_{f,x}(h) = o(h)$. In the previous notation with $A = f'(x)$ this means
$k = A(g(y+k)-g(y))+\epsilon_{f,x}(h)$. Now $|Ax-x|\leq |A-Id||x|\leq \frac{1}{2}|x|$ by definition of $D$ above. It follows that $\ker A = \{0\}$ so $A$ is invertible. Applying $A^{-1}$ to the above equation yields $g(y+k) = g(y) + A^{-1}k +\epsilon_{g,y}(k)$ where $\epsilon_{g,y}(k) = - A^{-1}\epsilon_{f,x}(h)$. This means that $g$ is differentiable with derivative $g'(y) = A^{-1}$ because the estimate below shows that $\epsilon_{g,y} = o(k)$ as $k\to 0$:
$\frac{|\epsilon_{g,y}(k)|}{|k|} = \frac{|A^{-1}\epsilon_{f,x}(h)|}{|k|} \leq |A^{-1}|\frac{|\epsilon_{f,x}(h)|}{|h|}\cdot\frac{|h|}{|k|}
\leq 2|A^{-1}|\frac{|\epsilon_{f,x}(h)|}{|h|}$.
Finally the derivative $g'(y) = f'(g(y))^{-1}$ is continuous since matrix inversion is continuous and so are $g$ and $f'$.
\end{proof}
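The iteration from the proof can be run numerically. In the sketch below the function $f$ is an assumed example chosen so that $f(0)=0$ and $f'(0)=Id$, matching the normalization at the start of the proof, and we iterate $\Phi(z)=z+y-f(z)$ to solve $f(z)=y$:

```python
# Sketch of the proof's fixed-point iteration. With x0 = y0 = 0 and
# f'(0) = Id, the map Phi(z) = z + y - f(z) is a contraction near 0
# and its fixed point solves f(z) = y. The function f below is an
# assumed example with f(0) = 0 and f'(0) = Id.

def f(u, v):
    return (u + 0.2 * u * v, v + 0.2 * u * u)

y = (0.1, 0.05)
z = (0.0, 0.0)
for n in range(50):
    fu, fv = f(*z)
    z = (z[0] + y[0] - fu, z[1] + y[1] - fv)
fu, fv = f(*z)
print(f"f(z) = ({fu:.10f}, {fv:.10f})")   # -> close to y = (0.1, 0.05)
```

The contraction factor here is roughly $|f'(z)-Id|$, which is small near the origin, so the iterates settle down after only a few steps.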
Now that we know something about systems with as many equations as variables, what if there are more variables than equations?
Again the linear case decides what happens in the non-linear case, at least locally. If we have $n+m$ variables and $m$ equations, can we select $n$ free variables and parametrize the solutions set in terms of those? The implicit function theorem says that locally the solution set is the graph of a function as shown in Figure \ref{fig.Implicit}.
\begin{figure}[htp!]
\begin{center}
\includegraphics[width=8cm]{Pictures/ImplicitFunctionTheorem0.png}
\end{center}
\caption{The implicit function theorem says that the solutions to the equation $f(x,y)=z_0$ (shown in red) near the solution $(x_0,y_0)$ (i.e.\ in the green box $N\times M$) look like the graph of a function.}
\label{fig.Implicit}
\end{figure}
\begin{theorem}{\bf (Implicit function theorem)}\\
Imagine a $C^1$ function $\R^n\times \R^m \supset U\xrightarrow{f} \R^m$ and $(x_0,y_0)\in U$ and set $z_0 = f(x_0,y_0)$.
If $F'(y_0)$ is invertible, where $F(y) = f(x_0,y)$, then there exist open sets $N,M$ such that $(x_0,y_0)\in N\times M\subset U$ and a unique $C^1$ function $N\xrightarrow{g} M$ such that
\[(N\times M) \cap f^{-1}(\{z_0\}) = \{(x,g(x))\mid x\in N\}\]
\end{theorem}
For example take $n=1,m=2$ and $f(x,y,z) = (x^2+y^2+z^2,\frac{x^3}{3}+y^2-z^2)$ and $(x_0,y_0)=(0,(1,1))$ and $z_0 = (2,0)$. Then the solution set $f^{-1}(\{z_0\})$ is the green tennis ball curve, the intersection of the sphere with radius $\sqrt{2}$ and the surface shown in yellow in Figure \ref{fig.tennis} below.
Adding the two equations we see $x^2(1+ \frac{x}{3})+2y^2=2$ so $y=\pm\sqrt{1-\frac{x^2}{2}-\frac{x^3}{6}}$, and taking the difference tells us $z$, so in this case we can actually find $g$ explicitly: $g(x) = (\sqrt{1-\frac{x^2}{2}-\frac{x^3}{6}},\sqrt{1-\frac{x^2}{2}+\frac{x^3}{6}})$. In general we will not be so lucky and $g$ is only defined \emph{implicitly}. The uniqueness of $g$ is assured by choosing the signs of the square roots to be positive, which is valid as long as $M$ is contained in $(0,\infty)^2$. The function $F$ is in this case $F(y,z) = f(0,y,z) = (y^2+z^2,y^2-z^2)$ and $F'((1,1))$ has matrix $\left(\begin{array}{cc}2&2\\2&-2 \end{array}\right)$ with non-zero determinant $-8$.
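A quick numerical check (a sketch, not part of the notes) that this explicit $g$ really parametrizes the solution set, i.e.\ that $f(x,g(x))=(2,0)$:

```python
import math

# Verify f(x, g(x)) = (2, 0) for the tennis ball example above.

def f(x, y, z):
    return (x**2 + y**2 + z**2, x**3 / 3 + y**2 - z**2)

def g(x):
    return (math.sqrt(1 - x**2/2 - x**3/6),
            math.sqrt(1 - x**2/2 + x**3/6))

for x in (0.0, 0.3, -0.5):
    val = f(x, *g(x))
    print(x, val)          # the pairs stay at (2, 0) up to rounding
```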
\begin{figure}[htp!]
\begin{center}
\includegraphics[width=8cm]{Pictures/Tennis.png}
\end{center}
\caption{The solution set of $f=(2,0)$ is shown in green, while the two level sets of $f^1$ and $f^2$ are the sphere and the yellow surface respectively. The linear approximation to the solution set at the point $(0,1,1)$ is also shown.}
\label{fig.tennis}
\end{figure}
\begin{proof}
Define $\R^{n+m}\ni (x,y) \xmapsto{G} (x,f(x,y))\in \R^{n+m}$. Then $G$ is $C^1$ and the matrix for $G'(x_0,y_0)$ with respect to the standard basis has the block form
$\left(\begin{array}{cc} Id & 0 \\ Z & F'(y_0) \end{array}\right)$. This means $G'(x_0,y_0)$ is invertible, so by the inverse function theorem there exist open subsets $R_1,S_1\subset \R^n$ and $R_2,S_2\subset \R^m$ and a $C^1$ function $S\xrightarrow{H} R$ such that $H\circ G = id_S$ and $G \circ H = id_R$.
Here we set $S=S_1\times S_2$ and $R = R_1\times R_2$.
$H$ has to be of the form $H(x,y) = (x,h(x,y))$ for some $C^1$ function $S\xrightarrow{h} R_2$, because otherwise composing with $G$ could not give the identity.
Also $f(x,h(x,y)) = f(H(x,y)) = \pi_2\circ G\circ H(x,y) = y$, where $\pi_2$ denotes projection onto the second coordinate. It follows that $f(x,h(x,z_0)) = z_0$, so we may take $N=S_1$ and $M=R_2$ and define $N\ni x\xmapsto{g} h(x,z_0)\in M$. The function $g$ is $C^1$ by the chain rule.
\end{proof}
\newpage
\subsection*{Exercises}
{\bf Exercise 0.} (Loss of generality)\\
Prove that the inverse function theorem in the special case where $x_0=y_0=0$ and $f'(x_0) = Id_{\R^n}$ implies the theorem in general.\\
{\footnotesize Hint: $F(x) = f'(x_0)^{-1}(f(x+x_0)-y_0)$}\\
\noindent {\bf Exercise 1.} (More loss of generality)\\
Prove that the implicit function theorem in the special case where $x_0=y_0=z_0=0$ and $F'(y_0) = Id_{\R^m}$ implies the theorem in general.\\
\noindent {\bf Exercise 2.} (Inverse from implicit)\\
Derive the inverse function theorem from the implicit function theorem. \\
{\footnotesize Hint: Apply the implicit function theorem to the function $B(y,x) = f(x)-y$.}\\
\noindent {\bf Exercise 3.} (Derivative of implicit function)\\
Use the chain rule to find an expression for the derivative of the function $g$ in the implicit function theorem.\\
\noindent {\bf Exercise 4.} (Scary system?)\\
Consider the two equations in the four unknowns $a,b,c,d$ with parameters $z_1,z_2$ given by
\begin{align*}
abc+abd+acd+bcd &= z_1\\
ab+ac+ad+bc+bd+cd &= z_2
\end{align*}
Write the system as $f(x,y) = z_0 = (z_1,z_2)$ where $x = (a,b)$ and $y=(c,d)$. For $z_0 = (2,0) = f(1,-1,-1,-1)$,
can one write the solution set $f^{-1}(\{z_0\})$ near the point $(x_0,y_0) = (1,-1,-1,-1)$ as the graph of a function of the variables $a,b$?\\
What about the case $(x_0,y_0)=(1,1,1,1)$?\\
\noindent {\bf Exercise 5.} (Tangent)\\
Given $f$ as in the implicit function theorem, show that the condition $\det F'(y_0) \neq 0$ is equivalent to the condition that the projection of the tangent plane $\ker f'(x_0,y_0)$ onto the first $n$ coordinates (the $x$-coordinates) has dimension $n$.\\
\noindent {\bf Exercise 6.} (Coordinate transformation)\\
Consider the coordinate transformation given by $f(x,y) = (\frac{x}{x^2+y^2},\frac{-y}{x^2+y^2})$. Does $f$ have a local inverse near the point $(2,1)$, i.e.\ an inverse defined on a neighborhood of $f(2,1)$?\\
Identifying $\C$ with $\R^2$ we see that $f$ is really the function $z\mapsto \frac{1}{z}$.\\
\noindent {\bf Exercise 7.} (Invertible?)\\
Explain why $|A-Id|<\frac{1}{2}$ implies that the linear map $A\in L(\R^n,\R^n)$ is invertible.\\
{\footnotesize Hint: It is enough to show that $Ax \neq 0$ when $x\neq 0$. This follows from an estimate on $|Ax-x|$.}\\
\noindent {\bf Exercise 8.} (Sinful system)\\
Set $f(x,y,z) = (\sin \sin (x+ y),\sin \sin \sin (x+y+z))$ and $p = (\frac{\pi}{4},\frac{\pi}{2},\frac{\pi}{2})$.
Show that the solution set $f^{-1}(\{f(p)\})$ is the graph of some function near the point $p$.\\
\noindent {\bf Exercise 9.} (Inverse)\\
Suppose $f$ is a $C^1$ differentiable function and $f'(x)$ is invertible for all $x$ in the domain of $f$. Does this mean that $f$ is a bijection? \\
\noindent {\bf Exercise 10.} (Implicit)\\
Use the implicit function theorem to show that the solutions to $f(x,y,z) = (5,2)$ near $(0,1,2)$ can be written as the graph of a function $g$.
Here $f(x,y,z) = (x^2+y^2+z^2, yz)$.\\ Also find a one variable function parametrizing the solutions of $f(x,y,z) = (2,1)$ near $(0,1,1)$.
\\
\noindent {\bf Exercise 11.} (Flattening)\\
Given a $C^1$ function $\R^{n+m}\xrightarrow{f} \R^m$ such that $f(x_0,y_0) = z_0$ and the final $m$ columns of $f'(x_0,y_0)$ form a basis of $\R^m$,
show that there exist open sets $N\ni x_0$ and $M\ni y_0$ and an invertible $C^1$ function $N\times M\xrightarrow{B} N\times M$ with $C^1$ inverse
such that $f^{-1}(\{z_0\}) \cap (N\times M) = B(\{x_0\}\times M)$.
\\
\noindent {\bf Exercise 12.} (Proof tracking)\\
Write out explicitly what the proof of the implicit function theorem says in the case where $f$ is a linear function. What is $G$, what is $H$, what is $h$, what is $g$, and why does it make sense?
\\
\section{Picard's theorem on existence of solutions to ODE}
Recall that a vector field on an open set $P\subset \R^n$ is just a function $P\xrightarrow{F} \R^n$. Vector fields provide a way to encode ordinary differential equations (ODEs), basically by ``following the arrows''. More precisely, solving a differential equation comes down to finding an integral curve as defined below.
\begin{definition}{\bf(Integral curve)}\\
Imagine a vector field $P\xrightarrow{F} \R^n$ on an open set $P\subset \R^n$. An integral curve $\gamma$ for $F$ through $p\in P$ is a differentiable map $(-a,a)\xrightarrow{\gamma} P$ for some $a>0$ such that $\gamma'(t) = F(\gamma(t))$ for all $t\in (-a,a)$ and $\gamma(0) = p$.
\end{definition}
\begin{theorem} {\bf(Existence of solutions to ODE)}\\
If $P\xrightarrow{F} \R^n$ is a $C^1$ vector field on $P$ then for any $p\in P$ there exists an integral curve for $F$ through $p$. Any two such integral curves have to agree on an interval $(-a,a)$ for some $a>0$.
\end{theorem}
\begin{proof}
We first reformulate the theorem in terms of integration. In what follows the integral of a function $\R^m \xrightarrow{G} \R^n$ over a rectangle $R$ just means the integral of each of its components so $\int_R G = \int_R \sum_i G^i e_i = \sum_i (\int_R G^i) e_i$.
$F$ is continuous so there is a closed ball $\overline{B}_r(p)\subset P$ with radius $r$ and center $p$ and a constant $M$ such that $|F(x)|\leq M$ for all $x\in \overline{B}_r(p)$. Choose $\tau>0$ such that $\tau M\leq r$ and $\tau L <1$ where $L = \max_{x\in \overline{B}_r(p)}|F'(x)|$.
For a curve $(-a,a)\xrightarrow{\gamma} P$ define a new curve $\Phi(\gamma)$ by $\Phi(\gamma)(t) = p+\int_{[0,t]}F\circ \gamma$. Notice that if we had a curve $\gamma$ such that $\Phi(\gamma) = \gamma$, i.e.\ a fixed point for $\Phi$, then that $\gamma$ would be an integral curve for $F$ through $p$. Conversely, any integral curve for $F$ through $p$ would be a fixed point of $\Phi$ since $F\circ
\gamma = \gamma'$ so $\Phi(\gamma)(t) = p+\int_{[0,t]}\gamma' = \gamma(t)$ by the fundamental theorem of calculus.
Considering $\Phi$ as a function from the space of curves $\mathcal{C}=\{[-\tau,\tau]\xrightarrow{\gamma} \overline{B}_r(p)| \gamma\ \text{continuous}\}$ to itself we now seek to apply the Banach lemma \ref{lem.Banach} to find a fixed point. Recall that on $\mathcal{C}$ we measure distance using $|\gamma-\delta| = \sup_{|t| \leq \tau}|\gamma(t)-\delta(t)|$.
Notice that $\Phi(\gamma)$ is in fact differentiable (by the fundamental theorem of calculus). Also $\Phi(\gamma)\in \mathcal{C}$ for $\gamma\in \mathcal{C}$ because $|\Phi(\gamma)(t)-p| = |\int_{[0,t]}F\circ \gamma| \leq \tau M \leq r$, so $\Phi(\gamma)$ takes values in $\overline{B}_r(p)$. Next, $\Phi$ is a contraction: Lemma \ref{lem.mdisk} gives $|F(\gamma(s))-F(\delta(s))|\leq L|\gamma(s)-\delta(s)|$ for all $|s|\leq \tau$, so
\[\sup_{|t|\leq\tau}|\Phi(\gamma)(t)-\Phi(\delta)(t)| = \sup_{|t|\leq\tau}\Big|\int_{[0,t]}(F\circ\gamma-F\circ\delta)\Big| \leq \tau L \sup_{|s|\leq\tau}|\gamma(s)-\delta(s)|\]
with $\tau L<1$. The second part of the Banach contraction Lemma \ref{lem.Banach} now provides a unique fixed point $\gamma = \Phi(\gamma)$, which is the desired integral curve for $F$ through $p$ on $(-\tau,\tau)$. Finally, given two integral curves through $p$, for small enough $a>0$ both restrict to elements of $\mathcal{C}$ with $[-\tau,\tau]$ replaced by $[-a,a]$, and both are then fixed points of $\Phi$ there, so by uniqueness they agree on $(-a,a)$.
\end{proof}
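The contraction $\Phi$ from the proof doubles as a practical algorithm (Picard iteration). The sketch below applies it to the assumed example $F(x)=x$ with $p=1$, i.e.\ the ODE $\gamma'=\gamma$, $\gamma(0)=1$, approximating each integral by the trapezoid rule on a grid:

```python
import math

# Picard iteration for gamma' = gamma, gamma(0) = 1, whose integral
# curve is exp(t). Curves are stored by their values on a grid and
# Phi(gamma)(t) = p + int_0^t F(gamma(s)) ds uses trapezoid sums.

tau, n = 0.5, 1000
ts = [tau * i / n for i in range(n + 1)]

def F(x):
    return x

def phi(gam):
    out, acc = [1.0], 0.0
    for i in range(1, len(ts)):
        acc += 0.5 * (F(gam[i-1]) + F(gam[i])) * (ts[i] - ts[i-1])
        out.append(1.0 + acc)
    return out

gam = [1.0] * (n + 1)          # start from the constant curve
for k in range(20):
    gam = phi(gam)
print(gam[-1], math.exp(tau))  # both ~ 1.6487
```

Each iterate adds roughly one more term of the Taylor series of $e^t$, which is the exact integral curve, so twenty iterations already agree with $\exp$ up to the quadrature error.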
\subsection*{Exercises}
{\bf Exercise 1.}\\
For each of the $k$-covector fields $\omega$ below either find an $\alpha\in \Omega^{k-1}$ such that $d\alpha = \omega$ or prove that it cannot be done.
\begin{enumerate}
\item[a.] $\omega\in \Omega^2(\R^2-\{0\})$ given by $\omega(p) = e^1\wedge e^2$.
\item[b.] $\omega\in \Omega^3(\R^4-\{0\})$ given by $\omega(p) = e^1(p)e^1\wedge e^2 \wedge e^4+e^1(p)e^2(p)e^2\wedge e^3 \wedge e^4$.
\item[c.] $\omega\in \Omega^1(\R^2-\{0\})$ given by $\omega(p) = \frac{-e^2(p)e^1+e^1(p)e^2}{|p|^2}$.
\item[d.] $\omega\in \Omega^1((-1,1)^2)$ given by $\omega(p) = \frac{-e^2(p-q)e^1+e^1(p-q)e^2}{|p-q|^2}$ and $q = (2,2)$.
\end{enumerate}
\end{document}