Reproducing Kernel Hilbert Space
1. Definitions
A Reproducing Kernel Hilbert Space (RKHS) is a Hilbert space of functions with some special properties, chiefly the existence of a Reproducing Kernel. We can view an RKHS in multiple equivalent ways:
1.1. Using Evaluation Functionals
A Reproducing Kernel Hilbert Space (RKHS) is a Hilbert space of functions whose "point evaluation" functionals are linear and continuous.
That is, if we take the point evaluation functional \(L_x\) of the Hilbert space \(H\), and \(L_x\) turns out to be linear and continuous for every \(x\), then \(H\) is an RKHS. Here, \(L_x: H \rightarrow \mathbb{F}\) such that \(L_x(f) = f(x)\).
It is called so because the linear and continuous nature implies the existence of a Reproducing Kernel.
Reproducing Kernel:
By the Riesz Representation Theorem, for each continuous linear functional \(L_x\) on \(H\) there exists a kernel \(K_x \in H\), called the "Reproducing Kernel", such that for all functions \(f \in H\):
\[ \langle f, K_x \rangle = f(x) \]
The kernel \(K_x\) is the Riesz representation of the functional \(L_x\).
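The reproducing property can be checked numerically in the finite case. The sketch below (an illustration of this note, not a standard library routine) uses the fact that on a finite set \(X = \{x_1, \dots, x_n\}\) the RKHS is spanned by the functions \(K_{x_i}\): a function \(f = \sum_i a_i K_{x_i}\) is represented by its coefficient vector \(a\), and the RKHS inner product becomes \(a^\top G b\) with Gram matrix \(G_{ij} = K(x_i, x_j)\).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))  # five points in R^2

def rbf(x, y, gamma=1.0):
    # Gaussian kernel, chosen here as one example of a reproducing kernel
    return np.exp(-gamma * np.sum((x - y) ** 2))

# Gram matrix G_ij = K(x_i, x_j) = <K_{x_i}, K_{x_j}>
G = np.array([[rbf(xi, xj) for xj in X] for xi in X])

a = rng.normal(size=5)   # coefficients of f = sum_i a_i K_{x_i}
f_vals = G @ a           # pointwise values f(x_j) = sum_i a_i K(x_i, x_j)

# K_{x_3} has coefficient vector e_3, so <f, K_{x_3}> = a^T G e_3,
# which should reproduce the evaluation f(x_3).
e3 = np.zeros(5)
e3[3] = 1.0
inner = a @ G @ e3
print(np.isclose(inner, f_vals[3]))  # True: <f, K_x> = f(x)
```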
1.2. Using Positive Definite Kernel
We can also define the "Reproducing Kernel" as a function \(K(x, y)\)
\[ K(x, y) = \langle K_x, K_y \rangle \]
which has the properties:
- It is symmetric: \(K(x, y) = K(y, x)\)
- It is positive definite: for any points \(x_1, \dots, x_n\) and scalars \(c_1, \dots, c_n\), \(\sum_{i,j} c_i c_j K(x_i, x_j) \geq 0\)
Conversely, the Moore-Aronszajn theorem says that every symmetric, positive definite kernel defines an RKHS.
Moore-Aronszajn Theorem:
Suppose \(K\) is a symmetric, positive definite kernel on a set \(X\). Then there is a unique Hilbert space of functions on \(X\) for which \(K\) is a reproducing kernel.
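The two properties above translate into a simple numerical check (an assumption of this note, not part of the theorem's statement): a kernel is positive definite exactly when every Gram matrix it produces is symmetric positive semidefinite. The sketch below tests this for the Gaussian (RBF) kernel.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))  # twenty sample points in R^3

def rbf_gram(X, gamma=0.5):
    # G_ij = exp(-gamma * ||x_i - x_j||^2), computed via broadcasting
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)

G = rbf_gram(X)
print(np.allclose(G, G.T))                      # symmetric
print(np.all(np.linalg.eigvalsh(G) >= -1e-10))  # eigenvalues nonnegative (up to roundoff)
```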
1.3. Using Feature Map
A feature map \(\phi: X \rightarrow F\) is a map from the input set \(X\) (the domain of the functions in \(H\)) to another Hilbert space \(F\), which we call the feature space.
Then every feature map defines a kernel via
\[ K(x, y) = \langle \phi(x), \phi(y) \rangle_{F} \]
where, the inner product is in the feature space.
Conversely, each positive definite kernel has infinitely many associated feature maps. One trivial choice is \(F = H\) with \(\phi(x) = K_x\), since then \(\langle \phi(x), \phi(y) \rangle_F = \langle K_x, K_y \rangle = K(x, y)\). This property is related to the Kernel Trick in Machine Learning.
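A concrete feature map makes this tangible. The sketch below (the kernel and map are chosen for illustration) verifies that for the homogeneous polynomial kernel \(K(x, y) = (x \cdot y)^2\) on \(\mathbb{R}^2\), the explicit map \(\phi(x) = (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2)\) into \(F = \mathbb{R}^3\) satisfies \(K(x, y) = \langle \phi(x), \phi(y) \rangle\).

```python
import numpy as np

def K(x, y):
    # homogeneous degree-2 polynomial kernel
    return np.dot(x, y) ** 2

def phi(x):
    # explicit feature map into R^3 matching K
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
print(np.isclose(K(x, y), phi(x) @ phi(y)))  # True
```

The Kernel Trick is exactly the reverse direction: evaluate \(K(x, y)\) directly instead of ever materializing \(\phi\), which matters when \(F\) is high- or infinite-dimensional.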
1.4. Using Integral Operators
By Mercer's theorem, the kernel \(K(x,y)\) can also be represented as
\[ K(x,y) = \sum_{i=1}^\infty \sigma_i \phi_i(x) \phi_i(y) \]
where \(\sigma_i\) are the eigenvalues and \(\phi_i\) the eigenfunctions of the integral operator \((T_K f)(x) = \int K(x, y) f(y) \, dy\).
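A finite-sample analogue of this expansion (an illustrative sketch, not Mercer's theorem itself) is the eigendecomposition of a Gram matrix: \(G = \sum_i \sigma_i v_i v_i^\top\) is the discrete counterpart of \(K(x, y) = \sum_i \sigma_i \phi_i(x) \phi_i(y)\).

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 1))   # ten sample points on the real line
sq = (X - X.T) ** 2            # pairwise squared distances
G = np.exp(-sq)                # Gaussian kernel Gram matrix

# Eigendecomposition: sigma holds eigenvalues, columns of V are
# orthonormal eigenvectors (the discretized eigenfunctions).
sigma, V = np.linalg.eigh(G)

# Rebuild G as sum_i sigma_i v_i v_i^T, mirroring the Mercer expansion.
G_rebuilt = (V * sigma) @ V.T
print(np.allclose(G, G_rebuilt))  # True
```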