Seminar abstracts: SoSe 2024
From disordered systems to the mathematics of data science
Antoine Maillard, Mar 13 2024
I will introduce a connection between statistical learning or high-dimensional inference, and models of statistical mechanics in which a large number of elementary units interact with each other in a disordered manner. This relationship is the basis of a vast mathematical program leveraging tools and insights from the physics of disordered systems to improve our understanding of high-dimensional probabilistic models, with applications e.g. in machine learning and constraint satisfaction problems. I will then detail concrete examples of this program by presenting recent advances concerning the phase retrieval problem, as well as a challenging open problem of high-dimensional probability known as the ellipsoid fitting conjecture.
I will introduce a connection between statistical learning or high-dimensional inference, and models of statistical mechanics in which a large number of elementary units interact with each other in a disordered manner. This relationship is the basis of a vast mathematical program leveraging tools and insights from the physics of disordered systems to improve our understanding of high-dimensional probabilistic models, with applications e.g. in machine learning and constraint satisfaction problems. I will then detail concrete examples of this program by presenting recent advances concerning the phase retrieval problem, as well as a challenging open problem of high-dimensional probability known as the ellipsoid fitting conjecture.
Declipping and saturation recovery
Lukas Liehr, Mar 20 2024
A fundamental problem in signal analysis is the recovery of an object from clipped or saturated measurements. The latter refers to the effect that measurement devices can only measure quantities within a certain range. A classical example is the recording of audio, where sound can only be captured within a specific decibel range. The problem of recovering an object from clipped or saturated measurements is known as declipping or saturation recovery. The authors of the recent paper [W. Alharbi, D. Freeman, D. Ghoreishi, B. Johnson, N. L. Randrianarivony, Declipping and the recovery of vectors from saturated measurements, arXiv:2402.03237 (2024)] establish a rigorous mathematical framework for this problem in terms of frame theory. The purpose of this talk is to provide a self-contained introduction to saturation recovery and to discuss the main results of the aforementioned article.
A fundamental problem in signal analysis is the recovery of an object from clipped or saturated measurements. The latter refers to the effect that measurement devices can only measure quantities within a certain range. A classical example is the recording of audio, where sound can only be captured within a specific decibel range. The problem of recovering an object from clipped or saturated measurements is known as declipping or saturation recovery. The authors of the recent paper [W. Alharbi, D. Freeman, D. Ghoreishi, B. Johnson, N. L. Randrianarivony, Declipping and the recovery of vectors from saturated measurements, arXiv:2402.03237 (2024)] establish a rigorous mathematical framework for this problem in terms of frame theory. The purpose of this talk is to provide a self-contained introduction to saturation recovery and to discuss the main results of the aforementioned article.
Gabor phase retrieval via semidefinite programming
Martin Rathmair, April 10 2024
Given X a function space consisting of complex-valued functions, the corresponding phase retrieval problem is to reconstruct F ∈ X from phase-less observations, that is from (samples of) its modulus |F|. Among other difficulties, the lack of convexity makes such a problem a challenging task. We discuss the case where X is the image of L2 under the short-time Fourier transform (STFT) with Gaussian window, i.e., reconstruction of the STFT from the spectrogram. We propose a reconstruction scheme which is based on solving two convex problems (both of which can be reformulated as semidefinite programs) and argue that the method accurately reconstructs F (up to a constant phase factor) provided that |F| is connected in a certain graph-theoretic sense (note that the problem at hand requires some sort of connectivity to be well-posed). This is joint work with P. Jaming (U. Bordeaux).
Given X a function space consisting of complex-valued functions, the corresponding phase retrieval problem is to reconstruct F ∈ X from phase-less observations, that is from (samples of) its modulus |F|. Among other difficulties, the lack of convexity makes such a problem a challenging task. We discuss the case where X is the image of L2 under the short-time Fourier transform (STFT) with Gaussian window, i.e., reconstruction of the STFT from the spectrogram. We propose a reconstruction scheme which is based on solving two convex problems (both of which can be reformulated as semidefinite programs) and argue that the method accurately reconstructs F (up to a constant phase factor) provided that |F| is connected in a certain graph-theoretic sense (note that the problem at hand requires some sort of connectivity to be well-posed). This is joint work with P. Jaming (U. Bordeaux).
Monotone generative modeling via a geometry-preserving mapping
Wonjun Lee, April 17 2024
Generative Adversarial Networks (GANs) are powerful tools for creating new content, but they face challenges such as sensitivity to starting conditions and mode collapse. To address these issues, we propose a deep generative model that utilizes the Gromov-Monge embedding (GME). It helps identify the low-dimensional structure of the underlying measure of the data and then map it, while preserving its geometry, into a measure in a low-dimensional latent space, which is then optimally transported to the reference measure. We guarantee the preservation of the underlying geometry by the GME and c-cyclical monotonicity of the generative map, where c is an intrinsic embedding cost employed by the GME. The latter property is a first step in guaranteeing better robustness to initialization of parameters and mode collapse. Numerical experiments demonstrate the effectiveness of our approach in generating high-quality images, avoiding mode collapse, and exhibiting robustness to different starting conditions.
Generative Adversarial Networks (GANs) are powerful tools for creating new content, but they face challenges such as sensitivity to starting conditions and mode collapse. To address these issues, we propose a deep generative model that utilizes the Gromov-Monge embedding (GME). It helps identify the low-dimensional structure of the underlying measure of the data and then map it, while preserving its geometry, into a measure in a low-dimensional latent space, which is then optimally transported to the reference measure. We guarantee the preservation of the underlying geometry by the GME and c-cyclical monotonicity of the generative map, where c is an intrinsic embedding cost employed by the GME. The latter property is a first step in guaranteeing better robustness to initialization of parameters and mode collapse. Numerical experiments demonstrate the effectiveness of our approach in generating high-quality images, avoiding mode collapse, and exhibiting robustness to different starting conditions.
Operator Learning via Approximation of Symbols
Thomas Dittrich, April 24 2024
Operator learning is a recent development in the simulation of partial differential equations (PDEs) by means of neural networks. The idea behind this approach is to learn the behavior of an operator, such that the resulting neural network is an (approximate) mapping in infinite-dimensional spaces that is capable of simulating the PDE. In our work we study some general approximation capabilities for linear differential operators by approximating the corresponding symbol in the Fourier domain. Analogous to the structure of the class of Hörmander symbols, we consider the approximation with respect to a topology that is induced by a sequence of semi-norms. In that sense, we measure the approximation error in terms of a Fréchet metric and our main result identifies sufficient conditions for achieving a predefined approximation error. Secondly, we then focus on a natural extension of our main theorem, in which we manage to reduce the assumptions on the sequence of semi-norms. Based on some existing approximation result for the exponential spectral Barron space, we then present a concrete example of symbols that can be approximated well and we also show the analogy of this approximation to the design of digital filters in Signal Processing.
Operator learning is a recent development in the simulation of partial differential equations (PDEs) by means of neural networks. The idea behind this approach is to learn the behavior of an operator, such that the resulting neural network is an (approximate) mapping in infinite-dimensional spaces that is capable of simulating the PDE. In our work we study some general approximation capabilities for linear differential operators by approximating the corresponding symbol in the Fourier domain. Analogous to the structure of the class of Hörmander symbols, we consider the approximation with respect to a topology that is induced by a sequence of semi-norms. In that sense, we measure the approximation error in terms of a Fréchet metric and our main result identifies sufficient conditions for achieving a predefined approximation error. Secondly, we then focus on a natural extension of our main theorem, in which we manage to reduce the assumptions on the sequence of semi-norms. Based on some existing approximation result for the exponential spectral Barron space, we then present a concrete example of symbols that can be approximated well and we also show the analogy of this approximation to the design of digital filters in Signal Processing.
Efficient Learning Using Spiking Neural Networks Equipped With Affine Encoders and Decoders
Martina Neuman, May 15 2024
We study the learning problem associated with spiking neural networks. Specifically, we consider hypothesis sets of spiking neural networks with affine temporal encoders and decoders and simple spiking neurons having only positive synaptic weights. We demonstrate that the positivity of the weights continues to enable a wide range of expressivity results, including a striking sorting property, a rate-optimal approximation of smooth functions or approximation without the curse of dimensionality. Moreover, positive-weight spiking neural networks are shown to depend continuously on their parameters which facilitates classical covering number-based generalization statements. Finally, we observe that from a generalization perspective, contrary to feedforward neural networks or previous results for general spiking neural networks, the depth has little to no adverse effect on the generalization capabilities.
We study the learning problem associated with spiking neural networks. Specifically, we consider hypothesis sets of spiking neural networks with affine temporal encoders and decoders and simple spiking neurons having only positive synaptic weights. We demonstrate that the positivity of the weights continues to enable a wide range of expressivity results, including a striking sorting property, a rate-optimal approximation of smooth functions or approximation without the curse of dimensionality. Moreover, positive-weight spiking neural networks are shown to depend continuously on their parameters which facilitates classical covering number-based generalization statements. Finally, we observe that from a generalization perspective, contrary to feedforward neural networks or previous results for general spiking neural networks, the depth has little to no adverse effect on the generalization capabilities.
Expressivity of Spiking Neural Networks
Manjot Singh, May 22 2024
The synergy between spiking neural networks and neuromorphic hardware holds promise for the development of energy-efficient AI applications. Inspired by this potential, we revisit the foundational aspects to study the capabilities of spiking neural networks where information is encoded in the firing time of neurons. Under the Spike Response Model as a mathematical model of a spiking neuron with a linear response function, we compare the expressive power of artificial and spiking neural networks, where we initially show that they realize piecewise linear mappings. In contrast to ReLU networks, we prove that spiking neural networks can realize both continuous and discontinuous functions. Moreover, we provide complexity bounds on the size of spiking neural networks to emulate multi-layer (ReLU) neural networks. Restricting to the continuous setting, we also establish complexity bounds in the reverse direction for one-layer spiking neural networks.
The synergy between spiking neural networks and neuromorphic hardware holds promise for the development of energy-efficient AI applications. Inspired by this potential, we revisit the foundational aspects to study the capabilities of spiking neural networks where information is encoded in the firing time of neurons. Under the Spike Response Model as a mathematical model of a spiking neuron with a linear response function, we compare the expressive power of artificial and spiking neural networks, where we initially show that they realize piecewise linear mappings. In contrast to ReLU networks, we prove that spiking neural networks can realize both continuous and discontinuous functions. Moreover, we provide complexity bounds on the size of spiking neural networks to emulate multi-layer (ReLU) neural networks. Restricting to the continuous setting, we also establish complexity bounds in the reverse direction for one-layer spiking neural networks.
Interpretable geometric description of dynamical systems
Adám Gosztolai, May 29 2024
It is increasingly recognised that computations in the brain and artificial neural networks can be understood as outputs of a high-dimensional dynamical system conformed by the activity of large neural populations. Yet revealing the underpinning latent dynamical processes from data and interpreting their relevance in computational tasks remains challenging. A prominent line of research has observed that neural activity is often confined to low-dimensional smooth manifolds. However, there is a lack of theoretical frameworks for the unsupervised representation of neural dynamics that are interpretable based on behavioural variables, comparable across systems, and decodable to behaviour with high accuracy. In this talk, I will introduce Manifold Representation Basis Learning (MARBLE), an unsupervised representation-learning framework for non-linear dynamical systems. Our approach combines empirical dynamical modelling and geometric deep learning to decompose state space dynamics into a statistical distribution of local flow fields. I will show that this decomposition allows the development of unsupervised learning algorithms and obtaining latent representations that are preserved under different manifold embeddings. I will show that MARBLE representations give rise to a well-defined similarity metric between dynamical systems, e.g., different instances of recurrent neural networks and animals. Moreover, they are expressive enough to compare computations and detect continuous and discontinuous changes due to external variables. In neuroscience, I will show that these properties allow the discovery of interpretable representations of neural activity in motor, navigation and cognitive tasks and lead to significantly higher decoding performance than state-of-the-art. Our results suggest that using the manifold structure yields a new class of algorithms with higher performance and the ability to assimilate data across experiments.
It is increasingly recognised that computations in the brain and artificial neural networks can be understood as outputs of a high-dimensional dynamical system conformed by the activity of large neural populations. Yet revealing the underpinning latent dynamical processes from data and interpreting their relevance in computational tasks remains challenging. A prominent line of research has observed that neural activity is often confined to low-dimensional smooth manifolds. However, there is a lack of theoretical frameworks for the unsupervised representation of neural dynamics that are interpretable based on behavioural variables, comparable across systems, and decodable to behaviour with high accuracy. In this talk, I will introduce Manifold Representation Basis Learning (MARBLE), an unsupervised representation-learning framework for non-linear dynamical systems. Our approach combines empirical dynamical modelling and geometric deep learning to decompose state space dynamics into a statistical distribution of local flow fields. I will show that this decomposition allows the development of unsupervised learning algorithms and obtaining latent representations that are preserved under different manifold embeddings. I will show that MARBLE representations give rise to a well-defined similarity metric between dynamical systems, e.g., different instances of recurrent neural networks and animals. Moreover, they are expressive enough to compare computations and detect continuous and discontinuous changes due to external variables. In neuroscience, I will show that these properties allow the discovery of interpretable representations of neural activity in motor, navigation and cognitive tasks and lead to significantly higher decoding performance than state-of-the-art. Our results suggest that using the manifold structure yields a new class of algorithms with higher performance and the ability to assimilate data across experiments.
Statistical behaviour of the max-sliced Wasserstein distance
Daniel Bartl, June 05 2024
I will begin by defining the Wasserstein distance and discussing the challenges posed by the curse of dimensionality when estimating a high-dimensional distribution from i.i.d. samples thereof. I will then explain how the max-sliced Wasserstein distance overcomes these challenges. Finally, I will elaborate on the proof, which relies on a uniform version of the Dvoretzky-Kiefer-Wolfowitz inequality. (Joint work with S. Mendelson.)
I will begin by defining the Wasserstein distance and discussing the challenges posed by the curse of dimensionality when estimating a high-dimensional distribution from i.i.d. samples thereof. I will then explain how the max-sliced Wasserstein distance overcomes these challenges. Finally, I will elaborate on the proof, which relies on a uniform version of the Dvoretzky-Kiefer-Wolfowitz inequality. (Joint work with S. Mendelson.)
Tighter Generalization Bounds on Digital Computers via Discrete Optimal Transport
Martina Neuman, June 12 2024
Machine learning models with inputs in a d-dimensional Euclidean space, when implemented on digital computers, generalize, and their generalization gap converges to 0 at a rate of c/N^{1/2} concerning the sample size N. However, the constant c>0 obtained through classical methods can be large in terms of the ambient dimension d and the machine precision, posing a challenge when N is small to realistically large. In this paper, we derive a family of generalization bounds c_m/N^{1/(2\vee m)} tailored for learning models on digital computers, which adapt to both the sample size N and the so-called geometric representation dimension of the discrete learning problem. Adjusting the parameter m according to N results in significantly tighter generalization bounds for practical sample sizes N, while setting m small maintains the optimal dimension-free worst-case rate of O(1/N^{1/2}). Furthermore, our adaptive generalization bounds are formulated based on our new non-asymptotic result for concentration of measure in discrete optimal transport, established via leveraging metric embedding arguments.
Machine learning models with inputs in a d-dimensional Euclidean space, when implemented on digital computers, generalize, and their generalization gap converges to 0 at a rate of c/N^{1/2} concerning the sample size N. However, the constant c>0 obtained through classical methods can be large in terms of the ambient dimension d and the machine precision, posing a challenge when N is small to realistically large. In this paper, we derive a family of generalization bounds c_m/N^{1/(2\vee m)} tailored for learning models on digital computers, which adapt to both the sample size N and the so-called geometric representation dimension of the discrete learning problem. Adjusting the parameter m according to N results in significantly tighter generalization bounds for practical sample sizes N, while setting m small maintains the optimal dimension-free worst-case rate of O(1/N^{1/2}). Furthermore, our adaptive generalization bounds are formulated based on our new non-asymptotic result for concentration of measure in discrete optimal transport, established via leveraging metric embedding arguments.
Learning metriplectic systems and other bracket-based dynamics
Anthony Gruber, June 19 2024
The metriplectic formalism is a useful framework for constructing and explaining phenomenological models of physical phenomena that conserve energy and generate entropy. However, general metriplectic equations of motion are highly complicated, relying on delicate compatibility conditions involving the kernels of algebraic brackets. This talk discusses a recent method for machine-learning provably metriplectic dynamics from data in a way that is (1) universally approximating, (2) admits an error estimate, and (3) scales optimally with respect to the number of learnable parameters. Through finite-dimensional benchmark examples, it is shown that the proposed method is fully expressive and capable of reliably learning metriplectic dynamics, even in cases where only partial state data is observed.
The metriplectic formalism is a useful framework for constructing and explaining phenomenological models of physical phenomena that conserve energy and generate entropy. However, general metriplectic equations of motion are highly complicated, relying on delicate compatibility conditions involving the kernels of algebraic brackets. This talk discusses a recent method for machine-learning provably metriplectic dynamics from data in a way that is (1) universally approximating, (2) admits an error estimate, and (3) scales optimally with respect to the number of learnable parameters. Through finite-dimensional benchmark examples, it is shown that the proposed method is fully expressive and capable of reliably learning metriplectic dynamics, even in cases where only partial state data is observed.