Seminar abstracts: WiSe 2024
Injectivity and Stability of ReLU layers.
Daniel Haider, 27.01.2025
We present a frame-theoretic perspective to characterize the injectivity of ReLU layers in terms of all three involved ingredients: (i) the weights, (ii) the bias, and (iii) the domain where the input data comes from. Reconstruction can be done either explicitly, using dual frames, or iteratively, by modifying the classic frame algorithm. Finally, we show novel lower Lipschitz bounds that are independent of any dimension, analogous to the corresponding results for (real) phase retrieval and optimal up to a constant factor. This is joint work with M. Ehler, D. Freeman, and P. Balazs.
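For readers new to the frame-theoretic viewpoint, the following sketch fixes notation; the sign convention and the condition shown are only illustrative (a simple sufficient condition), whereas the talk gives the precise characterization in terms of weights, bias, and data domain.

```latex
% ReLU layer with weight vectors w_1,...,w_m in R^n and bias b in R^m:
\[
  C_{W,b}(x)_i \;=\; \max\bigl(\langle w_i, x\rangle + b_i,\; 0\bigr),
  \qquad i = 1,\dots,m .
\]
% For an input x in the data domain K, write the (strictly) active index set as
\[
  I_x \;=\; \bigl\{\, i : \langle w_i, x\rangle + b_i > 0 \,\bigr\}.
\]
% A simple sufficient condition for injectivity on K: for every x in K, the
% active vectors (w_i)_{i in I_x} span R^n, i.e. they form a frame for R^n,
% so x can be recovered from the positive outputs (e.g. via a dual frame of
% the active subfamily).
\[
  \operatorname{span}\{\, w_i : i \in I_x \,\} \;=\; \mathbb{R}^n
  \quad \text{for all } x \in K .
\]
```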
Phase-space control with deep learning accelerated tracking.
Matthias Remta, 20.01.2025
Fixed-target experiments are an integral part of the physics programme at CERN. These experiments leverage a secondary particle beam generated from the interaction of the primary beam with a target. The design of these targets is limited by the peak energy density deposited during the primary beam’s passage. This beam typically has a Gaussian-like profile, and the peak density occurs at the beam’s center. An ongoing project aims to create custom phase-space and transverse profiles through resonant phase-space manipulation, in order to reduce the peak energy density. Numerical simulations of the particle dynamics are the well-established approach to obtaining phase-space distributions. Such simulations are very time-consuming, limiting the exploration of high-dimensional parameter spaces. This project intends to exploit and improve deep learning methods, such as Physics-Informed Neural Networks, as an alternative. This talk introduces the project, discusses preliminary results, and outlines future activities.
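As background on the method named in the abstract, here is a minimal, generic PINN sketch in PyTorch: a small network is fitted to a few trajectory samples while being penalised for violating an assumed toy dynamics (a harmonic oscillator standing in for linearised betatron motion). The dynamics, data, and architecture used in the actual project will differ; everything below is a placeholder chosen only to illustrate the loss structure.

```python
# Minimal PINN sketch (illustrative only): u_theta(t) is fitted to a handful
# of "measured" trajectory samples while also being penalised for violating
# an assumed toy dynamics u'' + omega^2 u = 0.
import torch

torch.manual_seed(0)
omega = 2.0

model = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)

# A few synthetic placeholder samples of the trajectory.
t_data = torch.linspace(0.0, 1.0, 8).unsqueeze(1)
u_data = torch.cos(omega * t_data)

# Collocation points where the physics residual is enforced.
t_col = torch.linspace(0.0, 1.0, 128).unsqueeze(1).requires_grad_(True)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    # Data misfit term.
    loss_data = torch.mean((model(t_data) - u_data) ** 2)
    # Physics residual term via automatic differentiation.
    u = model(t_col)
    du = torch.autograd.grad(u.sum(), t_col, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), t_col, create_graph=True)[0]
    loss_phys = torch.mean((d2u + omega ** 2 * u) ** 2)
    loss = loss_data + loss_phys
    loss.backward()
    opt.step()
```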
Mitigating noise in memristor-based analog neural network accelerators for space applications.
Zacharia Rudge, 13.01.2025
In recent years, the space community has been exploring the possibilities of Artificial Intelligence (AI), specifically Artificial Neural Networks (ANNs), for a variety of applications onboard spacecraft. However, this development is hampered by the limited energy available onboard spacecraft (especially small satellites such as smallsats and cubesats) as well as the susceptibility of modern chips to radiation. This necessitates research into neural network accelerators capable of meeting these requirements whilst satisfying the high-performance needs of space applications.
This talk will cover an emerging hardware technology that has the potential to satisfy these needs but has been completely neglected in the space sector so far: "memristors". Memristors (as implemented using Phase-Change Memory (PCM) and Resistive Random-Access Memory (RRAM)) may enable low-power, low-latency computation for edge neural network applications, including space applications. To evaluate the feasibility of memristor-based neural networks for space, several on-board space applications are evaluated, namely guidance and control networks and geodesy networks (similar to neural radiance fields).
We show in simulation that memristive accelerators are able to learn these tasks, though challenges remain, with a major limiting factor being the impact of hardware noise (read and write noise) on the achieved accuracy. Several currently studied methods for mitigating such noise at the architectural level will be discussed. The study outlined in this talk provides a foundation for future research into memristor-based AI accelerators for space, highlighting their potential but also the need for further investigation and development.
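To make the noise discussion concrete, the sketch below shows one common way such read and write noise is modelled in simulation: multiplicative Gaussian perturbations of the programmed weights, with write noise acquired at programming time and read noise redrawn on every matrix-vector product. The noise model, magnitudes, and simulator used in the actual study may differ.

```python
# Illustrative hardware-noise model (not the simulator or parameters from the
# talk): write noise perturbs the programmed conductances, read noise adds a
# fresh perturbation on every analog matrix-vector product. Noise-aware
# training typically injects similar perturbations already during training.
import torch

def noisy_linear(x, weight, write_sigma=0.02, read_sigma=0.01):
    """Analog matrix-vector product with simple multiplicative noise."""
    # Write noise: perturbation acquired when weights are programmed onto the
    # crossbar (drawn per call here for brevity; in practice fixed once per
    # programming cycle).
    w_programmed = weight * (1.0 + write_sigma * torch.randn_like(weight))
    # Read noise: a fresh perturbation on every evaluation.
    w_read = w_programmed * (1.0 + read_sigma * torch.randn_like(weight))
    return x @ w_read.t()

# Example: compare an ideal and a noisy layer output.
torch.manual_seed(0)
W = torch.randn(32, 16)
x = torch.randn(4, 16)
print((noisy_linear(x, W) - x @ W.t()).abs().mean())
```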
Towards optimal training of neural networks
Michael Feischl, 16.12.2024
We look at two examples where ideas from optimal mesh refinement can help with the training of neural networks. We aim to prove convergence rates that relate the training loss with the size of the network and discuss an application in neural ODEs.
The Theory to Practice Gap in Neural Operator Learning
Philipp Grohs, 09.12.2024
The theory-to-practice gap refers to the following phenomenon: there exist (many) functions that are well approximable by small neural networks but learning them from a finite amount of sampled data to within a given accuracy is intractable. The theory-to-practice gap was first empirically observed in [1] and rigorously proven to exist in [2]. This talk presents recent joint work with S. Lanthaler and M. Trautner where we establish the existence of a theory-to-practice gap in neural operator learning.
Differentiable Regularization of the condition number for neural networks
Rossen Nenov, 02.12.2024
Maintaining numerical stability in machine learning models is crucial for their reliability and performance. One approach to maintaining the stability of a network layer is to integrate the condition number of the weight matrix as a regularizing term into the optimization algorithm. However, due to its discontinuous nature and lack of differentiability, the condition number is not suitable for a gradient descent approach. This talk presents a novel regularizer that is provably differentiable almost everywhere and promotes matrices with low condition numbers.
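For context, the baseline the abstract alludes to can be sketched as follows: penalise the plain condition number, computed from singular values, inside the training loss. This is not the regularizer proposed in the talk; it merely illustrates the approach and the weak points (blow-up as the smallest singular value approaches zero, non-smoothness at repeated singular values) that motivate a better-behaved surrogate.

```python
# Naive condition-number penalty via singular values (baseline sketch only).
import torch

def condition_number_penalty(weight: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """kappa(W) = sigma_max / sigma_min, differentiable almost everywhere."""
    s = torch.linalg.svdvals(weight)   # singular values in descending order
    return s[0] / (s[-1] + eps)        # eps guards against division by zero

# Usage: total loss = task loss + lambda * sum of per-layer penalties.
W = torch.nn.Parameter(torch.randn(64, 64))
loss = condition_number_penalty(W)
loss.backward()                        # gradients flow through the SVD
```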
Frame Multipliers and Compressive Sensing
Georg Tauböck, 25.11.2024
We investigate the applicability of frame multipliers as compressive sensing measurements. We show that, under certain conditions, subsampled frame multipliers yield measurement matrices with desirable properties. To that end, we prove a general probabilistic nullspace property for arbitrary nonempty sets that accounts for the special measurement structure induced by subsampled frame multipliers. Conditions for uniqueness of reconstruction of signals that are sparse with respect to dictionaries or, more generally, to non-linear locally Lipschitz mappings are obtained as special cases. Furthermore, we show that a frame multiplier matrix is full superregular, i.e., that all its minors are nonzero, for almost all frame symbol vectors, provided that the underlying frames are full spark and sufficiently redundant. Since Gabor frames are full spark for almost all windows, we study Gabor multipliers in more detail and are able to derive improved constants for some scenarios. Finally, our simulation results reveal that, in many instances, subsampled frame multiplier matrices exhibit the same ℓ1-reconstruction performance as i.i.d. Gaussian measurement matrices.
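For readers unfamiliar with the term, a frame multiplier combines analysis with one frame, pointwise multiplication by a symbol, and synthesis with another frame; the sketch below fixes this standard notation and one natural way to subsample it, with normalisations and the random model for the symbol left to the talk.

```latex
% Frame multiplier with frames (phi_k), (psi_k) and symbol vector m = (m_k):
\[
  M_{m,\Phi,\Psi}\, x \;=\; \sum_{k} m_k \,\langle x, \psi_k\rangle\, \varphi_k .
\]
% A compressive measurement in the spirit of the abstract keeps only a subset
% Omega of the output coordinates of the matrix representing the multiplier,
\[
  y \;=\; P_{\Omega}\, M_{m,\Phi,\Psi}\, x ,
\]
% where P_Omega restricts to the rows indexed by Omega. Exact normalisations
% and the probabilistic model for m follow the talk and may differ from this
% sketch.
```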
AlphaFold.
Michael Scherbela, 18.11.2024
John Jumper and Demis Hassabis have been awarded the 2024 Nobel Prize in Chemistry for their development of AlphaFold 2, a neural network which predicts the 3D structure of proteins. This presentation gives a high-level overview of their seminal paper "Highly accurate protein structure prediction with AlphaFold" (https://doi.org/10.1038/s41586-021-03819-2). I plan to cover the specific problem the model is trying to solve, intuitions from biology which guide the model design, a rough overview of the model and training procedure, as well as some particularly interesting model components.
Dimension-independent learning rates for high-dimensional classification problems.
Andrés Filipe Lerma Pineda, 11.11.2024
In this talk, we address the problem of approximating and estimating classification functions whose decision boundaries are described by functions in the RBV^2 space. We begin by providing an overview of this function class and discuss its potential to mitigate the curse of dimensionality in classification tasks. Our main result shows that any function in the RBV^2 space can be approximated by a neural network to any desired accuracy. Furthermore, we prove that such a neural network can be constructed with bounded weights. We then examine the learning problem of estimating such a function from a fixed sample size. Finally, we illustrate the practical advantages of enforcing an RBV^2 boundary on a classification function, demonstrating through numerical examples how this assumption can improve the efficiency of learning from data.
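As orientation, one common way to picture the RBV^2 class, following the representer-theorem literature, is via infinite-width ReLU networks; the exact definition and normalisation used in the talk may differ in details.

```latex
% Heuristically, f belongs to RBV^2 if it admits an "infinite-width" ReLU
% representation
\[
  f(x) \;=\; \int_{\mathbb{S}^{d-1}\times\mathbb{R}}
     \sigma\bigl(\langle w, x\rangle - b\bigr)\, d\mu(w,b)
     \;+\; \langle c, x\rangle + c_0 ,
  \qquad \sigma(t) = \max(t,0),
\]
% with a finite signed measure mu whose total variation plays the role of the
% RBV^2 seminorm. The talk concerns classifiers whose decision boundary is
% described by such a function.
```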
Classification problem with Barron regular boundaries and margin condition.
Jonathan García Rebellón, 04.11.2024
We prove that a classifier with a Barron-regular decision boundary can be approximated with a fast rate by ReLU neural networks with three hidden layers when a margin condition is assumed. More specifically, for strong margin conditions, high-dimensional discontinuous classifiers can be approximated with a rate comparable to a low-dimensional smooth function. We performed binary classification simulations with various margins for four different dimensions, with the highest dimensional problem corresponding to images from the MNIST database. (Joint work with Philipp Petersen)
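For orientation, a margin condition of the kind typically used in this line of work bounds the probability mass near the decision boundary; the precise assumption in the talk may be stated differently.

```latex
% Distance-based margin condition: the data distribution places little mass
% in a tube of width t around the decision boundary \partial\Omega,
\[
  \mathbb{P}_X\bigl(\{\, x : \operatorname{dist}(x, \partial\Omega) \le t \,\}\bigr)
  \;\le\; C\, t^{\alpha}
  \qquad \text{for all } t > 0 ,
\]
% with larger alpha ("stronger margin") permitting faster rates for
% approximating and estimating the indicator classifier of Omega.
```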
Function from form: a brief journey covering neurons, graphs, lattices, and stars.
Dominik Dold, 21.10.2024
I will give a brief overview of my past and future research at the intersection of deep learning, bio-inspiration, and space exploration. Generally, I investigate how functionality emerges from simple, locally interacting components (and as energy-efficiently as possible!) - be it biological or artificial neural networks, lattice structures, (relational) graph structures, multi-robot systems, or satellite swarms. In this talk, I will address three questions that have driven my research forward: (1) how could Bayesian inference be enabled mechanistically in the brain, (2) how can spiking neural networks process graph-structured relational data, and (3) how can ideas from AI and biology be adopted to construct reprogrammable mechanical systems, e.g., for space technologies.