Mathematics of Machine Learning and Data Science Seminar

PAST MML RESEARCH GROUP SEMINAR (Nov -- Dec 2023)

The research group of Philipp Petersen organizes a research seminar, which takes place weekly on Wednesdays.

Date | Presenter | Title
Nov 8 2023 (online -- 15:00) | A. Martina Neuman (UWien) | Transferability of GNNs using sampling and graphon theories
Dec 6 2023 (in person -- 15:00), SR9, Kolingasse 14 | Philipp Petersen (UWien) | Optimal learning of piecewise smooth functions
Dec 13 2023 (in person -- 15:00), SR9, Kolingasse 14 | Adeyemi D. Adeoye (Scuola IMT and UWien) | Self-concordant regularization in machine learning

 

JOINT RESEARCH SEMINAR: MATH OF MACHINE LEARNING AND DATA SCIENCE

From Jan 2024 onward, the research groups of Philipp Petersen and Philipp Grohs organize a joint research group seminar: the "Math of Machine Learning and Data Science" seminar.

The joint seminar takes place weekly on Wednesdays. Talks are not held during Austrian national and public holidays or during semester breaks. In the event that a talk is held online or in hybrid form, the Zoom details will be given on this page.

Organizers: Martina Neuman and Sebastian Schmutzhard-Höfler

The talks are approximately one hour long. Everybody is welcome! If you want to be kept up to date about the weekly seminar talks, you can subscribe to the seminar's mailing list here. The schedule is as follows.

SCHEDULE

Date | Presenter | Title
Jan 10 2024 (in person -- 11:00), 7.21, Kolingasse 14 | Andrés Felipe Lerma Pineda (UWien) | Deep neural networks can stably solve high-dimensional, noisy, non-linear inverse problems
Jan 17 2024 (in person -- 11:00), 7.21, Kolingasse 14 | Bernhard Einberger (AVL) | Machine learning model to predict fuel cell degradation effects
Jan 24 2024 (in person -- 11:00), 7.21, Kolingasse 14 | Yuanyuan Li (Fudan U) | Two-layer networks with the ReLU^k activation function: Barron spaces and derivative approximation
Jan 31 2024 (online -- 16:00) **Special Session** | Anastasis Kratsios (McMaster U and Vector Institute) | An approximation theory for metric-space-valued functions with a view towards transformers
March 06 2024 (in person -- 10:00), SR 12, Kolingasse 14 | Organization meeting
March 13 2024 (online -- 10:00) **Special Session** | Antoine Maillard (ETH Zurich) | From disordered systems to the mathematics of data science
March 20 2024 (in person -- 10:00), SR 12, Kolingasse 14 | Lukas Liehr (UWien) | Declipping and saturation recovery
April 10 2024 (in person -- 10:00), SR 12, Kolingasse 14 | Martin Rathmair (UWien) | Gabor phase retrieval via semidefinite programming
April 17 2024 (online -- 16:00) **Special Session** | Wonjun Lee (UoMinnesota) | Monotone generative modeling via a geometry-preserving mapping
April 24 2024 (in person -- 10:00), SR 12, Kolingasse 14 | Thomas Dittrich (RICAM) | Operator Learning via Approximation of Symbols
May 15 2024 (TBA) | Martina Neuman (UWien) | Efficient Learning Using Spiking Neural Networks Equipped With Affine Encoders and Decoders
May 22 2024 (TBA) | Manjot Singh (LMU) | (TBA)
May 29 2024 (TBA) | Ádám Gosztolai | (TBA)

ABSTRACTS

Transferability of Graph Neural Networks using Graphon and Sampling Theories

Martina Neuman
Nov 08 2023

Graph neural networks (GNNs) have become powerful tools for processing graph-based information in various domains. A desirable property of GNNs is transferability, where a trained network can swap in information from a different graph without retraining and retain its accuracy. A recent method of capturing transferability of GNNs is through the use of graphons, which are symmetric, measurable functions representing the limit of large dense graphs. In this talk, I will present an explicit two-layer graphon neural network (WNN) architecture for approximating bandlimited signals, and explain how a related GNN guarantees transferability between both deterministic weighted graphs and simple random graphs. The proposed WNN and GNN architectures overcome issues related to the curse of dimensionality and offer practical solutions for handling graph data of varying sizes.
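
To make the graphon-to-graph connection concrete, here is a minimal Python sketch (with illustrative names; this is not the WNN/GNN architecture of the talk) of the standard construction of a simple random graph sampled from a graphon W:

    import numpy as np

    def sample_graph_from_graphon(W, n, seed=None):
        """Sample a simple random graph with n nodes from a graphon W,
        a symmetric measurable function [0,1]^2 -> [0,1]."""
        rng = np.random.default_rng(seed)
        x = rng.uniform(0.0, 1.0, size=n)        # latent node positions
        P = W(x[:, None], x[None, :])            # edge probabilities W(x_i, x_j)
        U = rng.uniform(size=(n, n))
        A = np.triu((U < P).astype(int), k=1)    # draw each potential edge once
        return A + A.T                           # adjacency of an undirected graph

    # Example: the smooth rank-one graphon W(x, y) = x * y.
    A = sample_graph_from_graphon(lambda x, y: x * y, n=200, seed=0)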

Optimal learning of piecewise smooth functions

Philipp Petersen
Dec 06 2023

Deep learning has established itself as, by far, the most successful machine learning approach for sufficiently complex tasks. Nowadays, it is used in a wide range of highly complex applications such as natural language processing and even scientific applications. Its first major breakthrough, however, was achieved by shattering the state of the art in image classification. We revisit the problem of classification, or more generally the learning of functions with jumps, by deep neural networks, and attempt to find an answer to why deep networks are remarkably effective in this regime. We interpret the learning of classifiers as finding piecewise constant functions from labelled samples. Piecewise constant/smooth functions also appear in many applications associated with physical processes, such as transport problems where shock fronts develop.

We then precisely link the hardness of the learning problem to the complexity of the regions where the function is smooth. Concretely, we will establish fundamental lower bounds on the learnability of certain regions. Finally, we will show that in many cases, these optimal bounds can be achieved by deep-neural-network-based learning. 

In quite realistic settings, we will observe that deep neural networks can learn high-dimensional classifiers without a strong dependence of the learning rates on the dimension.
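
In symbols, the viewpoint of the first paragraph is the standard empirical-risk formulation (generic notation, not the talk's precise theorem): a classifier is a piecewise constant target

    f(x) = \sum_{k=1}^{K} c_k \, \mathbf{1}_{\Omega_k}(x),

and learning means selecting, from a neural network class \mathcal{N}, a minimizer of the empirical risk over labelled samples (x_i, f(x_i)),

    \hat{f} \in \operatorname{arg\,min}_{\Phi \in \mathcal{N}} \frac{1}{m} \sum_{i=1}^{m} \big( \Phi(x_i) - f(x_i) \big)^2.

The lower bounds of the talk then tie the best achievable rate in m to the regularity of the region boundaries \partial\Omega_k.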

Self-concordant regularization in Machine Learning

Adeyemi D. Adeoye
Dec 13 2023

Regularization techniques have proved to be key in reducing overfitting issues in machine learning, ultimately improving the machine learning model's ability to generalize well. Such techniques, leading to a so-called "structural risk minimization" problem, can help to customize efficient methods for solving the resulting problem, particularly when one seeks (approximate) second-order methods that account for curvature information in the solution steps for faster convergence. Precisely, when the regularization function is self-concordant, Newton-type methods can select learning rates or step-sizes that exploit the self-concordant property of the regularization function for faster convergence and globalization of the method when the problem is convex.

The generalized Gauss-Newton method exploiting this property will be presented in this talk, showing a promising approximation scheme that can lead to cheap iteration steps, especially when the problem is overparameterized or in the mini-batch setting. I will also present a smoothing framework for constructing such algorithmic self-concordant regularization functions from penalty functions that are integral parts of the optimization problem, such as those added to promote specific structures in the solution estimates. Numerical examples that demonstrate the efficiency of this approach and its superiority over existing approaches will be shown. While our framework currently covers the convex setting, a toy example of a neural network training problem shows that our approach is promising for non-convex optimization. Finally, I will present a Julia package associated with our framework.
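
For reference, the central definition (standard, not specific to the talk): a convex function \varphi \colon \mathbb{R} \to \mathbb{R} is self-concordant if

    |\varphi'''(t)| \le 2\, \varphi''(t)^{3/2},

and the classical damped Newton step it enables, with Newton decrement \lambda(x), reads

    x_{+} = x - \frac{1}{1 + \lambda(x)} \nabla^2 F(x)^{-1} \nabla F(x), \qquad \lambda(x) = \big( \nabla F(x)^{\top} \nabla^2 F(x)^{-1} \nabla F(x) \big)^{1/2}.

This is the kind of step-size selection the abstract refers to; the generalized Gauss-Newton scheme of the talk replaces the exact Hessian with a cheaper approximation.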

Deep neural networks can stably solve high-dimensional, noisy, non-linear inverse problems

Andrés Felipe Lerma Pineda
Jan 10 2024

We present a neural-network-based method for the solution of inverse problems when only noisy measurements are available. The method addresses problems modeled by an infinite-dimensional continuous operator with a discontinuous inverse. Our proposed method restricts this forward operator to finite-dimensional spaces in such a way that the inverse is Lipschitz continuous: first, we restrict the operator's domain to a finite-dimensional vector space, and then we evaluate the output only at a finite set of sampling points. We prove that this restricted operator has a Lipschitz continuous inverse when a sufficient number of samples is chosen. For the class of Lipschitz continuous functions, we construct a neural network which dampens the noise when tested with perturbed data. Moreover, this neural network can be trained with noisy data and performs well when tested with additional noisy data. We provide numerical examples from real-life applications to illustrate the feasibility of our method.
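
The stability statement at the core of the abstract can be written in one line: if the restricted forward operator F_n has an inverse that is Lipschitz continuous with constant L, then

    \| x_1 - x_2 \| \le L \, \| F_n(x_1) - F_n(x_2) \|,

so measurement noise of size \delta can move the reconstruction by at most L\delta.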

Machine Learning Model to Predict Fuel Cell Degradation Effects

Bernhard Einberger
Jan 17 2024

Fuel Cell technology could take a key role in the industry's transformation towards decarbonization. Accordingly, the components must be well designed and long-lasting. Several simulation tools have been developed by AVL in order to increase the engineering capability for this technology. This thesis introduces a workflow that describes how to generate Fuel Cell degradation data with a commercial 3D-CFD solver (FireM©) and how to process the data by means of Machine Learning tools. A detailed simulation model is utilized to calibrate a simplified simulation model. The Electrocatalytic Surface Area suggested by the simplified simulation model differs from measurements by roughly 30%. Two ML models were set up to predict degradation effects: a Kernel Ridge Regression model and a Dense Neural Network model. The following degradation effects are predicted: current density reduction, equivalent weight (membrane), platinum particle number (CAT-CL), specific platinum surface (CAT-CL), membrane thickness, CAT-CL thickness and ANO-CL thickness. The ML models' prediction error is less than 4% for each degradation effect. A way to de-feature time-series data (e.g., load profiles) by utilizing Principal Component Analysis is introduced and shown to be applicable to this task. The purpose of the ML models is to better understand the effect of operating conditions on Fuel Cell degradation and thereby increase the engineering capability.
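
The PCA-based de-featuring of time-series inputs mentioned above can be sketched in a few lines; the shapes and names below are illustrative stand-ins, not AVL's actual pipeline:

    import numpy as np
    from sklearn.decomposition import PCA

    # Illustrative stand-in data: 500 load profiles, each 1000 time samples long.
    rng = np.random.default_rng(0)
    load_profiles = rng.standard_normal((500, 1000))

    # De-feature: replace each raw profile by its first few principal-component
    # scores, so the downstream ML model sees a short feature vector instead.
    pca = PCA(n_components=10)
    features = pca.fit_transform(load_profiles)      # shape (500, 10)
    print(pca.explained_variance_ratio_.sum())       # fraction of variance kept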

Two-layer networks with the ReLU^k activation function: Barron spaces and derivative approximation

Yuanyuan Li
Jan 24 2024

We investigate the use of two-layer networks with the rectified power unit, the so-called ReLU^k activation function, for function and derivative approximation. By extending and calibrating the corresponding Barron space, we show that two-layer networks with the ReLU^k activation function are well suited to approximating an unknown function and its derivatives simultaneously. When the measurements are noisy, we propose a Tikhonov-type regularization method and provide error bounds when the regularization parameter is chosen appropriately. Several numerical examples support the efficiency of the proposed approach.
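
A minimal numpy sketch of the object under study (illustrative code, not the paper's implementation): a two-layer ReLU^k network in one variable whose derivative is available in closed form, so function values and derivatives can be fitted simultaneously:

    import numpy as np

    def relu_k(t, k):
        return np.maximum(t, 0.0) ** k

    def net(x, a, w, b, k=2):
        """f(x) = sum_i a_i * relu(w_i * x + b_i)^k."""
        z = np.outer(x, w) + b                  # shape (len(x), width)
        return relu_k(z, k) @ a

    def net_derivative(x, a, w, b, k=2):
        """f'(x) = sum_i a_i * k * w_i * relu(w_i * x + b_i)^(k-1); valid for k >= 2."""
        z = np.outer(x, w) + b
        return (k * relu_k(z, k - 1) * w) @ a

    # Example: a width-8 network evaluated together with its derivative.
    rng = np.random.default_rng(1)
    a, w, b = (rng.standard_normal(8) for _ in range(3))
    x = np.linspace(-1.0, 1.0, 5)
    print(net(x, a, w, b), net_derivative(x, a, w, b))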

An Approximation Theory for Metric Space-Valued Functions With A View Towards Deep Learning

Anastasis Kratsios
Jan 31 2024

We build universal approximators of continuous maps between arbitrary Polish metric spaces X and Y using universal approximators between Euclidean spaces as building blocks. Earlier results assume that the output space Y is a topological vector space. We overcome this limitation by "randomization": our approximators output discrete probability measures over Y. When X and Y are Polish without additional structure, we prove very general qualitative guarantees; when they have suitable combinatorial structure, we prove quantitative guarantees for Hölder-like maps, including maps between finite graphs, solution operators to rough differential equations between certain Carnot groups, and continuous non-linear operators between Banach spaces arising in inverse problems. In particular, we show that the required number of Dirac measures is determined by the combinatorial structure of X and Y. For barycentric Y, including Banach spaces, R-trees, Hadamard manifolds, or Wasserstein spaces on Polish metric spaces, our approximators reduce to Y-valued functions. When the Euclidean approximators are neural networks, our constructions generalize transformer networks, providing a new probabilistic viewpoint of geometric deep learning.

As an application, we show that the solution operator to an RDE can be approximated within our framework.
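
In formulas, the "randomized" outputs described above are finite mixtures of Dirac measures: the approximator takes the form

    \hat{F}(x) = \sum_{n=1}^{N} w_n(x) \, \delta_{y_n(x)}, \qquad w_n(x) \ge 0, \quad \sum_{n=1}^{N} w_n(x) = 1,

with the number N of atoms governed by the combinatorial structure of X and Y; when Y is barycentric, the mixture can be collapsed back to a single Y-valued output.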

From disordered systems to the mathematics of data science

Antoine Maillard
Mar 13 2024

I will introduce a connection between statistical learning or high-dimensional inference, and models of statistical mechanics in which a large number of elementary units interact with each other in a disordered manner. This relationship is the basis of a vast mathematical program leveraging tools and insights from the physics of disordered systems to improve our understanding of high-dimensional probabilistic models, with applications e.g. in machine learning and constraint satisfaction problems. I will then detail concrete examples of this program by presenting recent advances concerning the phase retrieval problem, as well as a challenging open problem of high-dimensional probability known as the ellipsoid fitting conjecture.

Declipping and saturation recovery

Lukas Liehr
Mar 20 2024

A fundamental problem in signal analysis is the recovery of an object from clipped or saturated measurements. The latter refers to the effect that measurement devices can only measure quantities within a certain range. A classical example is the recording of audio, where sound can only be captured within a specific decibel range. The problem of recovering an object from clipped or saturated measurements is known as declipping or saturation recovery. The authors of the recent paper [W. Alharbi, D. Freeman, D. Ghoreishi, B. Johnson, N. L. Randrianarivony, Declipping and the recovery of vectors from saturated measurements, arXiv:2402.03237 (2024)] establish a rigorous mathematical framework for this problem in terms of frame theory. The purpose of this talk is to provide a self-contained introduction to saturation recovery and to discuss the main results of the aforementioned article.
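
As a self-contained toy illustration of declipping (a standard "consistent" sparse-recovery formulation in Python with cvxpy, not the frame-theoretic framework of the cited paper), the sketch below recovers a signal that is sparse in the DCT domain from clipped samples:

    import numpy as np
    import cvxpy as cp
    from scipy.fft import idct

    n, c = 128, 0.6                              # signal length, clipping level
    coef = np.zeros(n)
    coef[[2, 7, 15]] = [4.0, 3.0, 2.0]           # sparse DCT coefficients
    x = idct(coef, norm="ortho")                 # ground-truth signal
    y = np.clip(x, -c, c)                        # saturated measurement

    Psi = idct(np.eye(n), axis=0, norm="ortho")  # DCT synthesis matrix
    a = cp.Variable(n)
    keep = np.abs(y) < c                         # positions that were not clipped
    constraints = [Psi[keep] @ a == y[keep]]     # agree exactly on unclipped samples
    if (y >= c).any():                           # clipped-high samples must stay >= c
        constraints.append(Psi[y >= c] @ a >= c)
    if (y <= -c).any():                          # clipped-low samples must stay <= -c
        constraints.append(Psi[y <= -c] @ a <= -c)
    cp.Problem(cp.Minimize(cp.norm1(a)), constraints).solve()
    x_hat = Psi @ a.value                        # approximates x when enough unclipped samples remain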

Gabor phase retrieval via semidefinite programming

Martin Rathmair
April 10 2024

Given a function space X consisting of complex-valued functions, the corresponding phase retrieval problem is to reconstruct F ∈ X from phase-less observations, that is, from (samples of) its modulus |F|. Among other difficulties, the lack of convexity makes this a challenging task. We discuss the case where X is the image of L2 under the short-time Fourier transform (STFT) with a Gaussian window, i.e., reconstruction of the STFT from the spectrogram. We propose a reconstruction scheme based on solving two convex problems (both of which can be reformulated as semidefinite programs) and argue that the method accurately reconstructs F (up to a constant phase factor) provided that |F| is connected in a certain graph-theoretic sense (note that the problem at hand requires some sort of connectivity to be well-posed). This is joint work with P. Jaming (U. Bordeaux).
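
For orientation, the classical route from phase retrieval to a semidefinite program is the PhaseLift-type lifting (a generic formulation for context; the talk's scheme solves two different convex problems): with measurements b_j = |\langle F, \psi_j \rangle|^2 one lifts X = F F^* and solves

    \min_{X \succeq 0} \ \operatorname{tr}(X) \quad \text{s.t.} \quad \langle \psi_j \psi_j^*, X \rangle = b_j, \quad j = 1, \dots, m,

recovering F, up to a global phase factor, from a leading eigenvector of the solution.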

Monotone generative modeling via a geometry-preserving mapping

Wonjun Lee
April 17 2024

Generative Adversarial Networks (GANs) are powerful tools for creating new content, but they face challenges such as sensitivity to starting conditions and mode collapse. To address these issues, we propose a deep generative model that utilizes the Gromov-Monge embedding (GME). It helps identify the low-dimensional structure of the underlying measure of the data and then map it, while preserving its geometry, into a measure in a low-dimensional latent space, which is then optimally transported to the reference measure. We guarantee the preservation of the underlying geometry by the GME and c-cyclical monotonicity of the generative map, where c is an intrinsic embedding cost employed by the GME. The latter property is a first step in guaranteeing better robustness to initialization of parameters and mode collapse. Numerical experiments demonstrate the effectiveness of our approach in generating high-quality images, avoiding mode collapse, and exhibiting robustness to different starting conditions.
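
For reference, the c-cyclical monotonicity property invoked above says that the generative map T cannot be improved by rerouting finitely many of its assignments: for all points x_1, \dots, x_N and every permutation \sigma,

    \sum_{i=1}^{N} c\big(x_i, T(x_i)\big) \;\le\; \sum_{i=1}^{N} c\big(x_{\sigma(i)}, T(x_i)\big),

where c is the intrinsic embedding cost employed by the GME.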

Operator Learning via Approximation of Symbols

Thomas Dittrich
April 24 2024

Operator learning is a recent development in the simulation of partial differential equations (PDEs) by means of neural networks. The idea behind this approach is to learn the behavior of an operator, such that the resulting neural network is an (approximate) mapping between infinite-dimensional spaces that is capable of simulating the PDE. In our work we study general approximation capabilities for linear differential operators by approximating the corresponding symbol in the Fourier domain. Analogous to the structure of the class of Hörmander symbols, we consider approximation with respect to a topology induced by a sequence of semi-norms. In this sense, we measure the approximation error in terms of a Fréchet metric, and our main result identifies sufficient conditions for achieving a predefined approximation error. We then focus on a natural extension of our main theorem, in which we manage to reduce the assumptions on the sequence of semi-norms. Based on an existing approximation result for the exponential spectral Barron space, we present a concrete example of symbols that can be approximated well, and we also show the analogy of this approximation to the design of digital filters in signal processing.
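
Two standard formulas make this setting concrete (generic notation, not necessarily the paper's): a linear, translation-invariant operator L acts through its symbol p in the Fourier domain,

    \widehat{Lf}(\xi) = p(\xi) \, \widehat{f}(\xi),

and a sequence of semi-norms \|\cdot\|_k induces the Fréchet metric in which the approximation error is measured,

    d(p, q) = \sum_{k=0}^{\infty} 2^{-k} \, \frac{\|p - q\|_k}{1 + \|p - q\|_k}.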

Efficient Learning Using Spiking Neural Networks Equipped With Affine Encoders and Decoders

Martina Neuman
May 08 2024

We study the learning problem associated with spiking neural networks. Specifically, we consider hypothesis sets of spiking neural networks with affine temporal encoders and decoders and simple spiking neurons having only positive synaptic weights. We demonstrate that the positivity of the weights continues to enable a wide range of expressivity results, including a striking sorting property, a rate-optimal approximation of smooth functions or approximation without the curse of dimensionality. Moreover, positive-weight spiking neural networks are shown to depend continuously on their parameters which facilitates classical covering number-based generalization statements. Finally, we observe that from a generalization perspective, contrary to feedforward neural networks or previous results for general spiking neural networks, the depth has little to no adverse effect on the generalization capabilities.