Seminar abstracts: WiSe 2023
Transferability of Graph Neural Networks using Graphon and Sampling Theories
Martina Neuman, Nov 08 2023
Graph neural networks (GNNs) have become powerful tools for processing graph-based information in various domains. A desirable property of GNNs is transferability: a trained network can be applied to a different graph without retraining while retaining its accuracy. A recent approach to capturing the transferability of GNNs uses graphons, which are symmetric, measurable functions representing the limits of large dense graphs. In this talk, I will present an explicit two-layer graphon neural network (WNN) architecture for approximating bandlimited signals, and explain how a related GNN guarantees transferability between both deterministic weighted graphs and simple random graphs. The proposed WNN and GNN architectures overcome issues related to the curse of dimensionality and offer practical solutions for handling graph data of varying sizes.
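To make the transferability setup concrete, here is a minimal numpy sketch of a two-layer graph filter induced by a graphon and applied, with the same weights, to graphs of different sizes. The graphon W, the input signal, and the layer widths are illustrative assumptions, not the architecture from the talk.

```python
# Minimal sketch: a two-layer graph filter derived from a graphon, applied to
# graphs of different sizes with shared weights (illustrative assumptions
# throughout; W, the signal, and the widths are not taken from the talk).
import numpy as np

rng = np.random.default_rng(0)

def graphon(x, y):
    # A smooth, symmetric kernel standing in for the limiting graphon W(x, y).
    return 0.5 * (1 + np.cos(np.pi * (x - y)))

def sample_graph(n):
    # Deterministic weighted graph: evaluate W on a regular grid of latent points.
    x = (np.arange(n) + 0.5) / n
    A = graphon(x[:, None], x[None, :]) / n   # 1/n scaling: A approximates the integral operator
    return x, A

def two_layer_gnn(A, signal, theta1, theta2):
    # h = ReLU(A x theta1), output = A h theta2: one filter tap per layer.
    h = np.maximum(A @ np.outer(signal, theta1), 0.0)
    return A @ h @ theta2

theta1 = rng.normal(size=4)          # shared weights, independent of graph size
theta2 = rng.normal(size=(4, 1))

for n in (50, 200, 800):
    x, A = sample_graph(n)
    signal = np.cos(2 * np.pi * x)   # a bandlimited input signal
    out = two_layer_gnn(A, signal, theta1, theta2)
    print(n, float(out.mean()))      # outputs stabilize as n grows (transferability)
```

Because the normalized adjacency converges to the graphon's integral operator, the same (theta1, theta2) produce nearly identical outputs on the 50-, 200-, and 800-node graphs; this is the size-independence the abstract refers to.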
Optimal learning of piecewise smooth functions
Philipp Petersen, Dec 06 2023
Deep learning has established itself as by far the most successful machine learning approach in sufficiently complex tasks. Nowadays, it is used in a wide range of highly complex applications, such as natural language processing, and even in scientific applications. Its first major breakthrough, however, was achieved by shattering the state of the art in image classification. We revisit the problem of classification, or more generally the learning of functions with jumps, by deep neural networks and attempt to answer why deep networks are remarkably effective in this regime.
We will interpret the learning of classifiers as finding piecewise constant functions from labelled samples. Piecewise constant/smooth functions also appear in many applications associated with physical processes such as in transport problems where shock fronts develop. We then precisely link the hardness of the learning problem to the complexity of the regions where the function is smooth. Concretely, we will establish fundamental lower bounds on the learnability of certain regions. Finally, we will show that in many cases, these optimal bounds can be achieved by deep-neural-network-based learning.
In quite realistic settings, we will observe that deep neural networks can learn high-dimensional classifiers without a strong dependence of the learning rates on the dimension.
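As a toy illustration of the setup only (the region, architecture, and training budget below are assumptions, not the constructions from the talk): a classifier is a piecewise constant function, here the indicator of a disk, learned from labelled samples by a small deep ReLU network.

```python
# Toy illustration: learn a piecewise constant classifier (the indicator of a
# disk) from labelled samples with a small deep ReLU network. All choices here
# are illustrative assumptions.
import torch

torch.manual_seed(0)
X = torch.rand(2000, 2) * 2 - 1                  # samples in [-1, 1]^2
y = (X.pow(2).sum(dim=1) < 0.5).float()          # region with a smooth boundary

model = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = torch.nn.BCEWithLogitsLoss()

for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(1), y)
    loss.backward()
    opt.step()

with torch.no_grad():
    X_test = torch.rand(2000, 2) * 2 - 1
    y_test = (X_test.pow(2).sum(dim=1) < 0.5).float()
    acc = ((model(X_test).squeeze(1) > 0) == y_test.bool()).float().mean()
print(f"test accuracy: {acc:.3f}")   # remaining error concentrates near the jump set
```

In the spirit of the talk's lower bounds, the difficulty of this problem is governed by the complexity of the boundary of the region, not by the ambient dimension.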
Self-concordant regularization in Machine Learning
Adeyemi D. Adeoye, Dec 13 2023
Regularization techniques have proved key to reducing overfitting in machine learning, ultimately improving the model's ability to generalize. Such techniques, which lead to a so-called "structural risk minimization" problem, can help in customizing efficient methods for solving the resulting problem, particularly when one seeks (approximate) second-order methods that account for curvature information in the solution steps for faster convergence. In particular, when the regularization function is self-concordant, Newton-type methods can select learning rates or step sizes that exploit self-concordance for faster convergence and for globalization of the method when the problem is convex.
In this talk, I will present a generalized Gauss-Newton method that exploits this property, together with a promising approximation scheme that leads to cheap iteration steps, especially when the problem is overparameterized or in the mini-batch setting. I will also present a smoothing framework for constructing such algorithmic self-concordant regularization functions from penalty functions that are integral parts of the optimization problem, such as those added to promote specific structures in the solution estimates. I will show numerical examples that demonstrate the efficiency of this approach and its superiority over existing approaches. While our framework currently covers the convex setting, a toy neural network training problem suggests that our approach is also promising for non-convex optimization. Finally, I will present a Julia package associated with our framework.
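To illustrate the step-size mechanism (not the speaker's generalized Gauss-Newton method itself), here is a minimal damped-Newton sketch: with a self-concordant objective, the step 1/(1 + lambda_k), where lambda_k is the Newton decrement, needs no line search. The log-barrier-regularized quadratic below and all constants are illustrative assumptions.

```python
# Damped Newton with the self-concordant step size 1/(1 + Newton decrement).
# Objective (illustrative): f(w) = 0.5||Aw - b||^2 - mu * sum_i log(1 - w_i^2),
# a quadratic plus a self-concordant barrier on (-1, 1)^d.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 5))
b = rng.normal(size=30)
mu = 0.1

def grad_hess(w):
    r = A @ w - b
    g = A.T @ r + mu * 2 * w / (1 - w**2)
    H = A.T @ A + mu * np.diag((2 + 2 * w**2) / (1 - w**2) ** 2)
    return g, H

w = np.zeros(5)
for it in range(20):
    g, H = grad_hess(w)
    dw = np.linalg.solve(H, -g)          # Newton direction
    lam = np.sqrt(dw @ H @ dw)           # Newton decrement
    w = w + dw / (1 + lam)               # damped step: globalization without line search
    if lam < 1e-8:
        break
print(w)
```

The damped step also keeps the iterates inside the barrier's domain, which is what "globalization" buys in the convex, self-concordant setting.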
Deep neural networks can stably solve high-dimensional, noisy, non-linear inverse problems
Andrés Felipe Lerma Pineda, Jan 10 2024
We present a neural-network-based method for solving inverse problems when only noisy measurements are available. The method addresses problems modeled by an infinite-dimensional continuous operator with a discontinuous inverse. Our approach restricts this forward operator to finite-dimensional spaces in such a way that the inverse becomes Lipschitz continuous: we first restrict the operator's domain to a finite-dimensional vector space and then evaluate the output only at a finite set of sampling points. We prove that this restricted operator has a Lipschitz continuous inverse when a sufficient number of samples is chosen. For the class of Lipschitz continuous functions, we construct a neural network that dampens the noise when tested with perturbed data. Moreover, this network can be trained on noisy data and still performs well when tested on additional noisy data. We provide numerical examples from real-life applications to illustrate the feasibility of our method.
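A toy sketch of this training setup, under illustrative assumptions (the smoothing forward operator, the discretization, and the noise level below are not from the talk): restrict a smoothing operator, whose inverse is unstable, to a finite-dimensional coefficient space, and train a network to invert it from noisy measurements.

```python
# Toy sketch: invert a discretized smoothing operator with a network trained
# on noisy measurement/coefficient pairs. Operator and noise are illustrative.
import torch

torch.manual_seed(0)
n, m = 16, 32                                    # coefficients, sampling points
t = torch.linspace(0, 1, m)

# Forward map: x -> sum_k x_k * (smooth Gaussian bump), sampled at m points.
centers = torch.linspace(0, 1, n)
Phi = torch.exp(-((t[:, None] - centers[None, :]) ** 2) / 0.02)  # (m, n)

def measure(x, sigma=0.05):
    return x @ Phi.T + sigma * torch.randn(x.shape[0], m)        # noisy data

net = torch.nn.Sequential(
    torch.nn.Linear(m, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, n),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    x = torch.randn(64, n)           # random coefficients in the restricted space
    y = measure(x)                   # training with noisy data, as in the abstract
    loss = torch.mean((net(y) - x) ** 2)
    opt.zero_grad(); loss.backward(); opt.step()

x_test = torch.randn(256, n)
err = torch.mean((net(measure(x_test)) - x_test) ** 2)   # fresh noise at test time
print(float(err))
```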
Machine Learning Model to Predict Fuel Cell Degradation Effects
Bernhard Einberger, Jan 17 2024
Fuel Cell technology could take a key role in industry's transformation towards decarbonization. For that, the component must be well designed and long-lasting. AVL develops several simulation tools to increase the engineering capability for this technology. This thesis introduces a workflow for generating Fuel Cell degradation data with a commercial 3D-CFD solver (FireM©) and processing the data with Machine Learning tools. A detailed simulation model is used to calibrate a simplified simulation model; the Electrocatalytic Surface Area suggested by the simplified model differs from measurements by roughly 30%. Two ML models were set up to predict degradation effects: a Kernel Ridge Regression model and a Dense Neural Network model. The following degradation effects are predicted: current density reduction, equivalent weight (membrane), platinum particle number (CAT-CL), specific platinum surface (CAT-CL), membrane thickness, CAT-CL thickness, and ANO-CL thickness. The ML models' prediction error is less than 4% for each degradation effect. A way to de-feature time-series data (e.g., a load profile) using Principal Component Analysis is introduced and shown to be applicable to this task. The purpose of the ML models is to better understand the effect of operating conditions on Fuel Cell degradation and thereby increase engineering capability.
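A sketch of the described pipeline on synthetic data (the profiles and the target below stand in for the CFD simulation output and are purely illustrative): compress each load-profile time series with PCA, then regress a degradation effect with Kernel Ridge Regression.

```python
# PCA de-featuring of load profiles + Kernel Ridge Regression, on synthetic
# stand-in data (illustrative; not the thesis' simulation output).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_runs, n_steps = 200, 500
profiles = np.cumsum(rng.normal(size=(n_runs, n_steps)), axis=1)  # load profiles

# De-feature: keep a handful of principal components per profile.
pca = PCA(n_components=5)
features = pca.fit_transform(profiles)

# Synthetic degradation effect (e.g., current density reduction) to predict.
target = (0.3 * features[:, 0] + 0.1 * features[:, 1] ** 2
          + rng.normal(scale=0.05, size=n_runs))

X_tr, X_te, y_tr, y_te = train_test_split(features, target, random_state=0)
model = KernelRidge(kernel="rbf", alpha=1e-2, gamma=1e-3)
model.fit(X_tr, y_tr)
rel_err = np.mean(np.abs(model.predict(X_te) - y_te)) / np.std(y_te)
print(f"relative prediction error: {rel_err:.3f}")
```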
Two-layer networks with the ReLU^k activation function: Barron spaces and derivative approximation
Yuanyuan Li, Jan 24 2024
We investigate the use of two-layer networks with the rectified power unit, the so-called ReLU^k activation function, for function and derivative approximation. By extending and calibrating the corresponding Barron space, we show that two-layer networks with the ReLU^k activation function are well suited to approximating an unknown function and its derivatives simultaneously. When the measurements are noisy, we propose a Tikhonov-type regularization method and provide error bounds when the regularization parameter is chosen appropriately. Several numerical examples support the efficiency of the proposed approach.
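A minimal sketch of the kind of objective involved (the target function, k, the width, and the weighting of the terms are illustrative assumptions): a two-layer ReLU^k network fitted to noisy function values and derivative values, with a Tikhonov penalty on the outer weights.

```python
# Two-layer ReLU^k network fitted to a function and its derivative, with a
# Tikhonov term. All concrete choices are illustrative assumptions.
import torch

torch.manual_seed(0)
k, width = 3, 64
W = torch.randn(width, 1, requires_grad=True)    # inner weights
b = torch.randn(width, requires_grad=True)       # inner biases
a = torch.zeros(width, requires_grad=True)       # outer weights

def net(x):
    return torch.clamp(x @ W.T + b, min=0).pow(k) @ a   # ReLU^k activation

x = torch.linspace(-1, 1, 200).unsqueeze(1).requires_grad_(True)
f = torch.sin(torch.pi * x).squeeze(1)                      # target values
df = torch.pi * torch.cos(torch.pi * x).squeeze(1)          # target derivative
f_noisy = f + 0.05 * torch.randn_like(f)                    # noisy measurements

opt = torch.optim.Adam([W, b, a], lr=1e-2)
for step in range(2000):
    u = net(x)
    (du,) = torch.autograd.grad(u.sum(), x, create_graph=True)
    loss = (torch.mean((u - f_noisy) ** 2)
            + torch.mean((du.squeeze(1) - df) ** 2)         # simultaneous derivative fit
            + 1e-4 * a.pow(2).sum())                        # Tikhonov regularization
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```

Since ReLU^k is (k-1)-times continuously differentiable, the derivative term in the loss is well defined, which is what makes simultaneous function and derivative approximation feasible for k > 1.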
An Approximation Theory for Metric Space-Valued Functions With A View Towards Deep Learning
Anastasis Kratsios, Jan 31 2024
We build universal approximators of continuous maps between arbitrary Polish metric spaces X and Y using universal approximators between Euclidean spaces as building blocks. Earlier results assume that the output space Y is a topological vector space. We overcome this limitation by "randomization": our approximators output discrete probability measures over Y. When X and Y are Polish without additional structure, we prove very general qualitative guarantees; when they have suitable combinatorial structure, we prove quantitative guarantees for Hölder-like maps, including maps between finite graphs, solution operators to rough differential equations between certain Carnot groups, and continuous non-linear operators between Banach spaces arising in inverse problems. In particular, we show that the required number of Dirac measures is determined by the combinatorial structure of X and Y. For barycentric Y, including Banach spaces, R-trees, Hadamard manifolds, or Wasserstein spaces on Polish metric spaces, our approximators reduce to Y-valued functions. When the Euclidean approximators are neural networks, our constructions generalize transformer networks, providing a new probabilistic viewpoint of geometric deep learning. As an application, we show that the solution operator to an RDE can be approximated within our framework.
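To illustrate the "randomization" idea only (the number of atoms N and the architecture are illustrative assumptions, not the talk's construction): a Euclidean network can output a discrete probability measure over Y by producing atom locations together with softmax weights, and for barycentric Y the measure collapses to a single point.

```python
# A Euclidean network outputting a discrete probability measure over Y
# (here Y = R^2 for concreteness): N Dirac atoms plus probability weights.
import torch

N, d_in, d_out = 8, 3, 2            # atoms, input dim, dim of Y
backbone = torch.nn.Sequential(
    torch.nn.Linear(d_in, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, N * (d_out + 1)),
)

def measure_valued_net(x):
    out = backbone(x).view(-1, N, d_out + 1)
    atoms = out[..., :d_out]                          # Dirac locations in Y
    weights = torch.softmax(out[..., d_out], dim=-1)  # probability weights
    return atoms, weights

x = torch.randn(5, d_in)
atoms, weights = measure_valued_net(x)
# For barycentric Y (e.g., a Banach space), averaging reduces the measure
# to an ordinary Y-valued output:
barycenter = (weights.unsqueeze(-1) * atoms).sum(dim=1)
print(atoms.shape, weights.shape, barycenter.shape)
```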