Memorization capacity and robustness of neural networks

  • Patrick Finke

Research output: Thesis › Doctoral thesis 1 (Research UU / Graduation UU)

Abstract

Machine learning, and deep learning in particular, has recently undergone rapid advancements. To contribute to a rigorous understanding of deep learning, this thesis explores two research directions. In the first part, we study the interpolation properties of neural networks. Modern practice has demonstrated that neural networks can exhibit remarkable performance when they possess a very large number of parameters. The beginning of this benign over-parameterized regime is marked by the interpolation threshold, where the model is just large enough to interpolate its training data. While the memorization capacity of neural networks—the ability to interpolate any arrangement of a certain number of samples under arbitrary labeling—can be used to estimate the interpolation threshold, such estimates turn out to be overly pessimistic. Therefore, we analyze interpolation on a given data set directly and prove upper bounds on the required size of a neural network that interpolates this data. In particular, we analyze fully-connected threshold networks and then extend our analysis to translation-invariant convolutional threshold networks, an architecture commonly used in image processing. Our approach is constructive: we introduce a randomized algorithm that, given the data set as input, constructs an interpolating neural network with high probability. We link the required number of parameters to a ‘problem-adaptive’ measure of the mutual complexity of the data classes, which depends on their geometric properties as well as their mutual arrangement and, in the case of convolutional networks, also takes the translation-invariant structure of the model into account. This yields guarantees that are independent of the number of samples, unlike bounds based on memorization capacity.

In the second part, we study the Lipschitz constants of random neural networks. In practice, neural networks can be susceptible to adversarial attacks, for instance adversarial examples: inputs to which a small perturbation, possibly imperceptible to the human eye, is added in order to alter the output of the network. The Lipschitz constant of a neural network—which measures the influence of an input perturbation on the output over the whole domain—can be used as a worst-case measure of how vulnerable the network is to adversarial examples. We analyze, for any $p \in [1, \infty]$, the $\ell_p$-Lipschitz constants of deep random neural networks with ReLU activations. The weights are drawn independently at random according to the popular He initialization, which naturally connects these networks to the random initializations used before training. For the biases, we only make the mild assumption that their entries are drawn independently from a symmetric distribution with a bounded probability density function. For networks of sufficient width, we derive high-probability upper and lower bounds on the $\ell_p$-Lipschitz constants that differ by at most a factor logarithmic in the width and linear in the depth of the network. In the special case of shallow networks, we derive matching bounds.
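To make the first part's setting concrete, the classical memorization construction (one hidden threshold unit per sample) can be sketched in a few lines. This is a textbook illustration of why capacity-based bounds grow with the number of samples, not the thesis's randomized algorithm: project the data onto a random direction, place one threshold per sorted projection, and solve for the output weights by telescoping.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n points in d dimensions with arbitrary binary labels.
n, d = 20, 5
X = rng.standard_normal((n, d))
y = rng.integers(0, 2, size=n).astype(float)

# Project onto a random direction; for generic data the projections
# are distinct with probability 1.
a = rng.standard_normal(d)
t = X @ a
order = np.argsort(t)

# One hidden threshold unit per sample: unit j fires iff a.x >= t_(j),
# so on the sorted data the activation matrix is lower-triangular.
thresholds = t[order]
H = (t[:, None] >= thresholds[None, :]).astype(float)  # shape (n, n)

# Telescoping output weights reproduce the labels exactly:
# w_0 + ... + w_k = y_sorted[k] for every rank k.
y_sorted = y[order]
w = np.concatenate(([y_sorted[0]], np.diff(y_sorted)))
preds = H @ w

assert np.allclose(preds, y)  # the network interpolates all n samples
```

Note that this construction always spends one neuron per sample; the thesis's sample-independent bounds instead exploit the geometry and mutual arrangement of the data classes.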
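The objects studied in the second part can likewise be set up numerically. The following toy experiment (not the thesis's proof technique) builds a deep ReLU network at He initialization and certifies a lower bound on its $\ell_2$-Lipschitz constant: since a ReLU network is piecewise linear, the gradient norm at any input is a valid lower bound, so maximizing it over random inputs gives a certified lower estimate. The widths and the uniform bias distribution are illustrative choices; the thesis only requires biases with a symmetric, bounded density.

```python
import numpy as np

rng = np.random.default_rng(1)

# A deep ReLU network at He initialization: W has i.i.d. N(0, 2/fan_in)
# entries; biases come from a symmetric distribution with bounded
# density (here: uniform on [-0.1, 0.1], an illustrative choice).
widths = [10, 256, 256, 256, 1]
Ws = [rng.normal(0.0, np.sqrt(2.0 / m), size=(n_out, m))
      for m, n_out in zip(widths[:-1], widths[1:])]
bs = [rng.uniform(-0.1, 0.1, size=n_out) for n_out in widths[1:]]

def grad_norm(x):
    """l2 norm of the network's gradient at input x.

    A ReLU network is piecewise linear, so this local gradient norm
    lower-bounds the global l2-Lipschitz constant.
    """
    J = np.eye(widths[0])          # Jacobian of the identity input map
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        pre = W @ h + b
        J = (W @ J) * (pre > 0)[:, None]  # zero rows where ReLU is off
        h = np.maximum(pre, 0.0)
    J = Ws[-1] @ J                 # final linear layer
    return np.linalg.norm(J)

# Maximize the local bound over random inputs; the maximum is a
# certified lower bound on the l2-Lipschitz constant.
lower = max(grad_norm(rng.standard_normal(widths[0])) for _ in range(200))
print(f"empirical l2-Lipschitz lower bound: {lower:.2f}")
```

Random sampling only ever certifies a lower bound; the matching high-probability upper bounds are exactly what the thesis's analysis contributes.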
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • Utrecht University
Supervisors/Advisors
  • Dirksen, Sjoerd, Supervisor
  • Salanevich, Palina, Co-supervisor
Award date: 19 Jan 2026
Place of Publication: Utrecht
Print ISBNs: 978-90-393-8009-3
Publication status: Published - 19 Jan 2026

Keywords

  • neural networks
  • memorization
  • interpolation
  • random hyperplane tessellations
  • high-dimensional geometry
  • invariant convolutional neural networks
  • Lipschitz constant
  • random neural networks
  • adversarial robustness
