The SOLACE seminar is held on Thursdays at 2pm at the
Laboratory of Analysis and Architecture of Systems (LAAS).

During the academic year 2023–2024, the seminar was organized by
Céline Comte.

Sean Meyn – July 11, 2024 (Part II) – Slides

Preamble: For decades, power systems academics have proclaimed the need for real time prices to create a more efficient energy grid. The rationale is economics 101: proper price signals will lead to an efficient outcome. The mathematics is elegant — who doesn’t love Lagrange multipliers? However, application of this mathematics to electric power markets cannot be justified. This harsh assessment is based on a casual review of economics 101; in particular, the definition of efficiency. The conclusions of the power economist are correct if the population of rational power consuming agents have a concave utility function that models preference for electric power. However, in the real world, a typical user of electric power knows this is absurd. Look at any of the big power consumers at your own home: is your personal welfare a continuous function of the electric power consumed by each device? The absurdity is most clear for on-off loads such as a water heater, refrigerator, A/C or pool pump. The creation of a realistic economic model requires that we talk to humans. We might learn that a particular agent uses a refrigerator to keep food cold, and a water heater to help keep their family and their dishes clean. It follows that the electrical energy products of interest to this agent are complex functions of time, and only loosely related to electric power – the quantity associated with price signals, and in particular the price surges witnessed at ERCOT this year and in 2011. There is a longer history in the communications sector where surge pricing was promoted, and then dismissed based on similar reasoning.

To realize a reliable energy grid and to promote innovation, we must abandon the dogma of surge pricing.

The lecture consists of two parts:

- Realistic models of consumer preferences, and how these can be used to design distributed control strategies to obtain the needed services. This model also makes clear why the popular "prices to devices" paradigm will destabilize the power grid.
- A survey of distributed control techniques to obtain grid services from millions of electric loads.

This talk is based on two surveys:

- Hala Ballouz, Joel Mathias, Sean Meyn, Robert Moye, Joseph Warrington. Reliable Power Grid: Long Overdue Alternatives to Surge Pricing (abridged version to appear)
- Yue Chen, Md Umar Hashmi, Joel Mathias, Ana Bušic, Sean Meyn. Distributed control design for balancing the grid using flexible loads. Chapter in the collection,
Energy Markets and Responsive Grids: Modeling, Control, and Optimization , pages 383–411. Springer, New York, NY, 2018.

Sean Meyn – July 11, 2024 (Part I) – Slides

This question was prompted by the discussion following my keynote lecture on June 21 at the workshop Reinforcement Learning for Stochastic Networks, held at ENSEEIHT.

Participants pointed out that my conclusions for Q-learning are precisely what would be anticipated by theory for the Actor-Critic method (for which there is a far firmer mathematical foundation than Q-learning). The lecture will review this theory and come to surprising conclusions. For one, I claim that *theory for actor-critic as currently implemented is no stronger than theory for Q-learning*! In contrast, the approach of Konda and Tsitsiklis, that is largely dismissed, is an instance of unbiased stochastic gradient descent.

My hope is that the new theory presented in my lecture of June 21 will open the doors to justification of popular versions of the actor-critic method.

This talk is based on Chapter 10 "Setting the Stage, Return of the Actors" from the book Control Systems and Reinforcement Learning.

Sean Meyn – June 27, 2024 – Slides

Many machine learning and optimization algorithms solve hidden root-finding problems through the magic of stochastic approximation (SA). Unfortunately, these algorithms are slow to converge: the optimal convergence rate for the mean squared error (MSE) is of order $O(n^{-1})$ at iteration $n$. Far faster convergence rates are possible by reconsidering the design of exploration signals used in these algorithms. In this lecture the focus is on quasi-stochastic approximation (QSA), in which a multi-dimensional clock process defines exploration. It is found that algorithms can be designed to achieve a MSE convergence rate approaching $O(n^{-4})$. Although the framework is entirely deterministic, this new theory leans heavily on concepts from the theory of Markov processes. Most critical is Poisson’s equation to transform the QSA equations into a mean flow with additive “noise” with attractive properties. Existence of solutions to Poisson’s equation is based on Baker’s Theorem from number theory—to the best of our knowledge, this is the first time this theorem has been applied to any topic in engineering! The theory is illustrated with applications to gradient free optimization in the form of Spall’s SPSA1 algorithm / extremum seeking control. It is found that these algorithms are not necessarily stable since they violate key assumptions in the SA literature. Resolutions are presented, and from this we obtain the astonishingly quick convergence rates. Joint research with Caio Lauand, current graduate student at UF.

Vivek Borkar – June 6 and 13, 2024 – Slides about Stochastic approximation – Slides about MDPs – Slides about RL

I shall give a quick overview of the `ODE' (for `Ordinary Differential Equations') approach to the analysis of stochastic approximation algorithms and then using (mostly) infinite horizon discounted cost Markov decision processes as the test case, give its applications to analyzing the convergence of various reinforcement learning algorithms.

Sergey Foss – May 30, 2024

For a stochastically monotone Markov chain taking values in a Polish space, we present a number of conditions for existence and for uniqueness of its stationary regime, as well as for closeness of its transient trajectories. In particular, we generalise a basic result by Bhattacharya and Majumdar (2007) where a certain form of mixing, or swap condition was assumed uniformly over the state space. We do not rely on continuity properties of transition probabilities.

This is a joint work with Michael Scheutzow (TU Berlin).

Sergey Foss – May 23, 2024

This is a short overview on results obtained by a group of researchers within the last 20 years. A survey paper has been placed in Arxiv.org in December 2023.

We consider directed random graphs, the prototype of which being the Barak-Erdős graph $\vec G(Z, p)$, and study the way that long (or heavy, if weights are present) paths grow. This is done by relating the graphs to certain particle systems that we call Infinite Bin Models (IBM). A number of limit theorems are shown. The goal of this paper is to present results along with techniques that have been used in this area. In the case of $\vec G(Z, p)$ the last passage percolation constant $C(p)$ is studied in great detail. It is shown that $C(p)$ is analytic for $p>0$, has an interesting asymptotic expansion at $p=1$ and that $C(p)/p$ converges to $e$ like $1/(\log p)^2$ as $p \to 0$. The paper includes the study of IBMs as models on their own as well as their connections to stochastic models of branching processes in continuous or discrete time with selection. Several proofs herein are new or simplified versions of published ones. Regenerative techniques are used where possible, exhibiting random sets of vertices over which the graphs regenerate. When edges have random weights we show how the last passage percolation constants behave and when central limit theorems exist. When the underlying vertex set is partially ordered, new phenomena occur, e.g., there are relations with last passage Brownian percolation. We also look at weights that may possibly take negative values and study in detail some special cases that require combinatorial/graph theoretic techniques that exhibit some interesting non-differentiability properties of the last passage percolation constant. We also explain how to approach the problem of estimation of last passage percolation constants by means of perfect simulation.

Dieter Fiems – April 4, 2024 – Slides

Series expansion techniques for Markov chains aim at expanding the steady-state solution of the Markov chain in terms of one of its parameters. Assuming that the chain is easy to solve for a particular value of this parameter, it is often the case that one can also find the higher order terms of the expansion around this value. This is not unsurprising, as the system of equations for the steady state solution are somewhat related to the equations for the terms in the series expansion. The series expansion technique therefore extends the solution for a single parameter value to solutions in a neighbourhood of this particular parameter value. The quintessential example where expansion techniques apply in queueing theory is the light-traffic approximation: this is a series expansion in the system load. The technique is not limited to light-traffic situations though. By means of some examples, we illustrate the possibilities of series expansion techniques, either as a purely numerical solution technique, or in combination with analytical results. We further discuss the importance of convergence acceleration techniques. Such techniques not only allow for accelerating convergence in the region of convergence of the series expansion, but also allow for extending the region of convergence which vastly widens the scope of the expansion technique.

Dieter Fiems – March 28, 2024 – Slides

Most urban areas suffer from traffic congestion, with peaks during the morning and evening commutes. At an individual level, commuters optimise their travel times by opting for the shortest routes in time. Up till recently, these route choices where often confined to a couple of routes per commuter, and the route choice at any day was at most influenced by experience of past commutes at similar times, or by generic reports on the radio on the traffic conditions around major cities. Hence, the chosen routes were frequently suboptimal. In contrast, nowadays, online applications like Waze and Google maps can direct commuters to the routes which offer the least expected travel times, given the present traffic situation, such that commuters can rationally opt for the best route.

By selecting the shortest routes, these online applications hold the promise to better spread traffic over different routes, and thereby reduce congestion. However, if more commuters rely on these applications, there is also a real risk that these applications create congestion by routing too many commuters to a noncongested route as routing decisions only affect congestion levels at a later time. One can argue that the online application should be better at forecasting future congestion levels. However, these congestion problems are not created by a lack of information on the future congestion levels, but by the routing decisions taken by these online applications. As routing decisions are taken by the individual commuters, this is a distributed control problem which can be studied by game-theoretic methods.

In particular, our modelling effort draws upon traffic flow theory to relate traffic density and traffic flow at a macroscopic scale, on queueing theory to relate traffic intensity and density at this scale, and on the concept of the time-dependent Wardrop equilibrium to model the routing choice of individual commuters. As an application, we study how park-and-ride systems can mitigate congestion levels, and how time-dependent pricing can be used to obtain the socially optimal traffic mix.

Manon Costa – March 21, 2024 – Slides

In this talk we consider the problem of optimal portfolio allocation with {CV@R penalty} when dealing with imperfectly simulated financial assets.

We use a Stochastic biased Mirror Descent to find optimal resource allocation for a portfolio whose underlying assets cannot be generated exactly and may only be approximated with a numerical scheme that satisfies suitable error bounds, under a risk management constraint.

We establish almost sure asymptotic properties as well as the rate of convergence for the averaged algorithm. We then focus on the optimal tuning of the overall procedure to obtain an optimized numerical cost. Our results are then illustrated numerically on simulated as well as real data sets.

Pascal Maillard – March 7, 2024 – Paper – Slides

The N-particles branching Brownian motion (N-BBM) is a branching Markov process which describes the evolution of a population of particles undergoing reproduction and selection. It shares many properties with the $N$-particles branching random walk (N-BRW), which itself is strongly related to physical $p$-spin models, or to Derrida's Continuous Random Energy Model (CREM). The N-BRW can also be seen as the realization of an optimization algorithm over hierarchical data, often called beam search. More precisely, the maximal displacement of the N-BRW (or N-BBM) can be seen as the output of the beam search algorithm, with N the « width » of the beam, that almost matches the computational complexity of the algorithm.

In this paper, we investigate the maximal displacement at time $T$ of an N-BBM, where $N=N(T)$ is picked arbitrarily depending on T and the diffusion of the process is inhomogeneous in time. We prove the existence of a phase transition in the second order of the maximal displacement when log N(T) is of order $T^{1/3}$. This can be interpreted as an "efficiency ceiling" in the output of the beam search algorithm, which extends previous results by Addario-Berry and Maillard regarding an algorithm hardness threshold for optimizing the Hamiltonian of the CREM.

Based on joint work with Alexandre Legrand.

Francisco Robledo Relaño – February 29, 2024 – Slides

In this presentation, we introduce the Lagrange Policy for Continuous Actions (LPCA), a new reinforcement learning algorithm developed for restless multi-armed bandit problems with continuous action spaces. LPCA addresses the unique challenge of making decisions in environments where resources are limited and actions are continuous.

We will discuss the structure of LPCA, which combines neural network-based Q-value computation with a Lagrange multiplier approach. This combination allows for effective policy learning by decoupling the arms in multi-armed bandit problems under resource constraints.

The presentation will cover two variations of LPCA: LPCA-DE, which uses differential evolution for global optimization, and LPCA-Greedy, which selects actions incrementally based on the gradients of Q-values. We will present a comparative analysis of LPCA against the DDPG algorithm enhanced with OptLayer, showcasing LPCA's effectiveness in resource allocation and reward maximization.

Ernesto Garcia Ciganda – February 1, 2024 – Slides

In this talk, I will present ongoing joint work with Vittorio Puricelli and Matthieu Jonckheere. We aim to characterize the limit of the 'generalized graph Laplacian' operator on graphs, proposed in [1]. Our focus is on random geometric graphs, where nodes are sampled independently from some unknown probability distributions. We aim to understand this limit, under appropriate scaling as the number of nodes increases, for clustering purposes.

[1] Generalized Spectral Clustering for Directed and Undirected Graphs. Harry Sevi, Matthieu Jonckheere, Argyris Kalogeratos. Link to the paper

Ludovic Righetti – December 14, 2023 – Slides

To walk, run, jump or manipulate objects, robots need to constantly interact with objects and the environment. Unfortunately, reasoning about physical interactions is a computationally daunting task. For this reason, robots try to avoid physical interactions at all costs and unexpected physical contacts often lead to failures. In this talk, I will present our recent approach(es) to break down this complexity: the formulation of optimal control problems that leverage machine learning (reinforcement learning) to achieve real-time efficiency and real-robot robustness. I will also demonstrate our algorithms on real manipulation and locomotion examples. Finally, I will discuss current challenges towards real robotic applications.

Urtzi Ayesta – December 7, 2023, and January 11, 2024

We survey the basics of restless multi-armed bandit as introduced by Whittle (1988), and the Lagrangian relaxation approach that leads to the definition of Whittle's Index. We will derive an expression of Whittle's index for the the averaged cost criterion.

Talk based on:

- P. Whittle, Restless bandits: Activity allocation in a changing world,
Journal of applied probability , 1988, 287–298. Link to the paper - M. Larrañaga, U. Ayesta, I.M. Verloop, Asymptotically optimal index policies for an abandonment queue with convex holding cost,
Queueing Systems , 2015, 81(2):99–169. Link to the paper

Sebastian Zaninovich – November 23, 2023

Fermat distances are metrics defined to work with datasets supported on manifolds equipped with densities. These distances function as geodesics in the (complete) graph, penalizing long jumps between points. In this talk, we are going to present some (mathematical) motivation related to first passage percolation, some interesting questions that arise there, the original model, and introduce the noisy case. Our main result states that if the noise converges asymptotically to zero, then the noisy microscopic Fermat distance converges to the non-noisy macroscopic Fermat distance. The presentation is going to be focused on the mathematical aspects of the model, not on the applications.

Bruno Gaujal – November 9, 2023 – Slides – Paper

In this talk, I will present an efficient reinforcement learning algorithm that learns the optimal admission control policy in a partially observable queueing network. Specifically, only the arrival and departure times from the network are observable, and optimality refers to the average holding/rejection cost in infinite horizon.

While reinforcement learning in Partially Observable Markov Decision Processes (POMDP) is prohibitively expensive in general, we show that our algorithm has a regret that only depends sub-linearly on the maximal number of jobs ($S$) in the network. In particular, in contrast with existing regret analyses, our regret bound does not depend on the diameter of the underlying Markov DecisionProcess (MDP), which in most queueing systems is at least exponential in $S$. The novelty of our approach is to leverage Norton’s equivalent theorem for closed product-form queueing networks and an efficient reinforcement learning algorithm for MDPs with the structure of birth-and-death processes.

This is a joined work with Jonatha Anselmi and Louis-Sébastien Rebuffi.

Olivier Brun – October 12, 2023 – Slides

We explore the problem of minimizing the weighted average coflow completion time of coflows when flow sizes are unknown but unreliable predictions on them are available. We propose to use Sincronia, a 4-approximation algorithm when actual flow sizes are known, directly on the predictions. For this variant of Sincronia, we obtain a finite upper bound on the approximation ratio that combines unreliability on predictions and the weights of the coflows. On several numerical experiments, it is shown that this bound is too conservative, and that in practice Sincronia with predictions has a much better performance on average as well as in the worst-case.

Albert Senen-Cerda – September 28, 2023 – Slides – Paper 1 – Paper 2 – Paper 3

Dropout (Hinton et al., 2012) is a well-known method to prevent overfitting in neural networks that temporarily ‘drops out’ nodes of the network at random during each step of the training with stochastic gradient descent.

In the first part of the talk, I will present general convergence guarantees for training neural networks with dropout and investigate its convergence rate using stochastic approximation techniques on simplified models for deep and shallow neural networks. Specifically, we examine theoretically and with simulations the dependence of the convergence rate on the width and depth of the neural network, and more importantly, on the dropout probability. For the considered models, we obtain explicit bounds for the convergence rate that depend on the dropout probability.

In the second part of the talk, we will then go on to analyze random neural networks that have dropout as their source of randomness and show that, despite this randomness, they still satisfy a universal approximation theorem.