If you would like to be added to the mailing list for seminar announcements, or if you would like to give a PARKAS seminar, please email Timothy Bourke.
An iCalendar is available.
Organised together with the SemAntique
In Fortran/Algol tradition, a mutable variable is a variable that can be directly mutated as such by assignment, increment, etc. In contrast, ML tradition uses variables of reference types, bound to ‘boxed values’, or reference cells. These are ordinary, lambda-calculus variables that can be substituted for but not mutated. What is mutated is their value: the ‘box’. The two models seem equivalent and easily relatable. When we actually try to relate them — in the context of implicitly heterogeneous metaprogramming: writing OCaml as if it were C — we discover many subtleties, which make the problem nearly intractable. One complication is that mutable variables are not first-class, and a compositional mapping of reference-type variables to mutable variables is simply impossible.
Upon systematic investigation we discover a way to formally relate the two traditions. Practically, we can write (or generate) first-order OCaml code using reference-type variables without restrictions, and straightforwardly map it to the efficient and idiomatic C and be certain of its meaning, in C. We may hence write in (a subset of) OCaml as if it were C, using to the full extent the modules, higher-order functions and other abstraction facilities of OCaml — and obtaining all the performance of C code, benefiting from vectorization, etc. of modern C compilers.
In the Fortran/Algol tradition, the semantics of assignment such as x = x + 1 is non-compositional, and to understand it, L-values were introduced decades ago. L-values simplify assignment by taking all the complexity upon themselves. Our relation of two traditions showed L-values are not needed. What is complicated about assignment is not its semantics but an unusual syntax (sugar).
subtitle: Stateful Stream Fusion as Normalization-by-Evaluation
We present both a stream compilation theory and its implementation as a portable stream processing library strymonas with a familiar declarative interface and very high performance, notably exceeding all competition and attaining the performance of hand-written imperative state machines. The library targets: Java Streams-like applications; digital signal processing including software-defined radio; many sorts of sensor processing such as nested aggregations, compression/decompression.
The library lets us assemble complex stream pipelines just by plugging in simple combinators such as map, filter, flat-map, zip, take, map-accumulate and sliding windowing. Strymonas streams may be infinite, stateful, with the generally variable and statically unknown processing rate.
Strymonas is a code generator with pluggable backends (currently generating C, OCaml and Scala). The pipeline combinators may be freely composed, and yet the resulting convoluted imperative code contains no traces of combinator abstractions: no closures, intermediate objects or tuples. The high-performance is portable and statically guaranteed.
Strymonas has been developed in tandem with the equational theory of stateful streams. Our theoretical model represents all desired pipelines and guarantees the existence of unique normal forms, which are mappable to (fused) state machines. Stream pipeline compilation and optimization are represented as normalization-by-evaluation, and are hence deterministic and terminating, with the guaranteed outcome. The equational theory lets us state and prove the correctness of the complete fusion optimization. Strymonas also implements a novel form of typed closure conversion.
Machine Learning is emphatically dataflow (and therefore reactive) in form when new layers and networks are described in research papers. Surprisingly, practical ML algorithm description and implementation in frameworks such as Jax and compilers such as MLIR is resolutely control flow. This is largely due to the fact that ML frameworks - key enablers of the ML boom - are products of the HPC and PL communities.
While ML algorithms were simple sequential models and the key activity was supervised training on stored Big Data, there were little differences between dataflow and control flow specification styles. But as the complexity of the applications increases and the focus shifts towards reactive ML applications such as embedded ML or Reinforcement Learning, a natively reactive/dataflow specification style could provide important advantages at both specification and implementation level.
We will explore this topic through examples in our new MLR language, introducing both advantages and new challenges.
Deficit Round-Robin (DRR) is a scheduling algorithm that is often used in real-time systems or communication networks for scheduling tasks, or packets. This scheduling algorithm guarantees a share of the bandwidth to each class of traffic, proportional to its assigned quantum. This is an example of bandwidth-sharing policy, as well as other Round-Robin policies like Weighted Round Robin (WRR). Analyzing a bandwidth-sharing policy network requires two steps. The first step is a single-node analysis to compute the service that is guaranteed to each class of traffic. Recent work show that knowing the characteristics of the cross-traffic can improve this guarantee. The second step is a per-class analysis. When all per-node guarantees have been computed, one FIFO network per class can be constructed, and delay bounds can be computed. In this talk, we will show how these two steps are interdependent, especially when the DRR network has cyclic dependencies, and present an iterative scheme that improves both the performance bounds and stability region compared to the state of the art.
Many computing systems are constrained by a fixed amount of available shared memory. Modeling applications with task graphs makes it possible to analyze and optimize their memory usage. The NP-complete problem studied here is finding a parallel schedule of a given task graph that minimizes its memory peak.
Our first contribution is a method that finds memory-optimal sequential schedules for a dataflow task graph. Our method relies on graph transformation rules and has a polynomial complexity. On a large class of graphs, it is able to compress the graph into a single node that contains a sequential schedule optimal w.r.t. the memory peak. Our approach also applies to SDF graphs after converting them into task graphs. However, since that conversion may produce very large graphs, we also propose a new suboptimal method, similar to Partial Expansion Graphs, to reduce the problem size. We evaluate our approach on classic benchmarks, on which we always outperform the state-of-the-art.
From this optimal sequential schedule, our second contribution is a dynamic parallel scheduling algorithm, which consists of a ready list scheduling that we adapt to take into account a given memory peak constraint. Even when the memory peak is the minimal one, we always produce parallel schedules that meets the constraint, and that enjoys very good speedups w.r.t. the sequential schedule. It can also be applied to less harsh memory constraints, leading to more substantial speedups. We compare with alternative approaches on multiple applications expressed as task graphs (scientific workflows, signal processing filters). When memory constraints are close to their minimum, our approach always succeeds in finding a parallel schedule meeting the given constraints, whereas the other approaches mostly fail. When memory constraints are significantly higher, our results are comparable to others for speedup. Furthermore, our approach is faster and can deal with very large task graphs (up to 50,000 nodes) using a naive Python implementation.
Based on joint work with Pascal Fradet and Alexandre Honorat.
References:
P. Fradet, A. Girault, A. Honorat. Sequential Scheduling of Dataflow Graphs for Memory Peak Minimization. LCTES'23. Orlando, FL, USA. June 2023.
P. Fradet, A. Girault, A. Honorat. Graph Transformations for Memory Peak Minimization by Scheduling, ACM Trans. on Embedded Computing Systems, 2024.
P. Fradet, A. Girault, A. Honorat. Parallel scheduling of task graphs with minimal memory requirements. IPDPS'25. Milan, Italy. June 2025.
This PhD defense will be presented in French. The abstract in English can be found below.
Soutenance de thèse
Lustre est un langage synchrone de type flot de données. Sa structure temporelle discrète et son système d'horloges lui permettent de garantir un temps d'exécution et une consommation de mémoire bornés statiquement, favorisant ainsi son adoption dans le domaine de systèmes critiques, notamment au sein de l'outil industriel Scade. Vélus, développé à l'Inria, est un compilateur Lustre vérifié. En s'appuyant sur la définition d'une sémantique formelle pour chaque langage intermédiaire, il apporte une preuve qu'un programme Lustre et sa traduction en Clight (langage pris en charge par CompCert) exhibent des comportements identiques.
Dans cette thèse, nous développons un nouveau modèle du noyau flot de données du langage d'entrée, basé sur une sémantique dénotationnelle synchrone, et donnons les conditions exactes de son équivalence avec le modèle relationnel existant dans Vélus. Cette approche constructive permet d'obtenir une sémantique exécutable, renforçant ainsi le principal théorème de correction de la compilation. Grâce au principe d'induction de Scott, propre à la sémantique dénotationnelle, nous menons des raisonnements très naturels sur la dynamique des programmes, moyennant un traitement explicite des erreurs pouvant survenir à l'exécution. Enfin, nous explorons la possibilité de s'affranchir des contraintes de synchronisation du langage en proposant une correspondance formelle de notre modèle avec celui des réseaux de Kahn. Nous esquissons les principes de l'infrastructure nécessaire à un raisonnement vérifié de bout en bout sur les programmes compilables.
Les membres du jury sont
PhD thesis defense
Lustre is a synchronous dataflow language. Its discrete temporal structure and clocking system guarantee statically bounded execution time and memory consumption, favoring its adoption in mission-critical systems, notably within the Scade industrial tool. Vélus, developed at Inria, is a verified Lustre compiler. Based on the definition of formal semantics for each intermediate language, it provides a proof that a Lustre program and its translation into Clight (the language supported by CompCert) exhibit identical behaviors.
In this thesis, we develop a new model of the input language's dataflow kernel, based on synchronous denotational semantics, and give the exact conditions for its equivalence with the existing relational model in Vélus. This constructive approach results in an executable semantics, reinforcing the main correctness theorem of the compilation. Thanks to Scott's induction principle, specific to denotational semantics, we are able to conduct very natural reasoning on program dynamics, but with explicit treatment of errors that may occur at runtime. Finally, we explore the possibility of freeing ourselves from language synchronization constraints by proposing a formal correspondence between our model and that of Kahn's networks. We outline the principles of the infrastructure required for an end-to-end verified reasoning on compilable programs.
The members of the thesis committee are
Les compilateurs effectuent des transformations sur les programmes, notamment pour optimiser la performance du code généré. Ils doivent cependant s'assurer de la correction des transformations, c'est-à-dire que tous les comportements du programme cible soient des comportements autorisés par le programme source. Certaines transformations ne sont correctes que sous certaines conditions sémantiques (valeurs prises par les variables, aliasing...). Les compilateurs ont donc besoin d'obtenir ces informations, en effectuant des analyses ou en encodant les propriétés utiles dans les représentations (par exemple avec des Undefined Behaviors).
Dans une première partie, je décrirai une représentation intermédiaire visant à permettre l'encodage de propriétés connues sur le programme source. Je m'intéresse en particulier à la compilation de programmes vérifiés ou générés, sur lesquels on connaît des informations précises. J'utilise deux idées pour encoder les informations. D'une part, un système de types dépendants permet l'expression de propriétés et explicite les conditions qui les assurent. D'autre part la représentation utilise de multiples fragments mémoires au lieu d'une unique mémoire globale. On transmet ainsi des informations précises sur l'aliasing. J'ai formalisé au sein de l'assistant de preuve Rocq une sémantique dénotationnelle pour cette représentation.
Dans une seconde partie, je décrirai une passe de compilation vérifiée d'un fragment de la représentation Obc de Vélus vers ma représentation.
Place-and-route (P&R) plays a key bottleneck in integrated circuit design, and existing approaches for P&R fall short in correctness and methodology. We introduce an SMT-based approach which addresses these concerns and introduces some new possibilities for improved co-optimization of designs. We introduce the system design and problem encoding and show some preliminary results to address the key challenge of scalability.
Joint work with: Chandrika Bhardwaj, Goldman Sachs India
Denning's (1976) lattice model provided secure information flow analyses with an intuitive mathematical foundation: the lattice ordering determines permitted flows. We propose a connection-based extension of this framework that permits two autonomous organisations, each employing possibly quite different security lattices, to exchange information while maintaining security of information flow as well as their autonomy in formulating and maintaining security policies. Our prescriptive framework is based on the rigorous mathematical framework of Lagois connections proposed by Melton, together with a simple type system and operational model for transferring object data between the two domains. The merit of this formulation is that it is simple, minimal, adaptable and intuitive. We show that our framework is semantically sound, by proving that the connections proposed preserve standard correctness notions such as noninterference. We then illustrate via examples how Lagois theory also provides a robust framework and methodology for negotiating and maintaining secure agreements on information flow between autonomous organisations, even when either or both organisations change their security lattices. Composition and decomposition properties indicate support for a modular approach to secure flow frameworks in complex organisations. Finally, a natural and conservative extension of the Decentralised Labels Model of naturally induces a Lagois connection between the corresponding security label lattices, thus extending the security guarantees ensured by the decentralised model to encompass bidirectional interorganisational flows.
Bio: Sanjiva Prasad is a Professor of Computer Science and Engineering (CSE) at IIT Delhi, and current Head of the School of Public Policy (SPP). Earlier he headed the CSE Department and the Khosla School of Information Technology (SIT). His research interests include formal methods, specifically concurrency, programming languages and their semantics, security of information flow, formal foundations of networks, and medical applications of computing. He is currently Co-Editor-in-Chief of ACM Books.
The task of system design is shared by all engineering disciplines, each coming with its own techniques. In spite of their differences in tools, there is large intersection in their conceptual approach to design. In this talk, we exploit this commonality to take an abstract view of systems and their composition. We understand systems and subsystems in terms of their assume-guarantee specifications, or contracts.
Assume-guarantee contracts are formal specifications that state (i) the assumptions that a design element makes on its environment and (ii) the guarantees it delivers when the environment behaves according to the contract's assumptions. Contracts come with a rich algebra that allows us to carry out several design-relevant tasks: obtaining system-level specifications from component specifications, finding specifications of components that need to be added to a design in order to meet an objective, etc. We will introduce the algebra of contracts and discuss how the various algebraic operations relate to system-design tasks. We will discuss hypercontracts, an extension of assume-guarantee reasoning to support the formal analysis of key security and robustness properties. We will also discuss the application of this methodology and Pacti, a software package that supports system design using contracts, in applications ranging from space-mission design to synthetic biology.
Bio: Inigo Incer is a postdoctoral researcher at Caltech and UC Berkeley. He obtained his PhD from UC Berkeley in 2022 under the guidance of Alberto Sangiovanni-Vincentelli. He is interested in all aspects of cyber-physical systems, emphasizing formal methods and AI that support their compositional design and analysis. Before pursuing a PhD, Inigo was an IC designer in Austin. His work has been supported by the NSF/ASEE eFellows program and the UC Berkeley Chancellor's Fellowship
Rust has established a reputation for providing both memory and thread safety. When it comes to dependable software, this is a formidable foundation but often not sufficient. In particular, correctly handling unavoidable (but manageable) error conditions and managing resources (such as time) is essential. This talk sheds light on “statically extracting error flows” as well as “automatic and dependable resource management at run- and compile-time”.
We introduce partial sessions and partial (multiparty) session types, in order to deal with open systems, i.e., systems with missing components. Partial sessions can be composed, and the type of the resulting system is derived from those of its components without knowing any suitable global type nor the types of missing parts. We apply these types to a process calculus, for which we prove subject reduction and progress, so that well-typed systems never violate the prescribed constraints. Therefore, partial session types support the development of systems by incremental assembling of components.
This PhD defense will be presented in French. The abstract in English can be found below.
Soutenance de thèse (en français)
Scade est un langage de programmation synchrone utilisé depuis la fin des années 1990 pour concevoir et implémenter des systèmes critiques embarqués tels ceux que l'on trouve dans l'aviation. Cette criticité requiert la formalisation (i) des programmes, afin de garantir qu'ils s'exécutent sans erreurs ou retards pendant une durée arbitraire et (ii) des outils de développement, en particulier le compilateur, pour s'assurer de la préservation des propriétés des modèles lors de la génération de code. Ces activités reposent sur une description flot de données des modèles, inspirée des schémas-blocs, dont la sémantique s'exprime par des fonctions de suites, ainsi que d'une compilation précisément documentée.
Scade descend de Lustre. La version 6 a introduit des évolutions majeures, dans les pas de Lucid synchrone, dont les automates hiérarchique et les tableaux. Pour ces derniers, leur taille est connue et vérifiée statiquement afin d'assurer un temps d'exécution et une mémoire bornés. L'utilisation croissante des tableaux de Scade pour des opérations intensives a révélé plusieurs axes d'amélioration: verbosité des modèles, manque de contrôle de la compilation ou description peu commode de certains algorithmes.
Dans cette thèse, ces trois aspects ont été étudiés à l'aide d'un prototype de compilateur. (i) Nous avons développé un système de types a la Hindley-Milner spécifiant les tailles sous la forme de polynômes multivariés. Cette proposition permet de vérifier et d'inférer la plupart des tailles de manière modulaire. (ii) Nous avons exploré une méthode de compilation alternative, fondée sur un langage déclaratif conscient de la mémoire. Il vise à concilier le style flot de données avec une spécification précise des emplacements mémoire. La description modulaire des tailles en est un élément clé. (iii) Enfin, nous proposons une construction d'itération inspirée de Sisal qui complète les itérateurs actuels. En traitant les tableaux comme des suites finies, elle donne accès aux constructions séquentielles de Scade (automates) lors d'itérations. De plus, elle permet de décrire de manière déclarative des implémentations efficaces d'algorithmes comme la décomposition de Cholesky. Cette compilation contrôlable est un premier pas nécessaire pour la compilation vers des GPUs.
Les membres du jury sont
Memory Specification in a Data-flow Synchronous Language with Statically Sized Arrays
PhD thesis defense (in French)
The synchronous programming language Scade has been used since the end of the 1990s to design safety critical embedded systems such as those found in avionics. This context requires to formally reason about (i) the programs, to ensure that they run for an arbitrary long time without errors or missed deadlines and (ii) the tools, in particular the compiler, to give high confidence in the preservation of model properties through the code generation process. These activities build on a data-flow description of models, inspired by block diagrams, that enjoys a formal semantics in terms of streams and a well-documented compilation process.
Scade stems from Lustre. The Scade 6 version introduced major evolutions following Lucid synchrone, notably hierarchical automata and arrays. For the latter, sizes are statically known and checked, so as to fulfill the bounded resources and execution time constraints. The increasing use of Scade arrays for data-intensive computations has revealed areas for improve- ment: concision of the models, control of the generated code and idiomatic description of array computations.
Using a prototype implementation of a compiler, the present work investigates these three aspects. (i) We designed a Hindley-Milner-like type system for representing sizes with multivariate polynomials. This proposal allows to check and infer most sizes in a modular way. (ii) We explored an alternative compilation process based on a memory-aware declarative language. It aims at reconciling the data-flow style with a precise specification of memory locations. The underlying memory model builds on our modular description of sizes. (iii) Last, we propose an iteration construct inspired by Sisal that supplements the available iterators. By viewing arrays as finite streams, iteration can benefit from the sequential constructs of Scade, e.g., automata. Moreover, it allows declarative descriptions of efficient algorithm implementations such as the Cholesky decomposition. This controllable compilation process is a mandatory step toward code generation for GPUs.
The members of the thesis commitee are
Les FPGA sont des circuits reprogrammables qui constituent une cible de choix, à la fois pour l'accélération matérielle d'applications logicielles et pour l'interaction avec un environnement extérieur (capteurs, actionneurs).
Dans cet exposé, je présenterai un noyau de langage fonctionnel synchrone pour la conception d'applications réactives embarquées sur FPGA. Chaque application écrite dans ce langage est une fonction « instantanée » modélisant directement un circuit synchrone qui reçoit des entrées et produit des sorties à chaque cycle d'horloge.
Je proposerai ensuite une extension conservative de ce langage avec des traits de programmation de haut niveau (récursion terminale, parallélisme à gros grain, tableaux implémentés en mémoire RAM, voire même des structures de données dynamiques). Cette extension facilite la programmation d'applications réactives incorporant des calculs « longs » progressant, pas à pas, de façon synchrone.
PhD thesis defense.
There will be a “pot-de-thèse” downstairs in the Cour aux Ernests after the defense. Extra information can be found on the dedicated web page.
Safety-critical embedded systems are often specified using block-diagram formalisms. SCADE Suite is a development environment for such systems which has been used industrially in avionics, nuclear plants, automotive and other safety-critical contexts for twenty years. Its graphical formalism translates to a textual representation based on the Lustre synchronous dataflow language, with extensions from later languages like Lucid Synchrone. In Lustre, a program is defined as a set of equations that relate inputs and outputs of the program at each discrete time step. The language of expressions at right of equations includes arithmetic and logic operators, delay operators that access the previous value of an expression, and sampling operators that allow some values to be calculated less often than others.
The Vélus project aims at formalizing a subset of the Scade 6 language in the Coq Proof Assistant. It proposes a specification of the dynamic semantics of the language as a relation between infinite streams of inputs and outputs. It also includes a compiler that uses CompCert, a verified compiler for C, as its back end to produce assembly code, and an end-to-end proof that compilation preserves the semantics of dataflow programs.
In this thesis, we extend Vélus to support control blocks present in Scade 6 and Lucid Synchrone, which includes a construction that controls the activation of equations based on a condition (switch), a construction that accesses the previous value of a named variable (last), a construction that re-initializes delay operators (reset), and finally, hierarchical state machines, which allow for the definition of complex modal behaviors. All of these constructions may be arbitrarily nested in a program. We extend the existing semantics of Vélus with a novel specification for these constructs that encodes their behavior using sampling. We propose a generic induction principle for well-formed programs, which is used to prove properties of the semantic model such as determinism and type system correctness. Finally, we describe the extension of the Vélus compiler to handle these new constructs. We show that the existing compilation scheme that lowers these constructs into the core dataflow language can be implemented, specified and verified in Coq. Compiling the reset and last constructs requires deeper changes in the intermediate languages of Vélus.
The members of the thesis committee are
Attention: This is a joint Parkas/Cambium seminar held at Inria Paris in the 12e
This talk will be about the CakeML project. The CakeML project centres around an impure functional programming language, for which we have developed a collection of proofs and tools inside the HOL4 theorem prover. The development includes a proven-correct compiler that can bootstrap itself.
This talk has two parts.
In the first part, I will explain the research questions the CakeML project has tackled and outline the main research ideas that have helped us address them. The research questions include:
In the second part, I will describe how the CakeML project strives to build meaningful connections with other projects and our experience with this so far. We have:
The CakeML project is a collaborative open source project and we are always keen to explore new directions and collaborations.
The CakeML webpage: https://cakeml.org/
The CakeML project lives in the HOL4 theorem prover: https://hol-theorem-prover.org/
Reactive programming languages are dedicated to the programming of systems
which react continuously with their environment. An environment is commonly
associated with physical sensors in a control system, but it may also model
interactions with another program over time, or even the concurrency of the
underlying hardware platform. Values take the form of unbounded streams,
modeling the (discrete) passing of time or the sequence of concurrent
interactions, and computations result from the implicit application of the
fold_left iterator over pointwise reactions on streams. While fold_left
implements forward recurrences, we introduce symmetric reactive semantics
corresponding to the implicit application of the fold_right iterator,
enabling what we would like to refer to as bidirectional reactive
programming. Constraints on backward recurrences allow to preserve important
properties of reactive languages, including and bounded memory and reaction
time. Machine Learning (ML) systems provide numerous motivations for this:
we demonstrate that reverse-mode automatic differentiation, backpropagation,
batch normalization, bidirectional recurrent neural networks, training and
reinforcement learning algorithms, are all naturally captured as
bidirectional reactive programs. We will discuss our ongoing progress in the
area.
(Joint work with Dumitru Potop-Butucaru and Hugo Pompougnac.)
(En collaboration avec C. Tasson.)
Plusieurs travaux récents et moins récents suggèrent d'intégrer une notion abstraite de permission au cœur d'un langage de programmation. Les permissions doivent a minima pouvoir être comparées et combinées, c'est-à-dire appartenir à un préordre monoïdal.
Cet exposé traitera dans un premier temps d'un λ-calcul simplement typé où la gestion des permissions est présentée sous forme fonctionnelle, c'est-à-dire introduite via une construction d'abstraction et éliminée via une construction d'application. Le langage exhibe les propriétés attendues des λ-calculs typés (réduction du sujet, confluence, normalisation) et ce quel que soit le préordre monoïdal des permissions considéré.
Dans un second temps, j'instancierai ce langage générique sur le préordre monoïdal des horloges synchrones, et y ajouterai la récursion gardée et les suites infinies. Je conjecture que les programmes bien typés du langage obtenu sont productifs et synchrones : toute sortie d'une fonction de suites définissable contient une infinité d'éléments dont chacun ne dépend que des éléments de rang inférieur des entrées. Je discuterai enfin de la possibilité de formuler la productivité comme un résultat de normalisation infinitaire.
Model checking is a technique for exhaustive verification. Although recent algorithms allow one to verify large models, the scalability is still limited for models with real-time constraints, that is, when time delays, execution times, and deadlines are modeled explicitly. In this talk, we present some attempts to develop efficient algorithms for model checking timed automata with large discrete state spaces using techniques based on predicate abstraction, binary decision diagrams, SMT solvers, and automata learning algorithms.
An essential benefit of domain-specific modeling languages is that engineers and scientists can describe problems declaratively at a high level of abstraction. That is, users can focus on what to model rather than how to create an efficient implementation. Although there are many benefits with well-engineered modeling languages, the development and maintenance cost can be very high. In this talk, I will give a brief overview of a new framework called Miking that aims to address these challenges. Specifically, I will discuss some key aspects of the framework, including language fragments composition, DSLs for probabilistic programming, and a new concept called statically resolvable ambiguity.
Bio:
David Broman is a Professor at the Department of Computer Science, KTH Royal Institute of Technology and an Associate Director Faculty for Digital Futures. He received his Ph.D. in Computer Science in 2010 from Linköping University, Sweden. Between 2012 and 2014, he was a visiting scholar at the University of California, Berkeley, where he also was employed as a part-time researcher until 2016. His research focuses on the intersection of (i) programming languages and compilers, (ii) real-time and cyber-physical systems, and (iii) probabilistic machine learning. David has received a Distinguished Artifact Award at ESOP (co-authored 2022), an outstanding paper award at RTAS (co-authored 2018), a best paper award in the journal Software & Systems Modeling (SoSyM award 2018), the award as teacher of the year, selected by the student union at KTH (2017), the best paper award at IoTDI (co-authored 2017), and awarded the Swedish Foundation for Strategic Research's individual grant for future research leaders (2016). He has worked several years within the software industry, and is a member of IFIP WG 2.4, Modelica Association, a senior member of IEEE, and a board member of Forskning och Framsteg.
Robots and other cyber-physical systems are held to high standards of safety and reliability. Formal verification can provide high assurance, but programming languages and techniques that lend themselves well to verification often do not produce executable code. We present MARVeLus, a stream-based approach to combining verification and execution in a synchronous programming language with refinement types, based on Zélus. Our approach allows formal guarantees to be made about implementation-level source code. We demonstrate the end-to-end process of developing a safe robotics application, from modeling and verification to implementation and execution.
One prominent example is the verification of collision avoidance between vehicles. When we do not assume that the vehicles are point masses, the verification conditions can often not be discharged fully automatically. We consider a convex polygonal vehicle with nonzero area traveling along a 2-dimensional trajectory, and we derive an easily-checkable, quantifier-free formula to check whether a given obstacle will collide with the vehicle moving on the planned trajectory. Our technique compares favorably to quantifier elimination in several case studies.
Bio:
Jean-Baptiste Jeannin is an Assistant Professor in Aerospace Engineering at the University of Michigan, interested in Formal Verification and Programming Languages, and their applications to providing strong guarantees to Aerospace, Robotics, and Cyber-Physical Systems. He received an Engineering degree from École polytechnique, and a Ph.D. in Computer Science from Cornell University in 2013.
In this presentation, I will introduce Scade One, the next generation of SCADE. I will present the high-level objectives and features of Scade One, with demo videos of the Technology Preview.
Functional programming permits us to express complex computational abstractions very efficiently while using types to force these abstractions to remain well-behaved. E.g., in second-order polymorphic lambda calculus (System F) we can define many standard inductive data-types so that well-typed programs using these data types are guaranteed to be deterministic and terminating.
It is natural to search for a “System F” for computational domains where not every data type is inductive. An important such domain is reactive data-flow programming which is based on the co-inductive domain of polymorphic streams. Here, affairs turn a twist. While determinism is still important it is the behavioural property of “progress” that we are concerned about: A reactive computation must not terminate but instead exhibit an infinite sequence of reaction cycles each of which should be finite (bounded) in memory and time. The type-free functional model of stream-processing by Kahn is very powerful but no "System K" is known that would uniformly guarantee progress for Kahn-style stream-programming like System F does for non-interactive programming.
However, interesting and practically useful type systems are known for special cases. This talk revisits the type system proposed by Pouzet et al. for the synchronous programming languages Lucid Synchrone/Velus. It extends the predicative typing system of Hindley-Milner for data by “clock types” and “causality types” as additional layers to control synchronisation. A well-typed program is then guaranteed to be progressive, i.e., deadlock-free and memory bounded, under a strictly synchronous model of scheduling. For practical applications within the given restricted class of synchronous programs this achieves its goals. However, it remains outside of a “System K” in the following sense: The typing system firstly exploits dependent types and secondly it uses separate type systems, specifically, one for eliminating deadlocks and another for eliminating unbounded memory. In the talk we will discuss a variation to avoid dependent types and to manage causality and initialisation in one and the same typing language, without (hopefully) losing expressiveness. The aim is to provide a more elementary reference calculus for teaching purposes and a simpler basis for discussion of approximations of what might eventually become to be identified as “System K”.
The PARKAS team and Inria Paris is organizing Synchron 2021 over five days at the end of November.
The Open Synchron Workshop is the annual meeting of experts and young researchers in the area of synchronous reactive programming. The main topics include executable specifications, program analysis, operational semantics, model-based design, formal methods, real-time programming, dedicated architectures, timing predictability, and controller synthesis.
The genuine format of the Open Synchron Workshop makes it unique in the modern landscape: there is no preliminary paper selection, and authors may present their new work in any available slot on a first-come/first-served basis. While the audience typically comprises a number of established researchers in the topic, newcomers are more than welcome. The cosy seminar atmosphere created by the venue in a single housing location is an opportunity for informal discussions and exchanges between researchers of differing seniority and expertise. Innovative contributions in blending reactive modeling and programming notions with neighboring research subjects and creative applicative domains, even at preliminary conceptual stages are always greatly appreciated.
The PARKAS team is organizing Synchron 2020 over two days at the end of November. Please join us online for 18 presentations from all over the world.
The Open Synchron Workshop is the annual meeting of experts and young researchers in the area of synchronous reactive programming. The main topics include executable specifications, program analysis, operational semantics, model-based design, formal methods, real-time programming, dedicated architectures, timing predictability, and controller synthesis.
The genuine format of the Open Synchron Workshop makes it unique in the modern landscape: there is no preliminary paper selection, and authors may present their new work in any available slot on a first-come/first-served basis. While the audience typically comprises a number of established researchers in the topic, newcomers are more than welcome. The cosy seminar atmosphere created by the venue in a single housing location is an opportunity for informal discussions and exchanges between researchers of differing seniority and expertise. Innovative contributions in blending reactive modeling and programming notions with neighboring research subjects and creative applicative domains, even at preliminary conceptual stages are always greatly appreciated.
PhD thesis defense.
This will be an online thesis defense. Please connect with GoToMeeting using your full name before the meeting is locked at 8:20am. It may be possible to join us physically at the ENS and there may also be a “pot” in the Cours Pasteur after the defense. We don't know yet. Watch this space (or contact Lélio)!
Specifications based on block diagrams and state machines are used to design control software, especially in the certified development of safety-critical applications. Tools like SCADE and Simulink/Stateflow are equipped with compilers that translate such specifications into executable code. They provide programming languages for composing functions over streams as typified by dataflow synchronous languages like Lustre.
In this thesis we present Vélus, a Lustre compiler verified in the interactive theorem prover Coq. We develop semantic models for the various languages in the compilation chain, and build on the verified CompCert C compiler to generate executable code and give an end-to-end correctness proof. The main challenge is to show semantic preservation between the dataflow paradigm and the imperative paradigm, and to reason about byte-level representations of program states.
We treat, in particular, the modular reset construct, a primitive for resetting subsystems. This necessitates the design of suitable semantic models, compilation algorithms and corresponding correctness proofs. We introduce a novel intermediate language into the usual clock-directed modular compilation scheme of Lustre. This permits the implementation of compilation passes that generate better sequential code, and facilitates reasoning about the correctness of the successive transformations of the modular reset construct.
The members of the thesis committee are
Computers are widely used in timing-sensitive applications such as controlling vehicles, factories, and medical devices. Today, real-time behavior of programs is a property that emerges from implementations rather than a property that is specified in models. Control over timing behavior of software is difficult to achieve, and timing behavior is neither predictable nor repeatable. I will argue that this problem can be solved by making a commitment to deterministic models that embrace temporal properties as an integral part of the modeling paradigm. I will explain how such models differ from the prevailing practice of modeling low-level computer architectures and estimating worst-case execution time. I will show moreover that deterministic timing is practical today without sacrificing performance for many useful workload models. Specifically, I will describe a class of computer architectures called PRET Machines that deliver deterministic timing with no loss of performance for a family of real-time problems consisting of sporadic event streams with hard deadlines.
This talk is part of the seminar series of the Informatics Department at the ENS.
The strymonas library (POPL 2017) is a high-performance stream code generation library (DSL) that supports any combination of zipping, nesting (or flat-mapping), sub-ranging, filtering, mapping — of finite or infinite streams. Whatever the stream pipeline, strymonas delivers hand-written-like code — but automatically, assuredly and portably, with no reliance on black-box optimizers or sufficiently smart compilers. Not only is the strymonas-produced code free from intermediate data structures of unbounded size, in many cases, the code is free from any intermediate data structures whatsoever; the main loop can run without allocating any memory and without any use of GC.
We report on the follow-up work, which clarifies the semantic model of strymonas and views stream fusion as normalization-by-evaluation. As a result, we have reduced the number of core strymonas primitives and made them more general. In the original strymonas, when zipping two streams both containing filtering or flat-mapping, fixed-size intermediate data structures seemed inevitable. We have already demonstrated truly constant-space zipping of filtered streams. Zipping of two flat-mapped streams is the only case that is still undecided (as of this writing).
This is a joint work with Kobayashi Tomoaki.
PhD thesis defense.
In recent years the polyhedral model has emerged as a powerful mathematical framework for loop transformations in optimizing compilers. In this thesis, we study the applicability of the polyhedral model to the compilation of domain-specific languages, focusing on image processing and deep learning pipelines. We propose new techniques to enhance the state-of-the-art polyhedral compilation toolchain to achieve peak performance close to highly tuned libaries. We extend a polyhedral toolchain to handle reductions which are a key part of image processing and deep learning pipelines and often become the bottleneck if not parallelized. We also extend the Pluto algorithm to account for the spatial locality in addition to the temporal locality. We also propose a schedule specialization method for given problem sizes. Finally, we propose techniques to reduce the autotuning time from hours to minutes.
The members of the thesis committee are
PhD thesis defense.
English: Compilers looking for an efficient implementation of a function must find which optimizations are the most beneficial. This is a complex problem, especially in the early steps of the compilation process. Each decision may impact the transformations available in subsequent steps. We propose to represent the compilation process as the progressive refinement of a partially specified implementation. All potential decisions are exposed upfront and commute. This allows for making the most discriminative decisions first and for building a performance model aware of which optimizations may be applied in subsequent steps. We apply this approach to the generation of efficient GPU code for linear algebra and yield performance competitive with hand-tuned libraries.
Français: Les compilateurs cherchant à améliorer l'efficacité des programmes doivent déterminer quelles optimisations seront les plus bénéfiques. Ce problème est complexe, surtout lors des premières étapes de la compilation où chaque décision influence les choix disponibles aux étapes suivantes. Nous proposons de représenter la compilation comme le raffinement progressif d’une implémentation partiellement spécifiée. Les décisions possibles sont toutes connues dès le départ et commutent. Cela permet de prendre les décisions les plus importantes en premier et de construire un modèle de performance capable d'anticiper les potentielles optimisations. Nous appliquons cette approche pour générer du code d'algèbre linéaire ciblant des GPU et obtenons des performances comparables aux bibliothèques optimisées à la main.
The members of the thesis committee are:
Cryptographic primitives are subject to diverging imperatives. The necessity of functional correctness and auditability pushes for using a high-level programming language. The performance requirements and the threat of timing attacks push for using no more abstract than an assembler to exploit (or avoid!) the intricate micro-architectural features of a given machine. We believe that a suitable programming language can reconcile both views and actually improve on the state of the art in both directions. In this work, we introduce Usuba, an opinionated dataflow programming language in which block ciphers become so simple as to be “obviously correct” and whose types document and enforce valid parallelization strategies at the bit-level granularity. We then present Usubac, its optimizing compiler that produces high-throughput, constant-time implementations performing on par with hand-tuned reference implementations. The cornerstone of our approach is a systematization and generalization of bitslicing, an implementation trick frequently used by cryptographers. We thus show that Usuba can produce code that executes between 5% slower to 22% faster than hand-tuned reference implementations while gracefully scaling across a wide range of architectures and automatically exploiting Single Instruction Multiple Data (SIMD) instructions whenever the cipher’s structure allows it.
A real-time system is a computer system that controls a physical device in its environment, at a rate adapted to the evolution of the device. These systems are often critical, as a malfunction can lead to dramatic consequences. Therefore, real-time system developers must ensure that their implementations are safe, both functionally (producing the correct outputs) and temporally (producing outputs at the right time).
Developing a safe real-time system involves methods of two rather distinct research domains:
formal methods, formal languages and compilation, for the functional part;
real-time scheduling and Worst-Case Execution Time, for the temporal part.
The objective of this talk is to focus on one specific area where both domains meet, namely: real-time data-dependencies. A real-time system is typically programmed as a set of concurrent tasks, each with its own real-time constraints (periods, deadlines, ...). The talk will focus on the semantics, timing analysis, and implementation of inter-task data-dependencies in such programs.
La part croissante des fonctions d'assistance à la conduite, leur criticité, ainsi que la perspective d'une certification de ces fonctions, rendent nécessaire leur vérification et leur validation avec un niveau d'exigence que le test seul ne peut assurer. Depuis quelques années déjà d'autres domaines comme l'aéronautique ou le ferroviaire sont soumis à des contextes équivalents. Pour répondre à certaines contraintes ils ont localement mis en place des méthodes formelles. Le groupe PSA expérimente différentes techniques formelles afin de déterminer celles qui seraient pertinentes, pour quel type de développement, ainsi que l'impact de leur déploiement sur le processus. Cette présentation fait un tour d'horizon des techniques formelles expérimentées sur du code embarqué réellement utilisé en production, donne une synthèse des résultats obtenus et propose quelques perspectives pour l'avenir.
PhD thesis defense. The room is on the very top floor, see this map.
The work being proposed concerns parallelization and optimizations for two important classes of applications: sparse computations and iterated stencils, that appear frequently in scientific simulation, learning applications, image processing, etc.
Nowadays, optimizing compilers are increasingly challenged by the diversity of programming languages and heterogeneity of architectures. The polyhedral model is a powerful mathematical framework for programs to exploit automatic parallelization and locality optimization, playing an important role in the field of optimizing compilers. A long standing limitation of the model has been its restriction to static control affine programs, resulting in an emergent demand for the support of non-affine extensions. This is particularly acute in the context of heterogeneous architectures where a variety of computation kernels need to be analyzed and transformed to match the constraints of hardware accelerators and to manage data transfers across memory spaces. We explore multiple non-affine extensions of the polyhedral model, in the context of a well-defined intermediate language combining affine and syntactic elements. On the one hand, we explain how transformations and code generation for loops with non-affine, data-dependent and dynamic loop bounds are integrated into a polyhedral framework, extending the applicable domain of polyhedral compilation in the realm of non-affine applications. On the other hand, we describe the integration of overlapped tiling for stencil computations into a general polyhedral framework, automating non-affine transformations in polyhedral compilation. We evaluate our techniques on both CPU and CPU architectures, validating the effectiveness of the optimizations by conducting an in-depth performance comparison with state-of-the-art frameworks and manually-written libraries.
The members of the thesis committee are:
This is joint Parkas/Gallium seminar to be held at the Inria Paris offices
As machine learning becomes more powerful and pervasive, it has an impact on the many software systems that rely on it and on the underlying software and hardware platforms. The resulting questions should concern virtually all aspects of the theory and practice of software, as they relate to software engineering, security and trust, programming, tools, foundations, and more. This talk will discuss some of those questions. In particular, it will present recent research on the semantics of programming languages suitable for learning by gradient descent and on adversarial machine learning.
(This talk is a lightly updated version of one given at ETAPS 2018.)
Polyhedral techniques are an effective instrument in automatic loop tiling for data locality optimisation of sequential programs. In this presentation, I will provide an example of how these techniques can be applied to automatically adjust the granularity of work in a one-dimensional Gauss-Seidel kernel implemented in OpenStream, a task-parallel dataflow language. Next, I will extend these ideas to the more representative two-dimensional case and compare the performance of these polyhedral-tiled OpenStream programs against other implementations, e.g., a sequential C implementation, a parallel C implementation using OpenMP, and an implementation using less convoluted OpenStream routines. Preliminary results show that our polyhedral-tiled versions are significantly faster than manually tiled OpenStream implementations, and thus demonstrate the value of polyhedral tiling for dataflow programming models. Given enough parallelism, our implementations also outperform polyhedral-tiled OpenMP implementations, suggesting that both polyhedral optimisations and the use of a task-parallel dataflow model contribute to better performance.
Bio: Nuno Migeul Nobre is currently a postgraduate research student in the School of Computer Science at the University of Manchester, UK, where he is studying for a Ph.D. His research interests are focused on scheduling for heterogeneous multi-core architectures, languages and compilers for parallel computing and numerical algorithms. He also holds a master's degree in Physics Engineering, awarded in 2016 by the University of Coimbra, Portugal.
Scalable computations where the data is sparse — that is, a tiny subset
of the data is populated — are widely represented in scientific
computing and big data analytics.  Sparse data are typically represented
by sparse matrices and graphs, which reduce data storage and computation
requirements (e.g., by storing only nonzero data elements) through the
use of indirection matrices such as B in A[B[i]].  This indirection
impacts performance on modern architectures.  Whether the code is
optimized by human programmer, compiler or hardware, any optimizations
that must understand the memory access patterns for A require the values
of B, which are not known until runtime.  Over time, many optimizations
of sparse computations have been developed that simplify the code and
reduce the amount of indirection through custom data representations,
but these representations must also rely on runtime knowledge of the
nonzero structure of the sparse data.  In this talk, we describe a
framework for composing compile-time and runtime optimizations of sparse
codes, and demonstrate its effectiveness at achieving performance
comparable to manually-tuned code for applications in scientific
computing and data analytics.
Bio: Mary Hall is a professor in the School of Computing at University of Utah. She received a PhD in Computer Science from Rice University. Her research focus brings together compiler optimizations targeting current and future high-performance architectures on real-world applications. Hall's prior work has developed compiler techniques for exploiting parallelism and locality on a diversity of architectures: automatic parallelization for SMPs, superword-level parallelism for multimedia extensions, processing-in-memory architectures, FPGAs and more recently many-core CPUs and GPUs. Professor Hall is an ACM Distinguished Scientist and ACM’s representative on the Computing Research Association Board of Directors. She is deeply interested in computing history, having served on the ACM History Committee for a decade and as chair from 2009-2014. She also actively participates in outreach programs to encourage the participation of women and underrepresented minorities in computer science.
We propose a method for automatically finding refinement types of higher-order function programs. Our method is an extension of the ICE framework of Garg et al. for finding invariants. In addition to the usual positive and negative samples in machine learning, their ICE framework uses implication constraints, which consist of pairs (x, y) such that if x satisfies an invariant, so does y. From these constraints, ICE infers inductive invariants effectively. We observe that the implication constraints in the original ICE framework are not suitable for finding invariants of recursive functions with multiple function calls. We thus generalize the implication constraints to those of the form ({x1,...,xk}, y), which means that if all of x1, ... , xk satisfy an invariant, so does y. We extend their algorithms for inferring likely invariants from samples, verifying the inferred invariants, and generating new samples. We have implemented our method and confirmed its effectiveness through experiments.
Microfluidics is a multidisciplinary field at the intersection of engineering, physics, chemistry, biochemistry, nanotechnology, and biotechnology. It has practical applications to the design of integrated systems that process sub-microliter-scale volumes of fluid. Microfluidics has enabled the emergence of a Laboratories-on-a-Chip (LoCs), which integrate multiple laboratory functions into devices whose physical size ranges from square millimeters to square centimeters. Compared to traditional benchtop chemistry methods, LoCs offer the benefits of automation and miniaturization; software-programmable LoCs offer an important additional benefit: programmability.
This talk will introduce BioScript, a domain-specific language for programmable LoCs, and its compiler. Extensibility is particularly important for language design, as each LoC target features a unique set of capabilities, and there is no universal functionality among LoCs akin to Turing completeness. The BioScript compiler presently targets a specific class of semiconductor-based LoCs which manipulate discrete liquid droplets on a 2D electrode grid. The language, compiler, and runtime leverage advances in sensor integration to execute biochemical procedures that feature online decision-making based on sensory data acquired during assay execution. The compiler features a novel hybrid intermediate representation (IR) that interleaves fluidic operations with computations performed on sensor data. The IR extends the traditional notions of liveness and interference to fluidic variables and operations, as needed to target the LoC, which itself can be viewed as a spatially reconfigurable array. The code generator converts the IR into the following:
The compiler is validated using a software simulator which produces animated videos of realistic bioassay execution on the device.
Bio: Philip Brisk received, the BS, MS, and PhD Degrees, all in Computer Science, from UCLA in 2002, 2003, an 2006 respectively. From 2006-2009 he was a Postdoctoral Scholar at EPFL in Switzerland. He has been with UC Riverside since 2009. His research interests include programmable microfluidics and lab-on-a-chip technology, FPGAs and reconfigurable computing, and other forward-looking applications of computer engineering principles.
Designing and prototyping new features is important in industrial projects. I will present our experience using the Coq proof assistant as a prototyping environment for building a query compiler for use in IBM's ODM Insights. I will discuss the pros and cons of using Coq for this purpose and describe our methodology for porting the compiler to Java for product integration.
The initial Coq prototype has evolved into Q*cert, an open-source platform for the specification, verification, and implementation of query compilers. This platform includes support for SQL and OQL. It can generate code for execution on JavaScript, Spark and Cloudant. It internally relies on familiar database intermediate representations, notably the nested relational algebra and calculus. The platform also comes with a simple but functional and extensible query optimizers.
This is joint work with Joshua Auerbach, Martin Hirzel, Avi Shinnar and Jérôme Siméon.
This talk is part of the seminar series of the Informatics Department at the ENS.
Google's EXEgesis project takes a principled approach to generating code for modern processors. This talk will start with a refresher on modern computer architectures. We shall see why it is important to have precise instruction itineraries. We will then turn to the combinatorial optimization models for obtaining these itineraries. We will then touch on some gains that we've obtained so far.
The EXEgesis team is based in the Google Paris office. Many of us have a background in combinatorial optimization, and we are passionate about CPUs.
(NB: The salle W is well hidden. Please don't hesitate to come to the PARKAS area beforehand. We will leave together at 10:20.)
We focus on the automated synthesis of divide-and-conquer parallelism, which is a common parallel programming skeleton supported by many cross-platform multithreaded libraries. The challenges of producing (manually or automatically) a correct divide-and-conquer parallel program from a given sequential code are two-fold:
assuming that individual worker threads execute a code identical to the sequential code, the programmer has to provide the extra code for dividing the tasks and combining the computation results, and
sometimes, the sequential code may not be usable as is, and may need to be modified by the programmer.
We address both challenges in this paper. We present an automated synthesis technique for the case where no modifications to the sequential code are required, and we propose an algorithm for modifying the sequential code to make it suitable for parallelization when some modification is necessary. Finally, we present theoretical results for when this modification is efficiently possible, and experimental evaluation of the technique and the quality of the produced parallel programs.
(This work will appear in PLDI 2017.)
PhD thesis defense.
There will be a “pot-de-thèse” after the defense at 45 rue d'Ulm. Please RSVP to Robin by email if you will attend.
Modern multiprocessor architectures and programming languages exhibit weakly consistent memories. Their behaviour is formalised by the memory model of the architecture or programming language; it precisely defines which write operation can be returned by each shared memory read. This is not always the latest store to the same variable, because of optimisations in the processors such as speculative execution of instructions, the complex effects of caches, and optimisations made by compilers.
In this thesis we focus on the C11 memory model that is defined by the 2011 edition of the C standard. Our contributions are threefold.
First, we focused on the theory surrounding the C11 model, formally studying which compiler optimisations it enables. We show that many common compiler optimisations are allowed, but, surprisingly, some important ones are forbidden.
Secondly, building on our results, we developed a random testing methodology for detecting when mainstream compilers such as GCC or Clang perform an incorrect optimisation with respect to the memory model. We found several bugs in GCC, all promptly fixed. We also implemented a novel optimisation pass in LLVM, that looks for special instructions that restrict processor optimisations—called fence instructions—and eliminates the redundant ones.
Finally, we developed a user-level scheduler for lightweight threads communicating through first-in first-out single-producer single-consumer queues. This programming model is known as Kahn process networks, and we show how to efficiently implement it, using C11 synchronisation primitives. This shows that despite its flaws, C11 can be usable in practice.
The members of the thesis committee are:
PhD thesis defense. The room is on the very top floor, see this map. There will be a “pot-de-thèse” downstairs after the defense in Salle S16 in the Rataud Wing at the ENS. The manuscript and supporting materials are available online.
In this thesis we study embedded controllers implemented as sets of unsynchronized periodic processes. Each process activates quasi-periodically, that is, periodically with bounded jitter, and communicates with bounded transmission delays. Such reactive systems, termed quasi-periodic, exist as soon as two periodic processes are connected together. In the distributed systems literature they are also known as synchronous real-time models. We focus on techniques for the design and analysis of such systems without imposing a global clock synchronization.
Synchronous languages were introduced as domain specific languages for the design of reactive systems. They offer an ideal framework to program, analyze, and verify quasi-periodic systems. Based on a synchronous approach, this thesis makes contributions to the treatment of quasi-periodic systems along three themes: verification, implementation, and simulation.
Verification: The quasi-synchronous abstraction is a discrete abstraction proposed by Paul Caspi for model checking safety properties of quasi-periodic systems. We show that this abstraction is not sound in general and give necessary and sufficient conditions on both the static communication graph of the application and the real-time characteristics of the architecture to recover soundness. We then generalize these results to multirate systems.
Implementation: Loosely time-triggered architectures are protocols designed to ensure the correct execution of an application running on a quasi-periodic system. We propose a unified framework that encompasses both the application and the protocol controllers. This framework allows us to simplify existing protocols, propose optimized versions, and give new correctness proofs. We instantiate our framework with a protocol based on clock synchronization to compare the performance of the two approaches.
Simulation: Quasi-periodic systems are but one example of timed systems involving real-time characteristics and tolerances. For such nondeterministic models, we propose a symbolic simulation scheme inspired by model checking techniques for timed automata. We show how to compile a model mixing nondeterministic continuous-time and discrete-time dynamics into a discrete program manipulating sets of possible values. Each trace of the resulting program captures a set of possible executions of the source program.
The members of the thesis committee are:
Many computationally-intensive algorithms benefit from the wide parallelism offered by Graphical Processing Units (GPUs). However, the search for a close-to-optimal implementation remains extremely tedious due to the specialization and complexity of GPU architectures.
We present a novel approach to automatically discover the best performing code from a given set of possible implementations. It involves a branch and bound algorithm with two distinctive features: (1) an analytic performance model of a lower bound on the execution time, and (2) the ability to estimate such bounds on a partially-specified implementation.
The unique features of this performance model allow to aggressively prune the optimization space without eliminating the best performing implementation. While the space considered in this paper focuses on GPUs, the approach is generic enough to be applied to other architectures.
We implemented our algorithm in a tool called Telamon} and demonstrate its effectiveness on a huge, architecture-specific and input-sensitive optimization space.
This will be a practice run for Ulysse's presentation at CC 2017.
We show how partial redundancy elimination (PRE) can be instantiated to perform provably correct fence elimination for multi-threaded programs running on top of the x86, ARM and IBM Power relaxed memory models. We have implemented our algorithm in the backends of the LLVM compiler infrastructure. The optimisation does not induce an observable overhead at compile-time and can result in up-to 10% speedup on some benchmarks.
This will be a practice run for Robin and Francesco's presentation at CC 2017.
PhD thesis defense. The room is on the ground floor, at number 14 on this map.
In this thesis, we are interested in Kahn process networks, a simple yet expressive model of concurrency, and its parallel implementation on modern shared-memory architectures. Kahn process networks expose concurrency to the programmer through an arrangement of sequential processes and single-producer single-consumer channels.
The focus is on the implementation aspects. Of particular importance to our study are two parameters: lock freedom and relaxed memory. The development of fast and efficient lock-free algorithms ties into concerns of controlled resource consumption (important in embedded systems) and reliable performance on current and future platforms with unfair or skewed scheduling such as virtual machines and GPUs. Our work with relaxed memory models complements this more theoretical approach by offering a window into realistic shared-memory architectures.
We present a new lock-free algorithm for a Kahn process network interpreter. It is disjoint-access parallel: we allow multiple threads to work on the same shared Kahn process network, fully utilizing the parallelism exhibited by independent processes. It is non-blocking in that it guarantees global progress in bounded memory, even in the presence of (possibly infinite) delays affecting the executing threads. To our knowledge, it is the first lock-free system of this size, and integrates various well-known non-blocking techniques and concepts (e.g., safe memory reclamation, multi-word updates, assistance) with ideas and optimizations specific to the Kahn network setting. We also discuss a blocking variant of this algorithm, targetted at high-performance computing, with encouraging experimental results.
The members of the thesis committee are:
Even after thirty years, the polyhedral model is far from being an unequivocal success, even on the restricted domain where it is applicable. We do not (yet) have compilers for general-purpose processors that can generate code approaching either the machine peak, or the algorithmic peak of the source program, or hand tuned libraries developed by “heroes of high performance computing.” I will try to explain why this is so, and suggest a plan of how we can improve this state of affairs.
Graphics Processing Units (GPUs) are an attractive target for graph algorithms because they support massively parallel execution and possess much higher memory bandwidths than CPUs and FPGAs. However, graph programs are also highly irregular and low-level GPU programming models make writing high-performance implementations very challenging.
In this talk, I will describe the Galois GPU compiler that we have built to generate high-performance implementations of graph algorithms on GPUs. Beginning with a high-level representation of graph algorithms, which we call IrGL, our compiler generates optimized CUDA code for NVIDIA GPUs by applying three key optimizations that I argue are fundamental to high-performance implementations of graph algorithms on GPUs. Evaluation on eight graph applications shows that our compiler produces code that outperforms all current state-of-the-art handwritten CUDA implementations.
Graph program performance strongly depends on the inputs, and this makes picking the right set of optimizations for a given input very difficult. We show how our compiler can exploit queuing network models generated from IrGL source code to customize optimizations for combinations of applications and inputs.
Parts of this work will appear in OOPSLA 2016.
In this talk, I consider temporally causal functions of timed streams, i.e., functions that produce output values at a given instant that only depend on the input values that have been received till that instant. Defining partial timed streams that form a directed complete partial order (DCPO), I will show how causal functions are nicely captured as limits of continuous and synchronous functions over these DCPO offering thus a denotational model of causal timed functions. Relaxing the synchronous hypothesis to a pre-synchronous hypothesis, I will show how every causal function admits a lattice of possible denotations such that:
The categorical properties of these continuous functions will then be detailed, offering various relevant operators for programing with them. Lastly, I will show that a notion of causal function residual is available so that a minimal (possibly continuous) IO-automaton can be associated to every causal function, providing thus an abstract operational (latest) semantics. Open perspectives and questions will conclude this presentation.
Contract-based software development is a leading methodology for the construction of safety- and mission-critical embedded systems. Contracts are an effective way to establish boundaries between components and can be used efficiently to verify global properties by using compositional reasoning techniques. A contract specifies the assumptions a component makes on its context and the guarantees it provides. Requirements in the specification of a component are often case-based, with each case describing what the component should do depending on a specific situation (or mode) the component is in. This talk introduces CoCoSpec, a mode-aware assume-guarantee-based contract language for embedded systems built as an extension of the Lustre language. CoCoSpec lets users specify mode behavior directly, instead of encoding it as conditional guarantees, thus preventing a loss of mode-specific information. Mode-aware model checkers supporting CoCoSpec can increase the effectiveness of the compositional analysis techniques found in assume-guarantee frameworks and improve scalability. Such tools can also produce much better feedback during the verification process, as well as valuable qualitative information on the contract itself. I will presents the CoCoSpec language and illustrate the benefits of mode-aware model-checking on a case study involving a flight-critical avionics system. The evaluation uses Kind 2, a collaborative, parallel, SMT-based model checker developed at the University of Iowa that provides full support for CoCospec.
Real-time stream processing applications executed on embedded multiprocessor systems often have strict throughput and latency constraints. Violating these constraints is undesired and temporal analysis methods are therefore used to prevent such violations. These analysis methods use abstractions of the analyzed applications to simplify their temporal analysis.
Refinement theories have enabled the creation of deterministic abstractions of stream processing applications that are executed on multiprocessor systems. Prominent examples of such abstract models are deterministic timed-dataflow models which can be efficiently analyzed because they only have one behavior.
An important aspect of a stream processing application can be that it makes use of reordered data streams between tasks. An example is the bit-reversed ordered stream produced by a FFT task. Another example is reordering that is a result of data-parallelism. However, existing abstraction/refinement theories do not support such reordering behavior or do not handle this type of behavior correctly. This is because existing refinement theories assume that the temporal behavior of applications is orthogonal to their functional behavior, whereas this orthogonality does not always hold in the case of reordered data streams.
In this talk I will present a new refinement theory in which the potential interaction between temporal and functional behavior is taken into account. The introduced theory supports reordering of data and can therefore be used to validate existing systems with such reordering. Furthermore, the theory enables showing that deterministic dataflow models that do not apply reordering can be used as valid abstractions of systems in which reordering takes place.
Note 1: Marco will also serve on the thesis committee of Khanh Xuan Do, CEA List, NanoInnov, Monday October 17 at 14:30.
Note 2: we will continue with the seminar of Adrien Champion in the same room after a short coffee break.
Q*cert is a query compiler written in Coq. Initially dedicated to the compilation of jRule to Cloudant, it is now able to compile several source languages (e.g., OQL, NRA) and to generate code for Java or Spark. Internally, Q*cert uses 14 languages and 27 unitary translation functions to go directly from one language to another. In this talk, I will present how we handle all these compilation paths and how proofs help to maintain the options of the command line compiler.
Joint work with Joshua Auerbach, Martin Hirzel, Avraham Shinnar and Jérôme Siméon.
Q*cert est un compilateur de requêtes écrit en Coq. Initialement dédié à la compilation de jRule vers Cloudant, il est maintenant capable de compiler plusieurs langages sources (e.g., OQL, NRA) et de générer du code pour Java ou Spark. En interne, Q*cert utilise 14 langages et 27 fonctions de traductions unitaires permettant de passer directement d'un langage à un autre. Dans cet exposé, je présenterai comment nous faisons pour gérer tous ces chemins de compilation et comment les preuves nous aident à maintenir à jour les options de la ligne de commande du compilateur.
Travail réalisé en collaboration avec Joshua Auerbach, Martin Hirzel, Avraham Shinnar et Jérôme Siméon.
Writing code which uses an architecture at its full capability has become an increasingly difficult problem. For some key operations, a dedicated accelerator or a finely tuned implementation exists and delivers the best performance. When compiling code, identifying these operations and issuing calls to their high-performance implementations is thus attractive. In this talk, we focus on the problem of detecting such operations. We propose a framework which detects linear algebra subcomputations within a polyhedral program. The main idea of this framework is to partition the computation in order to isolate different subcomputations in a regular manner, then we consider each portion of the computation and try to recognize it as a combination of linear algebra operations.
We perform the partitioning of the computation by using a program transformation called monoparametric tiling. This transformation partitions the computation into blocks, whose shape is some homothetic scaling of a fixed-size partitioning. We show that the tiled program remains polyhedral while allowing a limited amount of parametrization: a single size parameter. This is an improvement compared to the previous work on tiling, that forced a choice between these two properties.
Then, in order to recognize computations, we introduce a template recognition algorithm built built on a state-of-the-art program equivalence algorithm. We also propose several extensions in order to manage some semantic properties.
Finally, we combine these two contributions into a framework which detects linear algebra subcomputations. A part of this framework is a library of templates based on the BLAS specification. We demonstrate our framework on several applications.
Achieving high performance is no easy task. From the early days of computing, many researchers have spent their lifetime trying to extract more FLOPS out of critical codes. Hardcore performance engineers try to get to this performance nirvana single handedly without any help from languages, compilers or tools. In this talk I'll argue that programming languages and compiler technology can take on most of the performance optimization burden... at least in certain domains. We will demonstrate the performance optimization ability of compilers and languages in the domains of image processing and physical simulation. Most these applications are performance critical, and programmers spend many months optimizing them. In the first part of this talk, I will describe Halide, a language and compiler for image processing. Halide explicitly separates the description of the algorithm, the task of the domain expert, from the choices of execution structure which determine parallelism, locality, memory footprint, and synchronization, the domain of the performance engineer. We show how this separation simplifies the problem for both the domain expert and the performance engineer. Using machine learning we can even eliminate the performance engineer entirely from the process. The end result is much simpler programs, delivering performance often many times faster than the best prior hand-tuned C, assembly, and CUDA implementations, while scaling across radically different architectures, from ARM cores to massively parallel GPUs.
In the second part of the talk, I will introduce Simit, a new language for physical simulations. Simit lets the programmer seamlessly work on a physical system both in its individual geometric elements as a hypergraph as well as the behavior of the entire system as set of global tensors. Simit provides a novel assembly construct that makes it conceptually easy and computationally efficient to move between these two abstractions. Using the information provided by the assembly construct, the compiler generates efficient in-place computation on the graph. We demonstrate that Simit is easy to use: a Simit program is typically shorter than a Matlab program; that it is high performance: a Simit program running sequentially on a CPU performs comparably to hand-optimized simulations; and that it is portable: Simit programs can be compiled for GPUs with no change to the program, delivering 4 to 20x speedups over our optimized CPU code.
Bio: Saman P. Amarasinghe is a Professor in the Department of Electrical Engineering and Computer Science at Massachusetts Institute of Technology and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL) where he leads the Commit compiler group. Under Saman's guidance, the Commit group has developed the StreamIt, PetaBricks, StreamJIT, Halide, Simit, and MILK programming languages and compilers, DynamoRIO dynamic instrumentation system, Superword level parallelism for SIMD vectorization, Program Shepherding to protect programs against external attacks, the OpenTuner extendable autotuner, and the Kendo deterministic execution system. He was the co-leader of the Raw architecture project. His research interests are in discovering novel approaches to improve the performance of modern computer systems and make them more secure without unduly increasing the complexity faced by the end users, application developers, compiler writers, or computer architects. Saman was the founder of Determina Corporation and a co-founder of Lanka Internet Services Ltd. Saman received his BS in Electrical Engineering and Computer Science from Cornell University in 1988, and his MSEE and Ph.D from Stanford University in 1990 and 1997, respectively.
We will present recent developments in Wagner, a functional programming language geared towards real-time digital audio processing with formal, machine-checkable semantics.
Wagner can be seen as either a very simple synchronous language of a functional reactive one, using coeffects for history tracking, allowing a natural coding of filter and digital waveguide oscillators.
In order to relate programs with their mathematical model three main components are needed:
A Coq library with some facts about the Discrete Fourier Transform and unitary transforms. The formalization is constructive and heavily relies on the Mathematical Components library.
An embedding of Wagner into Coq, crucially allowing a lightweight mechanization style as we can reuse most of Mathcomp's linear algebra libraries. The functional nature of Wagner allows the use of typical PL techniques such as for example logical relations to characterize the subset of Wagner programs that are “linear processors”.
Finally, a call-by-value abstract machine with a heap provides the efficient execution model. Basic ideas from the focusing literature are used to distinguish “positive” and “negative” types.
We embed the machine in Coq using the standard monadic CBV translation and we link it to the semantics in 2) by relating well-formed heaps with typing environments.
Search heuristics are of utter importance for solving Constraint Satisfaction Problems (CSP) efficiently. However, the implementers will struggle to choose between a full-fledged but low-level constraint library or high-level languages proposing a limited amount of pre-built heuristics. We introduce Spacetime Programming (STP), a paradigm based on concurrent constraint programming and synchronous languages, where both time and space are part of the language. The variables are characterized by a “spacetime” property describing how their values evolve through time and space. We will present preliminary work demonstrating how STP, in the context of CSP solving, can be used for writing and composing search strategies.
Equation-based object-oriented (EOO) modeling languages such as Modelica provide a convenient, declarative method for describing models of cyber-physical systems. Because of the ease of use of EOO languages, large and complex models can be built with limited effort.
However, current state-of-the-art tools do not provide the user with enough information when errors appear or simulation results are wrong. It is of paramount importance that such tools should give the user enough information to correct errors or understand where the problems that lead to wrong simulation results are located. However, understanding the model translation process of an EOO compiler is a daunting task that not only requires knowledge of the numerical algorithms that the tool executes during simulation, but also the complex symbolic transformations being performed.
As part of this work, methods have been developed and explored where the EOO tool, an enhanced Modelica compiler, records the transformations during the translation process in order to provide better diagnostics, explanations, and analysis. This information is used to generate better error-messages during translation. It is also used to provide better debugging for a simulation that produces unexpected results or where numerical methods fail.
Meeting deadlines is particularly important for real-time applications. It is usually essential to identify possible bottlenecks and either simplify the model or give hints to the compiler that enable it to generate faster code. When profiling and measuring execution times of parts of the model the recorded information can also be used to find out why a particular system model executes slowly.
Combined with debugging information, it is possible to find out why this system of equations is slow to solve, which helps understanding what can be done to simplify the model. A tool with a graphical user interface has been developed to make debugging and performance profiling easier. Both debugging and profiling have been combined into a single view so that performance metrics are mapped to equations, which are mapped to debugging information.
The algorithmic part of Modelica was extended with meta-modeling constructs (MetaModelica) for language modeling. In this context a quite general approach to debugging and compilation from (extended) Modelica to C code was developed. That makes it possible to use the same executable format for simulation executables as for compiler bootstrapping when the compiler written in MetaModelica compiles itself.
Bio: Martin Sjölund is doing a postdoc at Linköping University where he finished his PhD in 2015. The thesis is mostly focused on debugging and profiling of an equation-based object-oriented (EOO) language called Modelica. He has worked on various parts of the OpenModelica compiler since 2009, both doing research (resulting in the PhD) and development work for the Open Source Modelica Consortium (resulting in improved portability, performance, scalability, and new features in OpenModelica).
Context: part of the PhD thesis work of Oleksandr Zinenko.
The Polyhedral model is a powerful framework for program transformation, yet it remains complex for direct use, even by experts. We leverage the inherent geometric nature of the model to visualize programs and to support code transformation by directly manipulating these visualizations with automatic code generation and feedback on semantics preservation. Besides manual transformation, the user of our interactive tool, Clint, is able to recover, visually replay and adjust an automatically computed optimization using a set of syntactic-level directives with strong semantics. Combining this visualization with existing polyhedral tools introduces visual aid for schedule and transformation construction. Empirical user studies demonstrated that visual representations allow to extract and express parallelism faster and with more success.
Context: presentation of the PhD thesis work of Ivan Llopard; the thesis was prepared at CEA Leti and will be defended on April 26th in Grenoble.
Modeling languages propose convenient abstractions and transformations to harness the complexity of today's embedded systems. These languages are often based on the formalism of Hierarchical State Machines, such as the widely known Statecharts. While they naturally capture the concurrency of control systems, they face two important challenges when it comes to modeling data-intensive applications on modern parallel architectures: the absence of a unified approach that also accounts for (data) parallel actions; and the lack of an effective optimization and code generation flow. We propose a modeling language extended with parallel action semantics and indexed state machines suitable for computationally intensive applications. Together with its formal semantics, we present an optimizing compiler aiming for the generation of efficient parallel implementations. Following a model-driven methodology, the compilation flow has been successfully applied to manycore architectures such as GPGPU platforms.
Synchronous Dataflow graphs, an equivalent of weighted petri-nets introduced by Lee and Messerschmitt in 1987, is a well-known formalism commonly used to model data-exchanges between parallel programs. This model has been extensively studied in the last two decades because of the importance of its applications. However, the determination of a maximal throughput is a difficult question, for which no polynomial time algorithm exists up to now. Several authors proved that it exists a schedule which reaches the maximum throughput but with a no polynomially bounded size. On the other hand a strictly periodic schedule may be built polynomially, but without any guarantee on the throughput achieved. In this context, we investigate the trade-off between schedules size and their corresponding throughput. Additionally, the applicability of these models on SLAM algorithms is explored.
Les systèmes embarqués sont des dispositifs informatisés dédiés à une tâche spécifique, typiquement le contrôle/commande d'un processus physique. On en trouve dans le domaine des transports (freinage assisté, allumage moteur, commande de vol), du contrôle de processus industriel, ou de la production d'énergie. Les termes de systèmes réactifs, systèmes temps-réel (dur) couvrent à peu près la même réalité.
Une caractéristique principale de ces systèmes est la criticité : les défaillances peuvent être catastrophiques. Il faut donc s'assurer de la correction de ses systèmes avant leur mises en service.
Pour être considéré comme correct, un tel système doit essentiellement : - calculer juste : il réalise bien la fonctionnalité attendue, sans bogue de conception ni erreur à l'exécution ; - calculer (suffisamment) vite : la vitesse de réaction du système doit être en adéquation avec la dynamique du processus physique qu'il contrôle.
L'approche synchrone, formalisée dans le courant des années 80, s'intéresse spécifiquement aux aspects fonctionnels. Elle propose un cadre idéalisé pour la conception des systèmes (concurrence déterministe), et fournit les outils nécessaires pour la programmation, la validation, la compilation et l'implantation du logiciel embarqué. L'outil Scade (Esterel technologies) est un exemple typique d'un environnement de programmation synchrone, utilisé en production dans l'industrie (commandes de vol Airbus et bien autres).
L'analyse temporelle cherche à vérifier que le temps de réaction du système implanté est conforme aux contraintes temps-réel de son environnement. Aux méthodes dynamiques classiques, basées sur le test et la mesure, se sont ajoutées à partir des années 90 des méthodes d'analyse statiques qui cherchent à déterminer des bornes supérieures garanties aux temps de réaction. Le problème de base de ces méthodes est l'estimation du pire temps d'exécution d'un code séquentiel donné sur une architecture donnée (Worst Case Execution Time, ou WCET).
Cet exposé présente les complémentarités des deux domaines, puis aborde plus spécifiquement des travaux récents qui visent à analyser la sémantique des programmes synchrones pour découvrir des chemins d'exécutions infaisables sur le code compilé, et donc, potentiellement, d'affiner les estimations de pire temps d'exécution.
Dans le contexte des systèmes embarqués critiques, l'objectif principal du logiciel n'est pas d'effectuer des calculs extrêmement longs et complexes à partir d'une donnée qui lui est fournie initialement, mais de réagir de façon sure et déterministe aux évènements externes qui lui sont communiqués de manière régulière tout au long de son exécution. La réalisation de tels logiciels peut se faire en utilisant des langages de programmation dédiés dont la conception et les techniques de compilation sont un sujet de recherche actif depuis une trentaine d'années. Les langages de programmation synchrones, comme Esterel ou Lustre, ont prouvé leur utilité dans la réalisation des systèmes réactifs. Après une introduction aux principes des langages synchrones, cet exposé présentera à travers le langage de programmation Scade 6 (dialecte de Lustre) les travaux de recherche d'Esterel Technologies sur la réalisation et la compilation d'un langage textuel et graphique. Nous évoquerons les mécanismes particuliers à Scade qui ont été introduit ainsi qu'un travail en cours sur l'utilisation de Scade pour la simulation de systèmes physiques.
Finding a near-optimal implementation of a program on a GPU is a daunting task. The programmer or the compiler must not only explore regular code optimizations but also handle the placement of the data in the memory hierarchy and choose a parallelism scheme. The solution we propose explores a set of possible implementation to find the optimal one for a given input and GPU model. The problem with this approach is that the search space has an exponential size. We thus propose a performance model to trim the search space and drive the search.
The seminar is based on the work conducted during an internship at Google.
PhD thesis defense.
This thesis addresses the design and implementation of a programming language for real-time stream processing applications, such as video decoding. The model of Kahn process networks is a natural fit for this area and has been used extensively. In this model, a program consists in a set of parallel processes communicating through single reader, single writer queues. The strength of the model lies in its determinism.
Synchronous functional languages like Lustre are dedicated to critical embedded systems. A Lustre program defines a Kahn process network which is synchronous, that is, which can be executed with finite queues and without deadlocks. This is enforced by a dedicated type system, the clock calculus, which establishes a global time scale throughout a program. The global time scale is used to define clocks: per-queue boolean sequences indicating, for each time step, whether a process produces or consumes a token in the queue. This information is used both for enforcing synchrony and for generating finite-state software or hardware.
We propose and study integer clocks, a generalization of boolean clocks featuring arbitrarily big natural numbers. Integer clocks describe the production or consumption of several values from the same queue in the course of a time step. We then rely on integer clocks to define the local time scale construction, which may hide time steps performed by a sub-program from the surrounding context. These principles are integrated into a clock calculus for a higher-order functional language. We study its properties, proving among other things that well-typed programs do not deadlock. We compile typed programs to digital synchronous circuits by adapting the clock-directed code generation scheme of Lustre. The typing information controls certain trade-offs between time and space in the generated circuits.
The members of the thesis committee are:
Loosely Time-Triggered Architectures (LTTAs) are a proposal for constructing distributed embedded control systems. They build on the quasi-periodic architecture, where computing units execute ‘almost periodically’, by adding a thin layer of middleware that facilitates the implementation of synchronous applications.
We show how the deployment of a synchronous application on a quasi-periodic architecture can be modeled using a synchronous formalism. Then we detail two protocols, Back-Pressure LTTA, reminiscent of elastic circuits, and Time-Based LTTA, based on waiting. Compared to previous work, we present controller models that can be compiled for execution and a new version of the Time-Based protocol that neither requires broadcast communication nor imposes global synchronization. We also compare the LTTA approach with architectures based on clock synchronization.
Based on an EMSOFT 2015 presentation.
PhD thesis defense.
We are interested in enabling compilers to automatically optimize and parallelize code. The polyhedral framework is showing promising results in this area. In this thesis, we address three limitations in the polyhedral framework and we present a practical solution to each one of these limitations. First, we address the problem of tiling code that has false dependences. Compilers cannot tile such code without privatization or expansion. To avoid this problem, we propose a new relaxed permutability criterion that allows a compiler to ignore false dependences between iteration-private live ranges and thus enables the compiler to tile the loops without privatization or expansion. Second, we address the problem of compilation time. To reduce the long compilation time of large programs, we introduce a technique that we call statement clustering and in which the original representation of the program is transformed into a new program representation (where each group of statements is represented as a macro-statement). The new representation of the program can be optimized faster than the original representation. The third limitation is related to the ability of compilers to automatically parallelize irregular code and to generate efficient code when information is missing to the compiler. To address this limitation, we introduce PENCIL, an intermediate language for DSL compilers and for accelerator programming. PENCIL is a subset of the C99 language carefully designed to capture static properties essential to enable more precise data dependence analysis. It provides also a set of language constructs that enable parallelizing compilers to generate more efficient code. The use of PENCIL would enable the compiler to generate optimized code that is quite difficult to generate otherwise.
The members of the thesis committee are:
The Sequentially Constructive Language (SCL) is a minimal synchronous language that captures the essence of the Sequentially Constructive model of computation (SC MoC), a recently proposed extension of the classical synchronous model of computation. The SC MoC uses sequential scheduling information to increase the class of constructive (legal) synchronous programs. This should facilitate the adoption of synchronous programming by users familiar with sequential programming in C or Java, thus simplifying the design of concurrent reactive/embedded systems with deterministic behavior. In this presentation, I focus on how to compile SCL down to data-flow equations, which ultimately can be synthesized to hardware or executed in software.
The task of modelling and simulating a complex physical system is often a challenge to engineers and researchers since the applications of interest routinely involve several time-scales, multiple sub-systems strongly or loosely coupled, and components whose behaviours are very distinct. The simulation of such systems is done in most software by a single numeric solver, which is often an inelegant solution, compared to harvesting the results of a herd of specialised solvers, certainly more efficient in their respective tasks than a general purpose solver. This talk will explore a few leads to help shepherd several solvers on the same simulation. Emphasis will be placed on the role of 0th-order value holders in the computation, and on the compilation of a distributed simulation.
With the growing core count on chip multiprocessors, contention has become an important factor of performance degradation. Yet, little efforts have been devoted to understanding how it affects performance of data structures. Some were devoted to introducing different definitions of contention whereas others were devoted to the implementation of efficient data structures with terrible complexity. In the first part of this talk, I motivate the mismatch between some complexity metrics and performance measurements by presenting a technique to reduce contention in logarithmic data structures. In the second part of this talk, I show that definitions of contention can be considered equivalent in some context and illustrate this with an efficient concurrent linked list.
In this talk, we claim that structured parallel programming greatly simplifies the task of writing correct and efficient parallel programs, while also capturing the flexibility of expressing asynchronous computations. We support this claim by introducing a family of structured parallel programming primitives developed in the Habanero Extreme Scale Software Research project at Rice University, and discussing semantic guarantees (e.g., deadlock freedom, data race freedom, determinism, serial elision) for different subsets of parallel programs that can be constructed with these primitives. We will then summarize results for compiler optimizations, runtime systems, repair and datarace detection for these subsets. If time permits, we will briefly discuss future directions for data-driven repair and synthesis of parallel programs in the new Pliny project.
The cost efficient integration of different applications with different levels of safety and security on a single computing platform has become a major industrial challenge. Such a platform would preferably be an off-the-shelf shared memory multicore processor. The performance requirements of individual applications and components also push for their parallelization. Focusing on the safety-critical domain, we work on extending to multicore processors with the established modular design and implementation methods used for synchronous applications. In this talk, I will describe our approach to the modular composition of synchronous components with hard real-time constraints. Preserving time-predictability is the most difficult barrier to the adoption of multicores in safety-critical systems. Our solution is compatible with off-the-shelf embedded shared-memory multiprocessors, introducing minor modifications to the syntax and semantics of Heptagon, a synchronous programming dialect of Lustre. It is demonstrated on an industrial use case from Alstom Transport, involving a time-triggered software stack designed on top of Linux and bare-metal components running on a Xilinx Zynq multicore platform. Our solution involves an original methodology for the design of "mixed-time-critical" applications. The safety levels and correctness by construction of synchronous designs is preserved. Performance and time predictability can be traded off against each other. This involves the partitioning of an application into modes of different time criticality, and is supported by time-critical backup scenarios when best-effort components miss their deadline. This work takes place in the context of the EMC2 project.
Programming accelerators such as GPUs with low-level APIs and languages like OpenCL and CUDA is difficult, error prone, and not performance-portable. Automatic parallelization and domain specific languages (DSLs) have been proposed to hide this complexity and to regain some performance portability. In this presentation, I will present PENCIL (Platform-Neutral Compute Intermediate Language) and present some details about how it is compiled. PENCIL is a rigorously defined subset of GNU C99 with specific programming rules and few extensions. Adherence to this subset and the use of these extensions enable compilers to exploit parallelism and to better optimize code when targeting accelerators. We intend PENCIL both as a portable language to facilitate accelerator programming, and as an intermediate language for DSL compilers. We validate the potential of PENCIL on a state-of-the-art polyhedral compiler, extending the applicability of the compiler to dynamic, data-dependent control flow and non-affine array accesses. We use the polyhedral compiler to generate highly optimized OpenCL code for a set of standard benchmark suites (Rodinia and SHOC), image processing kernels, and for two DSL compilers: a linear algebra (BLAS) DSL compiler and a DSL for signal processing radar applications (SpearDE). To assess performance portability, we present experimental results on many GPU platforms: AMD Radeon HD 5670, Nvidia GTX470, and ARM Mali-T604 GPU.
Based on a PACT 2015 presentation.
Greybus is the name for a new application layer protocol on top of Unipro that controls the Ara Phone from Google. This protocol turns a phone into a modular device, allowing any part of the system to be hotplugged while the phone is running. This talk will describe what this protocol is, why it was designed, and give the basics for how it works. It will discuss how this is implemented in the Linux kernel, and how it easily bridges existing hardware like USB, I2C, GPIO and others with little to no changes needed to existing kernel drivers.
This talk presents libkpn, a high-performance runtime system for task parallel programming, based on Kahn process networks (KPNs). Away from the usual perception of KPNs as a high-level programming model compiled into general-purpose languages, we explore how KPNs can be used directly as concurrency specifications to guide dynamic scheduling in a runtime system. Through a comparison with existing models, we explain how features of KPNs can be taken advantage of to allow more logical parallelism to be expressed, and to create an efficient implementation.
This talk addresses the well-known challenge involved in simultaneously delivering high productivity and high parallel performance on modern multicore architectures—through the development of domain-specific languages (DSLs) and their optimizing code generators. It presents the domain of Image Processing Pipelines as the motivating case by presenting PolyMage, our DSL and its code generator for automatic and effective optimization of image processing pipelines. PolyMage takes an image processing pipeline expressed in a high-level language (embedded in Python) and generates an optimized C/C++ implementation of the pipeline. We show how certain techniques including those based on the polyhedral compiler framework need to be specialized in order to provide significant improvements in parallel performance over existing approaches including manual, library-based, and that of another state-of-the-art DSL (Halide). More information at http://mcl.csa.iisc.ernet.in/polymage.html. Experimental results on a modern multicore system and a short demo will also be presented.
The synchronous model of programming, which emerged in the 1980s and has led to the development of well-known languages such as Statecharts, Esterel, Signal, and Lustre, has made the programming of concurrent systems with deterministic and bounded reaction a routine exercise. The synchronous approach draws its success from the fact that it is founded on a simple and well-understood mathematical model of computation, the Mealy automaton. However, the validity of this model is not for free. It depends on the Synchrony Hypothesis according to which a system is invariably faster than its environment. Yet, this raises a tangled compositionality problem. Since a node is in the environment of every other node, it follows that each node must be faster than every other and hence faster than itself!
This talk presents some results towards solving this paradox, refining the classical Mealy model by constructive notions of input-output causality. Specifically, we propose a game-theoretic interpretation of the constructive semantics of step responses for synchronous languages, providing a coherent semantic framework encompassing both non-deterministic Statecharts (as per Pnueli & Shalev) and deterministic Esterel. In particular, it is shown that Esterel arises from a finiteness condition on strategies whereas Statecharts permits infinite games. Beyond giving a novel and unifying account of these concrete languages the talk sketches a general theory for obtaining different notions of constructive responses in terms of winning conditions for finite and infinite games and their characterisation as maximal post-fixed points of functions in directed complete lattices of intensional truth-values.
The presentation has two parts:
          PART 1: AUTOMATED BUG DIAGNOSIS
          PART 2: VIRTUAL INSTRUCTION SET COMPUTING
I will present our work on two ongoing projects: (1) automated diagnosis of software failures, and (2) the benefits of using rich virtual instruction sets as a persistent representation of all software. If time permits, I will discuss a mythical connection between the two projects.
In the first part of the talk, I will describe new insights into analyzing the root causes of software failures automatically. We use dynamically observed likely invariants to identify candidate locations for the root cause of a failure, and then use novel static and dynamic analyses to filter out false positives and to extract valuable information relevant to understanding the failure. Experimental results show that we are able to narrow down the locations of root causes to just a few (e.g., 2-10) program locations, even in large programs, for nearly all bugs we have tested. We are now exploring how to automate the process fully using dynamic symbolic execution, and how to apply these fault isolation techniques for diagnosis and triaging of software failures in production software.
Next, I will discuss a project on Virtual Instruction Set Computing (VISC). Despite extensive advances in managed run-time systems and dynamic compilation, high performance applications and security sensitive systems software are almost universally compiled and shipped as native machine code. In the VISC project, we are exploring the security, reliability, and performance implications of shipping all such software in virtual instruction set form. We use the LLVM bitcode language as the shipping representation to enable advanced "lifelong" program analysis and transformation for such programs. We have shown that shipping operating systems in LLVM form enables powerful security guarantees to be enforced using compiler techniques. We are now exploring how the approach could be used to improve programmability and portability for heterogeneous parallel systems, and to enable novel performance optimization techniques.
De nombreux outils sont aujourd'hui disponibles pour l'analyse et la production temps réel de flux média temporisés : sons, vidéo, animations, etc. Néanmoins, la coordination de ces outils, la synchronisation des flux qu'ils analysent et produisent, sur des échelles de temps de valeurs et même de nature différentes, reste une affaire délicate.
Le modèle des Tiled Polymorphic Temporal Media (ou TPTM), qui combine en un même formalisme le contenu media de ces flux et leurs marqueurs de synchronisations, vise à remédier à cela. Dans le modèle, le produit de deux flux ainsi enrichi, paramétré par ces marqueurs de synchronisations, est tout à la fois séquentiel et parallèle : c’est un produit tuilé.
En théorie, la sémantique de ces flux tuilés peut être décrite dans la théorie des monoïdes inversifs. En pratique, nous avons réalisé, en Haskell, la première implémentation réellement polymorphe et inversive de ces Tiled Polymorphic Temporal Media. Notre implémentation permet en outre, via le mécanisme d’évaluation paresseuse d’Haskell, de distinguer simplement la syntaxe de ces flux – un système d’équations tuilées – de leur sémantique opérationnelle – la résolution de ce système à la volée.
Nous montrerons aussi comment nos constructions se généralisent simplement aux tuiles infinies.
Ce travail a été partiellement réalisé en collaboration avec Paul Hudak (Yale) et Théis Bazin (Stagiaire, ENS Cachan).
The IC3 procedure by A. Bradley is a recent model checking procedure for proving safety properties of finite-state systems that has shown to be quite effective in practice. This talk presents an SMT-based version of IC3 that works with both finite- and infinite-state systems. A distinguishing feature of our approach is the use of approximate quantifier elimination to generalize counterexamples. We provide experimental evidence that for linear integer arithmetic our method is superior in practice to full-blown quantifier elimination, and that our IC3 implementation is competitive with a state-of-the-art k-induction model checker for infinite-state systems. Our experiments also show that IC3 and k-induction have complementary strengths and so can be combined advantageously in a multi-engine parallel checker.
The field of model checking owes much of its great success and impact to the use of symbolic techniques to reason efficiently about functional properties of hardware or software systems. Traditionally, these techniques have relied on propositional encodings of transition systems and on propositional reasoning engines such as binary decision diagrams and SAT solvers. More recently, a number of these techniques have been adapted, and new ones have been devised, based instead on encodings into fragments of first-order logic supported by Satisfiability Modulo Theories (SMT) solvers. These are highly efficient solvers specialized on checking the satisfiability of formulas in certain logical theories such as the theory of linear arithmetic, bit-vectors, arrays, algebraic data types, and so on. SMT-based model checking methods blur the line between traditional (propositional) model checking and traditional (first or higher order) deductive verification. More crucially, they combine the best features of both by offering the scalability and scope of deductive verification while maintaining comparable levels of automations as propositional model checking. This talk will briefly introduce SMT and then give an overview of SMT-based model checking, focusing on a number of notable approaches and techniques.
FoCaLiZe is a programming language allowing to state in a same framework algorithms (programming aspects), properties (logical aspects) that these algorithms must satisfy and proofs these latter hold. It provides high-level constructs to ease development structuring and reuse (modularity, inheritance, late-binding, parameterization) and a proof language allowing to write proofs in a hierarchical way, using a natural deduction style, which rely on the Zenon automated theorem prover to discharge goals ... and the burden of the user. The core programming language is a pure functional language while properties are stated a first-order logic formulae. A FoCaLiZe development is compiled in two target languages: one OCaml source code leading to an executable (or object file) and one Coq source code containing the complete model of the development (i.e. definitions, properties and proofs) that will be ultimately checked by Coq as assessment. A crucial care in this process is to avoid any error from the target languages being returned to the user: a program accepted by the FoCaLiZe compiler must be accepted by both OCaml and Coq. To increase confidence in "what is ran is what is proved" despite two target languages, the compiler has to use a code generation model identical and traceable for both of them.
In this presentation we will introduce the motivations, design and features choices that impacted the language semantics. The presentation of the language features and the development style will also be addressed on the basis of examples, trying to illustrate advantages (and disavantages) of such a system compared to other existing formal proofs systems.
A functional intermediate language is a simple and effective formalization of static single assignment (SSA) for the purpose of verified compilation of an imperative language. Two translations are of particular importance: Translating from the imperative source language (e.g. C) to the functional language, and translating from the functional intermediate language to an imperative target language (e.g. assembly).
This talk discusses the above translations in the context of the intermediate language IL. IL comes with an imperative and a functional interpretation. IL/I (imperative interpretation) is a register transfer language close to assembly. IL/F (functional interpretation) is a first-order functional language with a tail-call restriction. We devise the decidable notion of coherence to identify programs on which the semantics of IL/I and IL/F coincide. Translations between the interpretations are obtained from equivalence-preserving translations to the coherent fragment. The translation from IL/I to IL/F corresponds to SSA construction. The translation from IL/F to IL/I corresponds to SSA deconstruction and involves register assignment.
The translations are implemented and proven correct using the proof assistant Coq. Extraction yields translation-validated implementations of SSA construction and deconstruction.
On modern multicore systems, the interaction between co-run programs largely depends on cache sharing, and cache sharing depends on the locality, i.e. the active data usage, in co-run programs. This talk introduces a newly developed locality theory based on a concept called the footprint. The theory shows the composability of footprint and the conversion between footprint and other locality metrics. It enables accurate characterization and analysis of the dynamics of cache sharing.
Designing and implementing thread-safe multithreaded libraries can be a daunting task as developers of these libraries need to ensure that their implementations are free from concurrency bugs including deadlocks. The usual practice involves employing software testing and/or dynamic analysis to detect deadlocks. Their effectiveness is dependent on well-designed multithreaded test cases. Unsurprisingly, developing multithreaded tests is easily an order of magnitude harder than developing single threaded tests for obvious reasons.
In this talk, we will address the problem of automatically synthesizing multithreaded tests that can induce deadlocks. The key insight to our approach is that a subset of the properties observed when a deadlock manifests in a concurrent execution can also be observed in a single threaded execution. We design a novel, automatic, scalable and directed approach that identifies these properties and synthesizes a multithreaded test that realizes these properties simultaneously revealing a potential deadlock. The input to our approach is only the library implementation under consideration and the output is a set of deadlock revealing multithreaded tests. We will present the results of our analysis on synthesizing multithreaded tests for Java libraries. We find 35 real deadlocks in thread safe classes in COLT, a popular high performance scientific computing library.
Multicore systems yield high performance by the execution of multiple concurrent software processes in the system in a power and area efficient way. While multicore systems are promising architectures for higher performance, the soft error reliability of these systems becomes a major concern due to smaller transistor sizes and low power operating modes.
My talk will include work during my Phd and recent work, related to reliability and performance analysis of multicore systems.
I am currently working on ARTEMIS PaPP project. PaPP (Portable and Predictable Performance on Heterogeneous Embedded Manycores) aims to achieve predictable performance portability of software running on different parallel platforms. We focus on performance analysis modelling for task-based parallel applications within multicore systems. Our modelling targets to find out minimal resources which satisfy performance requirements of an application and to develop task scheduling strategies as an input to run-time for better resource management and performance optimization.
In my PhD study, I was working on reliability analysis of multicore systems. I propose Thread Vulnerability Factor metric, to quantify vulnerability of thread and to qualify the relative vulnerability of parallel applications to soft errors, and utilize the metric for performance-vulnerability analysis of parallel programs, reliability-aware core partitioning schemes for multicore architectures, and critical thread identification for redundant execution in a partial fault tolerance method.
Mezzo is a typed programming language in the ML family whose static discipline controls aliasing and ownership. This rules out certain mistakes, including representation exposure and data races, and opens up new opportunities, such as gradual initialization or (more generally, and somewhat speculatively) the definition and enforcement of object protocols.
In this talk, I will explain the basic design of Mezzo on examples, and show how its type system gives a simple way of tracking ownership of fragments of shared memory between concurrent threads. Beyond the usual claim that "well-typed programs do not go wrong", we guarantee that well-typed Mezzo programs have no data-races.
The presentation will have two parts. In the first part, we will present an on-going research project to provide a run-time system with "guidelines", using task graph properties, and a static analysis phase to prioritize certain tasks of a given application over the others. The order in which tasks of an application task graph are considered has a significant influence on the resulting schedule length. Gauging the importance of the tasks with a priority scheme is therefore a fundamental part of scheduling schemes. We have used the concept of paths and path lengths to gauge the importance of tasks in an offline static analysis phase and provide guidelines to the run-time system for priority assignment.
In the second part of my presentation, we will present earlier research results that focuses on power/energy efficient real-time scheduling. We will discuss at least one such technique to elaborate the concept.
Cohabitation PharOS/EmbeddedC dans une application de contrôle/commande de moteur électrique pour du steer-by-wire avec retour d'effort Démonstration.