Please note: These pages are work in progress!

This document provides basic information on Soter and its web interface.

If you need instructions on how to use Soter, jump to the dedicated section below. For an introduction on how Soter works read on.

• Erlang documents the fragment of Erlang handled by Soter.

• Specs illustrates all the methods Soter offers for specifying properties of Erlang modules.

• ACS explains the abstract model derived by Soter from the input module.

• Papers is an up-to-date list of papers related to Soter. The theory behind Soter can be found in this paper.

# What is Soter?

Soter (Safety verifier fOr The ERlang language) is a fully-automatic program analyser and verifier for Erlang modules. Given an Erlang module and a safety property, Soter first extracts an abstract (approximate, sound) model and then checks if a corresponding property is satisfied using the model checker BFC.

The name Soter derives from the Ancient Greek name of the spirit of safety, preservation and deliverance from harm.

# What is Erlang?

Soter is designed to accept Erlang modules (with some limitations) as input.

Originally designed to program fault-tolerant distributed systems at Ericsson in the late 80s, Erlang is now a widely used, open-sourced language with support for higher-order functions, concurrency, communication, distribution, on-the-fly code reloading, and multiple platforms (see Armstrong, Virding, and Williams 1993; Armstrong 2010). Largely because of a runtime system that offers highly efficient process creation and message-passing communication, Erlang is a natural fit for programming multicore CPUs, networked servers, distributed databases, GUIs, and monitoring, control and testing tools.

The sequential part of Erlang is a higher-order, dynamically typed, call-by-value functional language with pattern-matching algebraic data types. Following the actor model (Hewitt, Bishop, and Steiger 1973), a concurrent Erlang computation consists of a dynamic network of processes that communicate by message passing. Every process has a unique process identifier (pid), and is equipped with an unbounded mailbox. Messages are sent asynchronously in the sense that send is non-blocking. Messages are retrieved from the mailbox, not FIFO, but First-In-First-Firable-Out (FIFFO) via pattern-matching. A process may block while waiting for a message that matches a certain pattern to arrive in its mailbox. For a quick and highly readable introduction to Erlang, see Armstrong's CACM article.

See erlang.org for the complete documentation and to download the Erlang system.

# What is BFC?

BFC is a fast coverability solver for Petri nets with transfer arcs, developed by Alexander Kaiser, University of Oxford.

In phase 3, Soter generates a Petri net in BFC's format which represents the abstract model of the system (see ACS). For each property Soter needs to prove, BFC is internally called, with the abstract model and a query representing the property as input.

# Verifying Erlang Programs Automatically

Concurrent programs are hard to write. They are just as hard to verify. In the case of Erlang programs, the inherent complexity of the verification task can be seen from several diverse sources of infinity in the state space.

($$\infty$$ 1) Function definitions are not necessarily tail-recursive, so a call-stack is needed.

($$\infty$$ 2) Higher-order functions are first-class values; closures can be passed as parameters or returned.

($$\infty$$ 3) Data domains, and hence the message space, are unbounded: functions may return, and variables may be bound to, terms of an arbitrary size.

($$\infty$$ 4) An unbounded number of processes can be spawned dynamically.

($$\infty$$ 5) Mailboxes have unbounded capacity.

The challenge of verifying Erlang programs is that one must reason about the asynchronous communication of an unbounded set of messages, across an unbounded set of Turing-powerful processes. Our goal is to verify safety properties of Erlang-like programs automatically, using a combination of static analysis and infinite-state model checking.

## Extraction of the Abstract Model

The starting point of our verification pathway is the abstraction of the sources of infinity ($$\infty$$ 1), ($$\infty$$ 2) and ($$\infty$$ 3). In the style of Van Horn and Might (Van Horn and Might 2010), we devise a machine-based operational semantics of Erlang that uses store-allocated continuations. The advantage of such an indirection is that it enables the construction of a machine semantics which is "generated" from the basic domains of Time, Mailbox and Data. In the paper, we show that there is a simple notion of sound abstraction of the basic domains whereby every such abstraction gives rise to a sound abstract semantics of Erlang programs. Further if a given sound abstraction of the basic domains is finite and the associated auxiliary operations are computable, then the derived abstract semantics is finite and computable.

We study the abstract semantics derived from a particular 0-CFA-like abstraction of the basic domains. However we do not use it to verify properties of Erlang programs directly, as it is too coarse an abstraction to be useful. Rather, we show that a sound abstract model, in the form of Actor Communicating System (ACS), can be constructed in polynomial time by bootstrapping from the 0-CFA-like abstract semantics. Further, the dimension of the resulting ACS is polynomial in the length of the input Erlang program. The idea is that the the 0-CFA-like abstract (transition) semantics constitutes a sound but rough analysis of the control-flow of the program, which takes higher-order computation into account but communicating behaviour only minimally. The bootstrap construction consists in constraining these rough transitions with guards of the form "receive a message of this type" or "send a message of this type" or "spawn a process", thus resulting in a more accurate abstract model of the input Erlang program in the form of an ACS.

## Actor Communicating Systems

Soter generates an abstract model that simulates the input Erlang program.

The abstract model is an Actor Communicating System (ACS) which models the interaction of an unbounded set of communicating actor processes. An ACS has a finite set of control states, a finite set of pid classes, a finite set of messages, and a finite set of transition rules. An ACS transition rule has the shape

$\pid : q \xrightarrow{\ell} q'$

which means that a process of pid class $$\pid$$ can transition from state $$q$$ to state $$q'$$ with (possible) communication side effect $$\ell$$, of which there are four kinds, namely,

1. the process makes an internal transition

2. it extracts and reads a message $$msg$$ from its mailbox

3. it sends a message $$msg$$ to a process of pid class $$\pid'$$

4. it spawns a process of pid class $$\pid'$$.

The various kinds of control states and transitions are represented pictorially as follows.

Explanation of elements of an ACS diagram

ACS models are infinite state: the mailbox of a process has unbounded capacity, and the number of processes in an ACS may grow arbitrarily large. However the set of pid classes is fixed, and processes of the same pid classes are not distinguishable.

There is a natural interpretation of ACS as vector addition systems (VAS), or equivalently Petri nets, using counter abstraction, as explained in the paper.

Soter produces both a graphical representation for the generated ACS, and an equivalent BFC model.

# Using Soter

## Workflow

Soter normally goes through three phases:

1. Parsing of input Erlang module;
2. Extraction of the abstract model;
3. Verification of the properties using BFC.

In phase 3, Soter will attempt to verify each of the specified properties in turn. The outcome of such attempts can be one of the following.

• Safe. The program satisfies all the properties Soter was required to prove.
• Unsafe. The program could be safe but Soter is unable to prove it.
• Error. The call to BFC failed; this could be because of a bug, or an unexpected output from BFC (must be version 1.0) or BFC aborted, maybe for lack of resources or by manual intervention.
• Timeout. When using the web interface, the timeout is set to 1 min for phase 2 and 1 min for each property in phase 3.

## Modes of Execution

Soter can be used in one of three ways:

1. "Analysis Only", to generate the abstract model (see ACS) which can then be viewed, or manually fed into BFC for verification.

2. "Verify Absence-of-Errors", to generate the abstract model and attempt to prove the absence of certain runtime errors; for details on which kinds of errors Soter can detect see [error-free].

3. "Verify Assertions", to generate the abstract model and attempt to prove each user-defined property specified via the methods outlined in the section Assertions.

The "Verify All" mode enables both 2 and 3.

### Assertions

We present here a short introduction to property specifications in Soter. For a detailed explanation refer to Specs.

A flexible and simple method for specifying properties of Soter consists in "labelling" and "constraining":

1. First, you determine which program locations or mailboxes your property involves. For example, mutual exclusion could involve a program location in the critical section of a function definition. Then you proceed by annotating your code with the ?label("LABEL") (for program locations) and ?label_mail("LABEL") (for mailboxes) functions.

2. Second, you add -uncoverable(PROP) directives before your definitions in the module; PROP has to be a string and it contains a sequence of constraints, separated by commas. Each constraint is in the form "LABEL" >= N where "LABEL" is one of the labels defined in step 1, and N is a positive natural number. Soter will then try to prove that no state that satisfies the constraints is reached.

When specifying properties, keep in mind that the constraints describe what are the bad states', not the good ones.

Note: Soter will ignore these annotations when in "Analysis Only" and "Verify Absence-of-Runtime-Errors" modes.

### Detection of Runtime Errors

When in "Verify Absence-of-Errors" mode, Soter will try to prove that no process will ever be in a state of error, i.e. an uncaught runtime exception. There is a limited set of such errors which are currently modelled by Soter:

• sending a message to a non-pid value (e.g. "I am a string" ! {msg, hi}). Note that the register primitives are not supported so sending to an atom is always seen as "illegal" by Soter;
• applying a function to the wrong number of arguments; higher-order is supported (even in messages) so this is not just a syntactic check;
• applying a value that is not a function;
• spawning a value that is not a function with zero arguments;
• explicit calls to erlang:error.

One major omission is pattern-matching errors. Soter is currently ignoring them. The next release of Soter will feature a more precise abstract pattern-matching mechanism that will allow for pattern-matching errors to be detected avoiding too many false-alarms.

Please note that the exception handling features of Erlang, such as catch, are not currently supported by Soter. We aim to add support for them in the next release.

## Refinement Of Abstractions

Soter does have limited support for data abstraction. A term is abstracted by truncation (i.e. ignoring all its subterms) at level L. There are two independent abstractions: one for values and one for messages, controlled respectively by the two truncation levels D and M.

The default parameter for value abstraction is D=0. If the property at hand is dependent on the values of a certain parameter, increasing D improves the chances of proving property but usually requires a lot more time.

If not otherwise specified, M is set to the maximum depth of the receive patterns plus D. You may need to manually adjust this parameter if your property needs to distinguish more kinds of messages. Increasing M is not expected to affect the performance too much.

Currently no other abstraction parameter (such as context-sensitivity k) can be adjusted.

## Non Determinism

Typically one would like to verify functions. However Soter requires the module to have a main function which has no arguments, acting as entry-point. This limitation can be overcome by using the non-deterministic choice operator soter:choice. Its runtime behaviour, provided by the soter.erl module, is a random choice between its two arguments. Soter interprets it as a non-deterministic branch in the semantics.

We discourage the direct use of soter:choice and favour instead the use of the macros defined in grammars.hrl, described in detail in the Erlang input chapter.

As a useful example, you can use the macro ?any_nat() to describe a value that ranges over all the natural numbers (to be used with the peano` switch, see Erlang input).

## Offline usage

Soter is a multi-platform command-line tool written in Haskell. We currently do not provide a direct download of the tool but the binaries can be obtained from the authors.

The manual page of the command-line tool gives a detailed description of all its options.

# Contacts

For further details, questions or bug reports contact

# References

Armstrong, J. 2010. “Erlang.” CACM 53 (9): 68. http://cacm.acm.org/magazines/2010/9/98014-erlang/fulltext.

Armstrong, J., R. Virding, and M. Williams. 1993. Concurrent programming in Erlang. Prentice Hall.

Hewitt, C., P. Bishop, and R. Steiger. 1973. “A Universal Modular ACTOR Formalism for Artificial Intelligence.” In IJCAI, 235–245.

Van Horn, D., and M. Might. 2010. “Abstracting abstract machines.” In ICFP, 51–62.