OCaml primer

Writing a workflow with bistro requires learning a tiny subset of the OCaml language. This page quickly presents this subset, which should be sufficient to write basic pipelines. For the interested reader, I recommend the following easy introduction to the language and functional programming in general.

OCaml is a functional language, which means, in brief, that variables cannot be modified once they are defined. An immediate consequence is that for and while loops are of little use and are replaced by (possibly recursive) function calls. An OCaml program is a sequence of expressions (like 1 + 1) or definitions introduced by the keyword let. For instance, the program

let a = 1 + 1

defines a variable named a, which has value 2. This name can be reused in subsequent definitions, like in:

let a = 1 + 1
let b = 2 * a

A name cannot be used before it has been defined. If a name is defined twice, the two definitions coexist, but only the last one is visible from subsequent definitions. Hence, in the following program:

let a = 1 + 1
let b = 2 * a
let a = 1
let c = a

the variable c has value 1.
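
To make the coexistence of the two definitions concrete, here is a minimal sketch of the same program with comments and a final check. The assert line is only there for illustration (it fails at run time if the condition is false). Note that b keeps the value computed from the first a, since redefining a name does not modify anything that was built from the previous definition:

```ocaml
let a = 1 + 1
let b = 2 * a   (* b is computed from the first a, so b is 4 *)
let a = 1       (* a new definition that hides the first one *)
let c = a       (* c only sees the latest a, so c is 1 *)

(* b is unchanged: the first a was hidden, not modified *)
let () = assert (b = 4 && c = 1)
```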

Getting started with the OCaml interpreter

While OCaml programs can be compiled into executables, it is also very convenient to enter programs interactively using an interpreter (similar to what exists for Python or R). The OCaml language has a very nice interpreter called utop that can easily be installed using opam. In a shell just type:

opam install utop

and then you can call utop on the command line. An interpreter like utop reads expressions or definitions, evaluates them and prints the result. Expressions or definitions sent to utop must end with ;; (in most cases this terminator can be omitted in OCaml programs, but it doesn’t hurt to keep it while you are getting started). For instance, let’s enter a simple definition, let a = 1;;. utop answers as follows:

     OCaml version 4.07.1

# let a = 1;;
val a : int = 1

The interpreter answers that we just defined a variable named a, of type int (the basic type for integers) and equal to 1. Let’s enter other definitions to meet new basic data types, like strings:

# let s = "bistro";;
val s : string = "bistro"

booleans:

# let b = true;;
val b : bool = true

or floating-point numbers:

# let x = 3.14159;;
val x : float = 3.14159
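
As a small sketch of how these basic types are used in expressions, the definitions below combine values of each type. All names here are made up for the illustration. The operators and the string_of_int function belong to the OCaml standard library; note that integers and floats use distinct arithmetic operators:

```ocaml
let n = 1 + 2                        (* int arithmetic uses +, -, *, / *)
let y = 1.5 +. 2.0                   (* float arithmetic uses +., -., *., /. *)
let name = "bis" ^ "tro"             (* ^ concatenates two strings *)
let ok = n = 3 && name = "bistro"    (* comparisons and && produce a bool *)
let label = "n = " ^ string_of_int n (* convert an int to a string *)
```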

To quit the interpreter, just press Ctrl+D.

Functions

In OCaml, functions can be defined with the fun keyword. For instance, the expression fun x -> x + 1 denotes the function that, given some integer, returns the next integer. We can of course give the function a name, as for any other value:

# let f = fun x -> x + 1;;
val f : int -> int = <fun>

Note that the interpreter “guessed” the type of f, as a function that takes an integer and returns an integer. This function can then be called using the following syntax:

# f 41;;
- : int = 42

In OCaml, the arguments of a function are just separated by spaces. In general we use a simpler (but equivalent) notation to define functions:

# let f x = x + 1;;
val f : int -> int = <fun>
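
The example above only has one argument, so here is a minimal sketch with two, followed by a recursive function illustrating the earlier remark that loops are replaced by function calls. The names add and sum_to are hypothetical and chosen for this illustration only; rec and if ... then ... else are standard OCaml keywords:

```ocaml
(* Two arguments, separated by spaces in the definition and in the call *)
let add x y = x + y
let three = add 1 2

(* Summing the integers from 1 to n without a loop: the rec keyword
   allows the function to call itself *)
let rec sum_to n =
  if n <= 0 then 0 else n + sum_to (n - 1)

let total = sum_to 10
```

Here three is 3 and total is 55, the sum of the integers from 1 to 10.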

Arguments can be named, in which case they are preceded by a ~ both in the function definition and in function calls:

# let f ~x = x + 1;;
val f : x:int -> int = <fun>
# f ~x:0;;
- : int = 1

Named arguments are very handy in that they can be given in any order; they are also a very effective way to document your code. A variant of named arguments is optional arguments, which can be omitted when calling the function.
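
Both points can be sketched as follows. The functions sub and scale are hypothetical examples, not part of bistro. An optional argument is introduced with ? and, in this sketch, is given a default value that is used when the caller leaves it out:

```ocaml
(* Two named arguments *)
let sub ~x ~y = x - y

(* Named arguments can be passed in any order: both calls give 2 *)
let r1 = sub ~x:3 ~y:1
let r2 = sub ~y:1 ~x:3

(* An optional argument with a default value. The last, unnamed argument
   tells OCaml that the call is complete even when factor is left out. *)
let scale ?(factor = 2) x = factor * x

let s1 = scale 3             (* factor defaults to 2, so s1 is 6 *)
let s2 = scale ~factor:10 3  (* factor is given explicitly, so s2 is 30 *)
```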

Finally, the bistro API uses so-called polymorphic variants, which are a particular kind of value in OCaml. They are easy to spot because they are written with a leading backquote, as in:

# `mm10;;
- : [> `mm10 ] = `mm10
# `GB 3;;
- : [> `GB of int ] = `GB 3

The preceding snippet shows two basic usages of polymorphic variants. In the first one, they are used as a substitute for constant strings, the important difference being that the OCaml compiler can spot a typo at compile time when the value is passed to a function that expects specific labels. The second usage is to wrap other values under a label that recalls the meaning of the value. Here we define a memory requirement (3 GB), but instead of just representing it with an integer, we wrap it with the polymorphic variant to recall that this requirement is expressed in GB and not MB, for instance.
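
To illustrate both usages, here is a minimal sketch of functions that consume polymorphic variants. The functions to_mb and genome_name, as well as the labels `MB and `hg38, are assumptions made for this example and are not taken from the bistro API. The function keyword defines a function by listing the cases its argument may take:

```ocaml
(* Wrapping values under a label: convert a memory requirement
   to megabytes, whatever unit it was given in *)
let to_mb = function
  | `GB n -> n * 1024
  | `MB n -> n

let m1 = to_mb (`GB 3)    (* 3072 *)
let m2 = to_mb (`MB 512)  (* 512 *)

(* Labels as a substitute for constant strings: this function
   only accepts the two labels listed below *)
let genome_name = function
  | `mm10 -> "mm10"
  | `hg38 -> "hg38"

let g = genome_name `mm10

(* A misspelled label such as genome_name `mm11 is rejected by the
   compiler, because `mm11 is not one of the accepted cases. *)
```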