15 SWIG and Ocaml

This chapter describes SWIG's support of Ocaml. Ocaml is a relatively recent addition to the ML family, and is a recent addition to SWIG. It's the second compiled, typed language to be added. Ocaml has widely acknowledged benefits for engineers, mostly derived from a sophisticated type system, compile-time checking which eliminates several classes of common programming errors, and good native performance. While all of this is wonderful, there are well-written C and C++ libraries that Ocaml users will want to take advantage of as part of their arsenal (such as SSL and gdbm), as well as their own mature C and C++ code. SWIG allows this code to be used in a natural, type-safe way with Ocaml, by providing the necessary but repetitive glue code which creates and uses Ocaml values to communicate with C and C++ code. In addition, SWIG also produces the needed Ocaml source that binds types, variants, functions, classes, etc.

15.1 Preliminaries

SWIG 1.3 works with Ocaml 3.04 and above. Given the choice, you should use the latest stable release. The SWIG Ocaml module has been tested on Linux (PPC, MIPS, Intel, Sparc) and Cygwin on Windows. The best way to determine whether your system will work is to compile the examples and test-suite which come with SWIG. You can do this by running make check from the SWIG root directory after installing SWIG. The Ocaml module has been tested using the system's dynamic linking (the usual -lxxx against libxxx.so), but not using the explicit dynamic linking provided by the Dl package (http://www.ocaml-programming.de/packages/documentation/dl/), although I suspect that it will work without a problem.

15.1.1 Running SWIG

The basics of getting a SWIG Ocaml module up and running can be seen from one of SWIG's example Makefiles, but are also described here. To build an Ocaml module, run SWIG using the -ocaml option. To enable proxy classes, also specify -objects. To disable the non-object interface, so that methods only show up in the .mli as class methods, specify -onlyobjects.

% swig -ocaml -objects example.i

This will produce 3 files. The file example_wrap.c contains all of the C code needed to build an Ocaml module. To build the module, you will compile the file example_wrap.c with ocamlc or ocamlopt to create the needed .o file. You will need to compile the resulting .ml and .mli files as well, and do the final link with -custom (not needed for native link).

15.1.2 Additional Command Line Options

The following table lists the additional command line options available for the Ocaml module. They can also be seen by using:

% swig -ocaml -help

Ocaml specific options

-mlout <ocaml-file.ml>   Sets the name of the Ocaml interface files to be generated. The .ml extension must be present.
-objects                 When on, produce Ocaml class definitions for C/C++ classes, structs, and unions.
-onlyobjects             When on, produce only class methods for functions that appear as class methods or member accessors.
-classmod                Wrap classes in Ocaml modules. This is a way to disambiguate scoped enums, classes, etc. by planting them inside modules. It also produces code that closely follows the layout of most Ocaml libraries as released.
-uncurried               Wrap functions uncurried (with tuples). This was the way the module was originally written, but it's not as efficient in most cases. A case where it might be more efficient is a list of tuples that match the call signature of the target function, when this list is used with that function in List.map or List.iter.
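
The List.map case mentioned for -uncurried can be sketched in plain Ocaml. The functions scale_curried and scale_uncurried below are hypothetical stand-ins for a wrapped two-argument C function; they are not SWIG output, and the sketch only shows the difference in call shape, not any performance figures.

```ocaml
(* Hypothetical stand-ins for a wrapped C function int scale(int x, int factor). *)
let scale_curried (x : int) (factor : int) : int = x * factor
let scale_uncurried ((x, factor) : int * int) : int = x * factor

(* A list of tuples that already matches the call signature. *)
let calls = [ (1, 10); (2, 20); (3, 30) ]

(* Uncurried: each tuple is passed to the function as-is. *)
let via_uncurried = List.map scale_uncurried calls

(* Curried: an adapter closure must unpack every tuple first. *)
let via_curried =
  List.map (fun (x, factor) -> scale_curried x factor) calls
```

Both lists hold the same results; the uncurried form simply avoids the adapter closure.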

15.1.3 Getting the right header files

You may need to include the libswigocaml.h file that comes with the distribution. It provides several useful functions that almost all programs that use SWIG will need. It is located in $(prefix)/include/libswigocaml.h, where $(prefix) is usually /usr/local but could be /usr. This is set at configure time.

15.1.4 Compiling the code

Use ocamlc or ocamlopt to compile your SWIG interface like:

% ocamlc -c -ccopt "-I/usr/include/foo -I/usr/local/include" example_wrap.c
% ocamlc -c example.mli
% ocamlc -c example.ml

ocamlc is aware of .c files and knows how to handle them. Unfortunately, it does not know about .cxx, .cc, or .cpp files, so when SWIG is invoked in C++ mode, you must copy the wrapper to a name ending in .c and tell the C compiler to treat it as C++:

% cp example_wrap.cxx example_wrap.cxx.c
% ocamlc -c ... -ccopt -xc++ example_wrap.cxx.c
% ...

15.1.5 Current thoughts on best practice for Ocaml

Because the VC compiler (cl) needs link options specified after all compiler options, and ocamlc doesn't really understand that, I think that the following is the best way to link ocaml code with C++ code. I formulated this method to make it easy for co-workers who rely on MSDev to create GUIs, etc., to live in harmony with the ocaml parts of the application.

Let's say you have ocaml sources foo.ml and bar.ml and interface frob.i:

swig -ocaml -c++ -objects frob.i
ocamlc -custom -c frob.mli
ocamlc -custom -c frob.ml
cp frob_wrap.cxx frob_wrap.c
ocamlc -custom -c -I$(FROBLIB)/include frob_wrap.c
ocamlc -custom -c foo.ml
ocamlc -custom -c bar.ml
ocamlc -pack -o foobar.cmo foo.cmo bar.cmo frob.cmo
ocamlc -custom -output-obj -o foobar.obj foobar.cmo

At this point, foobar.obj can be included in your MSVC project and linked against other code. This is how you link it:

link /OUT:big_program.exe \
  other1.obj other2.obj foobar.obj frob_wrap.obj \
  $(OCAMLLIB)/ocamlrun.lib $(FROBLIB)/lib/frob.lib

15.1.6 Using your module

You can test-drive your module by building a toplevel ocaml interpreter. Consult the ocaml manual for details.

When linking any ocaml bytecode with your module, use the -custom option to build your functions into the primitive list.

15.1.7 Compilation problems and compiling with C++

As mentioned above, .cxx files need special handling to be compiled with ocamlc. Other than that, C code that uses class as a non-keyword, and C code that is too liberal with pointer types, may not compile under the C++ compiler. Most code meant to be compiled as C++ will not have problems.

15.2 The low-level Ocaml/C interface

The SWIG Ocaml module is based upon the page in the Ocaml manual titled "Interfacing C with Objective Caml". You should familiarize yourself with this information if you need to write any special typemaps.

15.2.1 The generated module

The SWIG %module directive specifies the name of the Ocaml module to be generated. If you specified `%module example', then your Ocaml code will be accessible in the module Example. The module name is always capitalized as is the ocaml convention. Note that you must not use any Ocaml keyword to name your module. Remember that the keywords are not the same as the C++ ones.

15.2.2 Deleters

Ocaml is a garbage collected language.  You can choose to ignore this, and manage the C++ heap yourself, or, you can have Ocaml manage certain objects.  Since C++ code often requires objects to be owned by different parties at different times, the SWIG Ocaml module gives the programmer a choice at all times.  Each module is built with a function set_delete_fn : 'a -> string -> unit which will use caml_named_value (values registered with Callback.register) to get a function to use to delete the contents of the cell.  By default, all destructors are registered by their wrapper name, so delete_foo becomes "_wrap_delete_foo".  This is typical usage:

let x = new_foo ()
let _ = set_delete_fn x "_wrap_delete_foo" (* Foo is garbage collected *)
let y = new_foo ()
let z = new_managing_container ()
let _ = Managing_container.add y
    (* Y is not garbage collected because it wasn't set to be deleted *)
let _ = set_delete_fn z "_wrap_delete_managing_container"
    (* But z is garbage collected.  It will delete y *)
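
The idea of looking up a deleter by its registered name can be modelled in plain Ocaml. This is only a sketch: the cell record, the deleters table and the collect function are hypothetical stand-ins (a Hashtbl plays the role of Callback.register and caml_named_value, and collect stands for what would happen when the collector reclaims a cell). It is not the code SWIG generates.

```ocaml
(* Hypothetical stand-in for a wrapped C++ object. *)
type cell = { name : string; mutable delete_fn : string option }

(* Registry of deleters by name, standing in for Callback.register. *)
let deleters : (string, cell -> unit) Hashtbl.t = Hashtbl.create 8

(* Records which cells have been deleted, so the effect is observable. *)
let deleted : string list ref = ref []

let () =
  Hashtbl.replace deleters "_wrap_delete_foo"
    (fun c -> deleted := c.name :: !deleted)

(* Mark a cell as owned by Ocaml by naming its deleter. *)
let set_delete_fn (c : cell) (fn_name : string) : unit =
  c.delete_fn <- Some fn_name

(* What would happen when the cell becomes unreachable.
   Hashtbl.find raises Not_found if the name was never registered. *)
let collect (c : cell) : unit =
  match c.delete_fn with
  | None -> ()  (* not managed by Ocaml: someone else must delete it *)
  | Some fn_name -> (Hashtbl.find deleters fn_name) c

let () =
  let x = { name = "x"; delete_fn = None } in
  let y = { name = "y"; delete_fn = None } in
  set_delete_fn x "_wrap_delete_foo";
  collect x;  (* deleted through the named deleter *)
  collect y   (* left alone, since no deleter was set *)
```

Only x ends up in the deleted list, which mirrors the choice described above: an object is released by Ocaml only if a deleter was set for it.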

15.2.3 Types

The default typemaps are good generally, but have their weaknesses as all C type conversions must. In general, it isn't possible to predict the use that a C variable will be put to; since it's all just bytes in memory, any C variable can be used to hold any C value at least as small, and sometimes, even this is fudged. Also, pointer to object may mean pointer to array, or pointer to a single thing of that type. Some degenerate libraries even intermix enum and int freely, using enums as int constants, bit flags, or other int values. In addition, char * sometimes means opaque buffer and sometimes string. Given all of these factors, the following default type handling was chosen given the author's experience with C++. YMMV.

C type               Default Ocaml type
bool                 bool
void                 unit
int                  int
short                int
long                 int64
unsigned long        int64
char                 char
char *               string
float                float
double               float
oc_bool              bool (* Can be used as a convenience in C code, typedef'd to int *)
unsigned int         int32
unsigned short       int
unsigned char        char
long long            int64
unsigned long long   int64
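
Since unsigned int is mapped to the signed int32 type, Ocaml code may want to recover the unsigned reading of a value. The sketch below assumes that a C value above 2^31 - 1 arrives with its bit pattern unchanged and therefore looks negative on the Ocaml side; the text above does not state this, so check the behaviour of your own wrappers. The helper name unsigned_of_int32 is hypothetical.

```ocaml
(* Reinterpret an int32 as the unsigned 32-bit value with the same bits.
   Int64.of_int32 sign-extends, so the upper 32 bits are masked off. *)
let unsigned_of_int32 (v : int32) : int64 =
  Int64.logand (Int64.of_int32 v) 0xFFFFFFFFL

(* All 32 bits set: -1 as a signed value, the largest unsigned int. *)
let all_bits = unsigned_of_int32 (-1l)

(* Non-negative values are unchanged. *)
let small = unsigned_of_int32 42l
```

Widening to int64 keeps the helper portable to platforms where the native Ocaml int is narrower than 32 bits.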

When struct, class or union objects or references are used in function calls, or as results, Ocaml code pretends that they are used as pointers. This makes ocaml code easier to deal with both in terms of garbage collection, and in terms of uniformity. Because of this, user code never needs to enreference or dereference elements, although user code may need to cast pointer types, or on occasion, allocate a pointer variable which C/C++ code can store a value in. Functions are provided for this in libswigocaml, the SWIG Ocaml support library. As far as casts, the user will either provide an inline function that performs the cast, use the "%identity" primitive, or use the Obj.magic function in the ocaml library. Note that Obj.magic does no work except to pretend that the type of the argument is the same as the type needed for the expression, therefore, it's possible to crash the program this way (just as with a C cast).

In general, every type name is prefixed with _, any C/C++ pointer type is additionally prefixed with _p (for example, _p_cpp_base), and some more exotic types are encoded with different pseudo-symbols. You should check the .mli output to find the types assigned to various functions.

15.2.4 Functions

C/C++ functions are mapped directly into Ocaml functions. Parameters are passed tupled (enclosed in parentheses, and separated by commas). Names are sometimes changed in order to make them into correct ocaml names. This usually involves adding an underscore in front of the name, but can mean adding a number to the end to break a conflict. You should read the .mli output before writing code based on SWIG output. Every possible effort is made to handle namespace and class names in an intelligent way that preserves the original name within the constraints of the ocaml system (ocaml functions can't be overloaded in the C++ sense, and can't start with an upper case letter, as well as needing to avoid the use of ocaml keywords).

15.2.5 Variable Linking

SWIG provides access to C/C++ global and member variables both as Ocaml functions, and as methods where applicable. In general, a mutable (modifiable) variable will have _get and _set methods like:
(* int foo; *)
val foo_get : unit -> int
val foo_set : int -> unit

and constants will have only a value binding, like:

(* const char *bar = "Yadda"; *)
val bar : string

since such a "variable" can't change and will never be set to anything else.
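
The shape of these bindings can be imitated in plain Ocaml. In this sketch a ref cell named foo_storage (a hypothetical name) stands in for the C global, so the code runs without any wrapper; real SWIG output binds foo_get and foo_set to C instead.

```ocaml
(* Stand-in for the C global `int foo;`. *)
let foo_storage : int ref = ref 0

(* A mutable variable is reached through a getter and a setter. *)
let foo_get () : int = !foo_storage
let foo_set (v : int) : unit = foo_storage := v

(* Stand-in for `const char *bar = "Yadda";`: a plain value binding. *)
let bar : string = "Yadda"

let () =
  foo_set 42;
  Printf.printf "foo = %d, bar = %s\n" (foo_get ()) bar
```

Note that foo_get takes unit, so every call reads the current value, while bar is evaluated once.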

Member variables are accessed in the obvious way through methods of their containing classes.

15.2.6 Callbacks

The ocaml SWIG language module allows you to write callbacks that will be called from your C code. Currently, this feature is experimental. Consider the following code:
%module error
%{
void call_err( void (*errfunc)() ) {
    errfunc();
}
%}

%feature("camlcb") caml_error {caml_error}
extern void caml_error();
void call_err( void (*errfunc)() );
This code will create a callback function called caml_error, and create a function pointer constant that enables you to provide an ocaml function to be called. If no ocaml function is provided using the given name, then an exception will be thrown.

If this code were built into a toplevel, you could write:

        Objective Caml version 3.04

# open Error ;;
# Callback.register "{caml_error}" (fun unit -> print_endline "hi"; flush stdout) ;;
- : unit = ()
# call_err caml_error ;;
hi
As you can see, this enables the C code to call your Ocaml code quite transparently.

15.2.7 A word about message loops

If you use native threads and message loops that can call into ocaml, ocaml code must originate any thread that can make a call back into the interpreter. I'm not sure if there's a way to register a non-ocaml thread with the interpreter as there is in the JNI. It can, however, be mitigated, by queueing or signalling notifications that a call made from an ocaml thread will retrieve.

15.2.8 Enums

SWIG will wrap enumerations as polymorphic variants in the output Ocaml code. Each enum type has an `int variant, a catchall which allows the degenerate C++ libraries mentioned above to work. Some functions which deal with enums as bit sets are available for each enum type. For an enum type foo, these are foo_to_int, int_to_foo, foo_bits, check_foo_bit and bits_foo. Each of these performs some task transforming enum type values to integers, enum lists (representing bit sets), and ints or bit sets to enums. check_foo_bit allows the user to quickly check whether an enum value contains a superset of the bits in some indicated enum value.

As far as naming goes, polymorphic variant labels are an exception because they don't require any additional rules from C++, so they are simply prepended with '`' in the ocaml style.

Example:

%module enum_test
enum c_enum_type { a = 1, b, c = 4, d = 8 };

enum_test.mli:
type _c_enum_type =
[ `int of int
| `a
| `b
| `c
| `d
]

(* The enum declaration itself.  Every enum is a polymorphic variant in order to make life simple.  This allows every enum to share the `int label, which allows that enum to carry an arbitrary int value. *)
external a_get  : unit -> int = "_wrap_a_get" ...
(* This is a function which retrieves the actual value of an enum label. *)
val a : _int ...
(* This is a convenience which holds the value of a_get () since it never changes. *)
val c_enum_type_to_int : _c_enum_type -> int
(* Given any _c_enum_type object, return the corresponding int.  This is useful when you want to encode an enum value as int and when you must pass the enum value as an int parameter. *)
val int_to_c_enum_type : int -> _c_enum_type
(* Given any int, return a corresponding _c_enum_type element.  If the int does not match any single enum label from the target enum type, then `int is returned containing the original value. *)
val c_enum_type_bits : _c_enum_type list -> _c_enum_type
(* Given a list of _c_enum_type elements, construct a _c_enum_type object with the logical or of them stored in it.  This is useful for cases where enum labels are used to denote different bits. *)
val check_c_enum_type_bit : _c_enum_type -> _c_enum_type -> bool
(* Given two enum elements, v and match, return true if every 1 bit in match is set in v.  Use this to conveniently check single bits, or bit expressions. *)
val bits_c_enum_type : _c_enum_type -> _c_enum_type list -> _c_enum_type list
(* Given a value of type _c_enum_type, and a list of _c_enum_type values, return a list containing every element in the input list for which check_c_enum_type_bit is true.  Use this to decompose a bitfield enum for use with a caml match .. with expression. *)
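
The behaviour described in these comments can be modelled in plain Ocaml for the enum { a = 1, b, c = 4, d = 8 }. This is a sketch written from the descriptions above, not the generated code. It assumes that b takes the value 2 (the usual C rule), it uses capitalized tags (`A, `Int) and an unprefixed type name for readability, and the function bodies are my own reading of the comments.

```ocaml
type c_enum_type = [ `Int of int | `A | `B | `C | `D ]

(* Map a label to its C value; `Int carries an arbitrary value. *)
let c_enum_type_to_int : c_enum_type -> int = function
  | `A -> 1 | `B -> 2 | `C -> 4 | `D -> 8
  | `Int n -> n

(* Map an int back to a label, falling back to `Int. *)
let int_to_c_enum_type : int -> c_enum_type = function
  | 1 -> `A | 2 -> `B | 4 -> `C | 8 -> `D
  | n -> `Int n

(* Logical or of a list of labels. *)
let c_enum_type_bits (l : c_enum_type list) : c_enum_type =
  int_to_c_enum_type
    (List.fold_left (fun acc e -> acc lor c_enum_type_to_int e) 0 l)

(* True if every 1 bit of m is also set in v. *)
let check_c_enum_type_bit (v : c_enum_type) (m : c_enum_type) : bool =
  let mi = c_enum_type_to_int m in
  c_enum_type_to_int v land mi = mi

(* Keep the labels from l whose bits are all set in v. *)
let bits_c_enum_type (v : c_enum_type) (l : c_enum_type list)
    : c_enum_type list =
  List.filter (check_c_enum_type_bit v) l

(* a lor c is 5, which matches no single label, so it becomes `Int 5. *)
let flags = c_enum_type_bits [ `A; `C ]

(* Decompose the bit set back into the labels it contains. *)
let parts = bits_c_enum_type flags [ `A; `B; `C; `D ]
```

Decomposing with bits_c_enum_type first makes it possible to match on individual labels even when the value itself is an `Int bit combination.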

15.2.9 C++ Classes

C++ classes can currently be wrapped in three styles, selectable with the -objects and -onlyobjects options.  Objects are a fairly recent addition to the ML language family, as modules and functors were typically used for the same purposes as objects in the past.  Since C++ is object oriented, it is often convenient to pretend that C++ class pointers are real Ocaml objects, and call their methods, etc., as though they were.  Objects in Ocaml have drawbacks, however.  First, they are not compatible with code that compiles under caml light.  Second, they interact uniquely with the Ocaml type system in a way which does not please everyone.  Because of this, one may access objects in the ways described in the following subsections.

Consider this example class:

class cpp_base {
public:
    int x;
    int f( float y );
};

class cpp_class_type : public cpp_base {
public:
    int g( float y );
};

15.2.9.1 No flags, function wrapping

  type _p_cpp_base
  external x_set : _p_cpp_base -> _int -> _void = "_wrap_x_set"
  external x_get : _p_cpp_base -> _int = "_wrap_x_get"
  external f : _p_cpp_base -> _float -> _int = "_wrap_f"
  external new_cpp_base : unit -> _p_cpp_base = "_wrap_new_cpp_base"
  external delete_cpp_base : _p_cpp_base -> _void = "_wrap_delete_cpp_base"
  type _p_cpp_class_type
  external g : _p_cpp_class_type -> _float -> _int = "_wrap_g"
  external new_cpp_class_type : unit -> _p_cpp_class_type = "_wrap_new_cpp_class_type"
  external delete_cpp_class_type : _p_cpp_class_type -> _void = "_wrap_delete_cpp_class_type"

This is the default code produced by SWIG for the above module. It is the lightest weight in terms of runtime and memory, as well as being uncomplicated by any type inference problems. Use this wherever it is convenient, as part of a functor, or where extra performance will be needed.

15.2.9.2 -objects

class cpp_base : _p_cpp_base -> object
(* Start superclasses *)
(* End superclasses *)
  method x_set : (_int) -> _void
  method x_get : _int
  method f : (_float) -> _int
  method cpp_base : _void
  method _self_cpp_base : _p_cpp_base
end
class cpp_class_type : _p_cpp_class_type -> object
(* Start superclasses *)
(* cpp_base is a superclass *)
  inherit cpp_base
(* End superclasses *)
  method g : (_float) -> _int
  method cpp_class_type : _void
  method _self_cpp_class_type : _p_cpp_class_type
end

In addition to the code above, the -objects flag asks SWIG to generate objects as well as functions to interface the C++ code.  While not perfect, this provides a good light-weight interface to a C++ object without hiding too much that you might need.
Note that a _p_cpp_base (pointer to a cpp_base object) and a cpp_base class are different.  This is so that Ocaml code needn't construct an object if the user is only handling a pointer.

In order to extract the pointer from an object, use the _self_... method corresponding to the pointer type you want.  Note that only classes visible to the SWIG interface file are defined, and that every defined class in a hierarchy will be correctly inherited in Ocaml.  This makes it easy to use a deep C++ inheritance tree without complicated effort, and also allows any subtype to fill the role of its parent in an Ocaml expression involving objects.
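
The relationship between pointer types, object wrappers and inheritance can be imitated in plain Ocaml. This is a sketch under loud assumptions: the pointer types are modelled as ordinary records (in real SWIG output they are abstract), the types are named without the leading underscore, the derived pointer simply contains the base pointer, and the body of g is invented for illustration.

```ocaml
(* Stand-ins for the opaque pointer types. *)
type p_cpp_base = { mutable x : int }
type p_cpp_class_type = { base : p_cpp_base }

(* The object wraps a pointer; _self_cpp_base gives the pointer back. *)
class cpp_base (p : p_cpp_base) = object
  method x_set (v : int) : unit = p.x <- v
  method x_get : int = p.x
  method _self_cpp_base : p_cpp_base = p
end

class cpp_class_type (p : p_cpp_class_type) = object
  inherit cpp_base (p.base)
  (* Invented body: the real method would call into C++. *)
  method g (y : float) : int = int_of_float y
  method _self_cpp_class_type : p_cpp_class_type = p
end

(* A function written against the parent class. *)
let read_x (o : cpp_base) : int = o#x_get

let derived = new cpp_class_type { base = { x = 0 } }

let () = derived#x_set 7

(* The subtype fills the role of its parent after an explicit coercion. *)
let seen_through_base = read_x (derived :> cpp_base)

(* The raw pointer can be extracted when only a pointer is needed. *)
let raw = derived#_self_cpp_base
```

The explicit coercion (derived :> cpp_base) is how Ocaml expresses that the derived object may stand in for its parent.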

15.2.9.3 -objects -onlyobjects

This flag combination outputs only the object definitions into the .mli file.  It reduces the amount of code emitted to the .mli file in order to make link info smaller, and to reduce the size of the interface file.

15.2.9.4 -classmod

This flag wraps class, struct, and union code in modules.  It may be used with -objects, but probably doesn't make a lot of sense that way.  The module name will be a capitalization of the class name, as is the ocaml convention.  Note that types which may need to be accessed outside of the module are defined outside at global scope (such as pointer types) since ocaml always applies scopes to types.  Types such as enums, that are defined in scope stay there.

15.2.10 Overloaded functions

Overloaded functions are disambiguated according to a simple naming rule which produces a unique, but not necessarily meaningful name.  These names are always produced in declaration order.  If you wish to extract and rename certain overloaded methods, use the %rename directive.

15.2.11 Operator overloading

Because operators are not polymorphic in Ocaml, operator overloading as used in C++ is not available in Ocaml. However, needed operators may be renamed with the %rename directive as above.

15.2.12 Ocaml typemaps

The previous section illustrated an "in" typemap for converting Ocaml objects to C.  Basic typemaps are provided for all of the basic Ocaml types, so string, int, float, etc. can be passed in and out of C functions without a problem.  Note that C++ functions that need a fixed length buffer may be provided with an Ocaml string.  Ocaml keeps the length of the provided character buffer, so binary data is fine to store in strings.

%typemap(out) int {
    $result  = Val_int($1);
}

One might wish to do specific typemaps that are beyond the common ones provided by the ocaml/typemaps.i file provided with the SWIG distribution.  Here are some addenda to the Ocaml document on C interfaces, along with some information about the SWIG Ocaml language module that will prove useful to the reader;

15.2.13 Exceptions

Please view the "Raising Exceptions" section of Interfacing C with Objective Caml.