Safe-C Programming Language

Tutorial for developers who already know C

Here's a Safe-C program:

// date.c
from std use calendar, console;

void main()
{
  DATE_TIME now;

  get_datetime (out now);
  printf ("We are the %02d/%02d/%04d ", now.day, now.month, now.year);
  printf ("and it is %02d:%02d:%02d.\n", now.hour, now.min, now.sec);
}

Compilation

As in C, a Safe-C program uses the file extension .h for interfaces and .c for program bodies. The "make" step is integrated in the compiler: you only need to give the compiler the name of the main .c file, and it automatically follows the imports of library components (from std use xxx;) and of local files (use yyy;).

Compilation units

The .h and .c files of a component must always be stored in the same folder so that the compiler can find them. You don't need to import the .h file into the corresponding .c file; the compiler does it automatically.

Example:

// data.h
float global_delta = 1.0;    // public variable
const int MAX = 100;         // public constant
void insert (int element);   // public function

// data.c
int i = 0;
int table[MAX];

public void insert (int element)   // public function body
{
  table[i++] = element;
}

Unlike C, Safe-C does not use the keywords static and extern.
Variables declared in .h files no longer need to be declared a second time in the .c file.
Instead of marking functions that are internal to a file with the keyword static, we mark functions that are visible outside a file with the keyword public.
Note that the keyword public is never used in .h files, because everything in them is public anyway.

Here's an example that uses our component 'data':

// main.c
from std use console;
use util/data;   // component data is stored in the sub-folder "util"

void main()
{
  global_delta = 2.0;
  printf ("MAX = %d\n", MAX);
  insert (1);
  insert (element => 2);   // call with explicit parameter name
  data.insert (3);         // call with prefixed component name
}

Initialisation of variables

All local variables must be initialized before their first use, including arrays and structures, which must receive a full initial value. This can be done with the instruction clear, which replaces C's "memset (&v, 0x00, sizeof(v));", or by assigning a complete aggregate like this:

void main()
{
  int tab[3];

  clear tab;           // all elements to 0
  tab = {all => 5};    // all elements to 5
  tab = {5, 6, 7};     // assignment of a full aggregate
}

Structures must likewise be initialized:

void main()
{
  struct KEY
  {
    int  nr;
    char c;
  }

  KEY k;

  clear k;                 // all elements to 0
  k = {1, 'a'};            // simple aggregate
  k = {nr=>1, c=>'a'};     // aggregate with names
}

Data Types

Here's a quick overview of all data types:

signed integers: int1, int2, int4, int8 and their aliases: tiny, short, int, long.
unsigned integers: uint1, uint2, uint4 and their aliases: byte, ushort and uint.
enumeration types: char, wchar, bool.
floating types: float and double.
array, struct, union, safe pointer(^), unsafe pointer(*), pointer to function, opaque, generic.

Arrays are declared as in C, with one small difference:

void main()
{
  char     t1[10], t2[10];
  char[10] t1, t2;
}

The two declaration lines above are identical because what's specified on the left with the type applies to all identifiers on the right. It is allowed to combine both syntaxes.

You can declare an array type of unspecified length. For example, string is predefined as:

typedef char[] string;

Hence the following four declarations are equivalent:

void main()
{
  char[100]   buffer1;
  string(100) buffer2;
  char        buffer3[100];
  string      buffer4(100);
}

Parameter Modes

There are 3 parameter modes:

in (simple types are passed by value, arrays and structs by address)
ref (by address)
out (by address also)

Parameters of mode 'in' are read-only, they cannot be assigned a new value.
Parameters of mode 'ref' have no restrictions.
Parameters of mode 'out' are considered as non-initialized variables, they must receive a full value before the function ends.

void foo (int i, ref int j, out int k)
{
  k = i + j;
}

A function call repeats the mode of each ref and out parameter:

void main ()
{
  int i, j, k;

  i = 1;
  j = 2;

  foo (i, ref j, out k);
}

So you can see that, unlike in C, you don't use any & or * symbols.
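For comparison, here is a rough C sketch of the same function and call. It only illustrates where the & and * symbols would appear in C; the names foo_c and call_foo_c are hypothetical, and C performs none of the checks that the Safe-C modes imply.

```c
/* C counterpart of: void foo (int i, ref int j, out int k)
   'in'  -> plain value parameter
   'ref' -> pointer, read and written by the callee
   'out' -> pointer, written by the callee (C does not verify this) */
void foo_c(int i, int *j, int *k)
{
    *k = i + *j;
}

/* Hypothetical helper mirroring the Safe-C main() above. */
int call_foo_c(void)
{
    int i = 1, j = 2, k;

    foo_c(i, &j, &k);   /* the caller writes & where Safe-C writes ref/out */
    return k;
}
```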

Arrays are passed like this :

void foo1 (char[10] str);
void foo2 (char[] str);
void foo3 (string str);

Function foo1 accepts only arrays of char of length 10: at execution time, only an address is passed on the stack.

Function foo2 accepts arrays of char of any length: at execution time, the address and the length are passed on the stack, so that the array length can be queried and checked within the function.

Function foo3 is equivalent to foo2 but more pleasant to read.
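To picture what "the address and the length are passed" means, here is a minimal C sketch. The type char_slice and the function char_slice_get are hypothetical names invented for this illustration; they are not how the Safe-C compiler is implemented, and where this sketch returns 0, Safe-C would stop with a fatal error.

```c
#include <stddef.h>

/* Hypothetical C view of a Safe-C open array parameter (char[] / string):
   the address and the length travel together. */
typedef struct {
    const char *data;
    size_t      length;
} char_slice;

/* Returns 1 and stores the element in *out when index is valid,
   returns 0 otherwise. */
int char_slice_get(char_slice s, size_t index, char *out)
{
    if (index >= s.length)
        return 0;
    *out = s.data[index];
    return 1;
}
```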

Attributes

The attribute 'length allows you to take the length of any array:

void main()
{
  char tab[3];
  int  i;

  i = tab'length;   // 3
}

Attributes 'min and 'max allow you to take the minimum/maximum of an integer type:

void main()
{
  int i, petit, grand;

  petit = i'min;   // -2_147_483_648
  grand = i'max;   // +2_147_483_647
}

Attributes 'first and 'last allow you to take the first/last value of an enumeration type:

void main()
{
  enum COLOR {RED, GREEN, BLUE};
  COLOR a, b;

  a = COLOR'first;   // RED
  b = a'last;        // BLUE
}

Attribute 'string allows you to convert an enumeration value into a string representing its literal, which is useful when you pass such values to printf while debugging:

void main()
{
  enum COLOR {RED, GREEN, BLUE};
  COLOR c = RED;

  printf ("c = %s\n", c'string);
  printf ("first color = %s\n", COLOR'first'string);
}

Slices

A slice of an array is like a slice of bread. It has a beginning and a length:

string(5) s;
string(2) t;

s = "Hello";
t = s[3:2];        // copies "lo" (start=3, length=2)
s[1:4] = "ELLO";   // keeps the H but changes the rest

Array indexes and slices are checked and generate a fatal error in case of illegal values.
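The kind of check involved can be sketched in C as follows. This is only an illustration: slice_is_valid and slice_copy are hypothetical names, the sketch returns 0 where Safe-C would raise a fatal error, and the requirement that the destination length equals the slice length is an assumption of this sketch rather than something stated above.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 when the slice [start : length] lies inside an array of
   array_length elements. Written so that start + length cannot overflow. */
int slice_is_valid(size_t array_length, size_t start, size_t length)
{
    return start <= array_length && length <= array_length - start;
}

/* Copies src[start : length] into dst. Returns 0 (instead of a fatal
   error) when the slice is illegal or dst has a different length. */
int slice_copy(char *dst, size_t dst_length,
               const char *src, size_t src_length,
               size_t start, size_t length)
{
    if (!slice_is_valid(src_length, start, length) || dst_length != length)
        return 0;
    memcpy(dst, src + start, length);
    return 1;
}
```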

Strings

The component 'strings' contains well-known functions: strcpy, strcat, sprintf, etc.
It is noteworthy that, unlike in C, the ending nul character is optional.
So if you use strcpy() to copy "Hello" into a string of length 5, there will be no ending nul character.

from std use strings;

void main()
{
  string(64) str, str2;
  int i = 2, j = 3, len;

  sprintf (out str, "value of i is : %d", i);
  sprintf (out str2, " and j equals : %d", j);
  strcat (ref str, str2);
  len = strlen (str);
}

If you strcpy a string of length 6 into an array of length 5, you will get a fatal error.
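A C sketch of a copy with an optional terminator might look like the following. The names bounded_strlen and bounded_strcpy are hypothetical, and the behaviour when the destination is larger than the source (a nul is written after the text) is an assumption of this sketch; it returns 0 where Safe-C would stop with a fatal error.

```c
#include <stddef.h>
#include <string.h>

/* Length of a string stored in a buffer of `capacity` chars:
   stops at the first nul, or at the capacity if there is none. */
size_t bounded_strlen(const char *s, size_t capacity)
{
    size_t n = 0;
    while (n < capacity && s[n] != '\0')
        n++;
    return n;
}

/* Copies src into dst. The nul is written only if there is room for it.
   Returns 1 on success, 0 if the text does not fit in dst. */
int bounded_strcpy(char *dst, size_t dst_capacity,
                   const char *src, size_t src_capacity)
{
    size_t n = bounded_strlen(src, src_capacity);

    if (n > dst_capacity)
        return 0;
    memcpy(dst, src, n);
    if (n < dst_capacity)
        dst[n] = '\0';
    return 1;
}
```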

Constants

The following C declarations:

#define MAX 100
#define TITLE "programme.c"

will be written as follows in Safe-C:

const int MAX = 100;
const string TITLE = "programme.c";

Jagged arrays

The following C declaration:

char *table[] = {"This", "is", "an", "example"};

will be written in Safe-C as:

const string table[] = {"This", "is", "an", "example"};

You can obtain the number of strings using table'length.

Structures

Structures are declared almost like in C:

struct PERSON
{
  char[20] name;
  int      age;
}

PERSON per;

Furthermore, there are special structures featuring a 'discriminant' of an enumeration type:

enum TypeShape {POINT, SQUARE, CIRCLE, TRIANGLE};

struct Shape (TypeShape kind)
{
  int x, y;
  switch (kind)
  {
    case POINT:
      null;
    case SQUARE:
      int side;
    case CIRCLE:
      int radius;
    case TRIANGLE:
      int base, height;
  }
}

Shape(SQUARE) s = {x=>1, y=>2, side=>3};

The size of the structure depends on the discriminant value given when the variable is created. The compiler never allocates the maximum length, only the length needed for the given variant.

Variant structures can be passed as parameters:

void foo1 (Shape(POINT) p)
{
  // ...
}

void foo2 (Shape s)
{
  switch (s.kind)
  {
    case POINT:
      // ...
      break;
  }
}

Function foo1 will accept only a Shape of type POINT, whereas foo2 will accept any variant.
foo2 receives the discriminant s.kind in a hidden parameter so it knows which variant it is.
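For readers used to C, the closest idiom is a tagged union, sketched below. This is a comparison, not a description of the Safe-C implementation: the names shape_c and shape_width are hypothetical, and unlike a Safe-C variant structure the C union always reserves room for its largest member.

```c
enum type_shape { POINT, SQUARE, CIRCLE, TRIANGLE };

struct shape_c {
    enum type_shape kind;      /* the discriminant, stored explicitly */
    int x, y;
    union {                    /* always sized for the largest variant */
        struct { int side; } square;
        struct { int radius; } circle;
        struct { int base, height; } triangle;
    } u;
};

/* Hypothetical function that, like foo2, inspects the discriminant
   to decide which fields it may read. */
int shape_width(const struct shape_c *s)
{
    switch (s->kind) {
    case POINT:    return 0;
    case SQUARE:   return s->u.square.side;
    case CIRCLE:   return 2 * s->u.circle.radius;
    case TRIANGLE: return s->u.triangle.base;
    }
    return -1;   /* unknown discriminant */
}
```

In C nothing stops code from reading u.circle.radius on a SQUARE; the discriminant is only a convention that the programmer has to respect.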

Packed types

packed struct PERSON
{
  char[20] name;
  int      age;
}

PERSON per;

The keyword packed tells the compiler not to align the fields of the structure. Consequently the structure becomes 'portable' and can be passed through an input/output function to the outside world (file, network, ...). A packed structure cannot contain a pointer ^ (otherwise you could read a random value from the outside world into a pointer and corrupt memory).
There's a rule that implicitly converts all packed types into byte arrays when they are passed as parameters.
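The effect of alignment can be observed in C with the sketch below. It relies on #pragma pack, which is a compiler extension (supported by GCC, Clang and MSVC) rather than standard C, and the record types are hypothetical examples chosen because a char followed by an int normally gets padding.

```c
#include <stddef.h>

/* Hypothetical record, laid out with the compiler's default alignment. */
struct record_aligned {
    char tag;
    int  value;
};

/* Same fields without padding between them. */
#pragma pack(push, 1)
struct record_packed {
    char tag;
    int  value;
};
#pragma pack(pop)

/* Number of padding bytes the compiler added in the default layout. */
size_t record_padding(void)
{
    return sizeof(struct record_aligned) - sizeof(struct record_packed);
}
```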

Since read() and write() are declared like this:

int read (int fd, out byte[] buffer);
int write (int fd, byte[] buffer);

you can write:

rc = read (fd, out per);
// or
rc = write (fd, per);

Moreover, any packed variable can be converted into a byte array using the attribute 'byte:

byte  tab[4];
float f = 1.2;

tab = f'byte;   // copies 4 bytes

This allows copying the content of any variable to any other variable (unless the variable contains a pointer ^, which is excluded from these conversions):

int   i;
float f = 1.2;

i'byte      = f'byte;        // copies 4 bytes
i'byte[0]   = f'byte[0];     // copies the first byte
i'byte[2:2] = f'byte[2:2];   // copies the last 2 bytes
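In C the same byte-level copies are usually written with memcpy, as in this sketch. The function names are hypothetical, and the sketch assumes that float is 4 bytes wide, which holds on common platforms but is not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* Like i'byte = f'byte : copies all 4 bytes of a float into a
   32-bit integer (assumes sizeof(float) == 4). */
uint32_t float_bits(float f)
{
    uint32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}

/* Like i'byte[2:2] = f'byte[2:2] : copies 2 bytes starting at
   byte offset 2, leaving the first 2 bytes of *i untouched. */
void copy_last_two_bytes(uint32_t *i, const float *f)
{
    memcpy((unsigned char *)i + 2, (const unsigned char *)f + 2, 2);
}
```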

The type object

The type object is predefined as a byte array :

typedef byte[] object; // open array of byte

The type object[] is used in the declaration of functions having a variable number of parameters, like these :

int sprintf (out string buffer, string format, object[] arg);
int sscanf (string buffer, string format, out object[] arg);

In the body of these functions, you can obtain the number of parameters with arg'length. Each parameter is accessible through arg[i] and has the type array of byte. Depending on the string 'format', the function can then convert each parameter to the desired type using the attribute 'byte.
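As an illustration of the idea, the C sketch below models each argument as an address plus a length and converts the bytes back to a typed value inside the function. The type byte_slice and the function sum_int_args are hypothetical; a real sprintf would choose the conversion from the format string, whereas this sketch simply treats every int-sized argument as an int.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C view of one 'object' argument: an array of bytes. */
typedef struct {
    const unsigned char *data;
    size_t               length;
} byte_slice;

/* Hypothetical variadic-style function: adds up every argument whose
   size matches an int and ignores the others. */
long sum_int_args(const byte_slice *arg, size_t arg_count)
{
    long total = 0;

    for (size_t i = 0; i < arg_count; i++) {
        if (arg[i].length == sizeof(int)) {
            int value;
            memcpy(&value, arg[i].data, sizeof value);   /* bytes -> int */
            total += value;
        }
    }
    return total;
}
```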

References

A reference allows you to rename a variable into a shorter name.
In practice, the reference always stores the variable's address, and sometimes its length for an array.

ref string s = p^.line[i]^;

printf ("%s\n", s);

Pointer types

A pointer is declared using the symbol ^. The keyword new allows you to allocate dynamic-size variables on the heap, with or without specifying an initial value.

Pointers can refer to simple types:

int^ p  = new int;          // object initialized to zero
int^ p2 = new int ' (1);    // explicit initialization to 1

or to structures:

struct NODE
{
  int   nr;
  NODE^ next;
}

NODE^ p  = new NODE;               // object initialized to zero
NODE^ p2 = new NODE ' {1, null};   // explicit initialization using aggregate
NODE^ p3 = new NODE ' (p^);        // initialized with value of another object

There are two kinds of array objects: those with constant length (a constant is specified in the pointer declaration):

int[3]^ p = new int[3];             // always points to an array of length 3
int[3]^ q = new int ' {1, 2, 3};    // same

and those with dynamic length (no length is specified in the pointer declaration):

int[]^ p = new int[3];
int[]^ q = new int ' {1, 2, 3};

Note the difference: the latter have a length field stored in the header of the heap object, so they are not compatible with the former.

Finally, there are structures with a discriminant, whose value is stored in a header of the heap object:

Shape^ p = new Shape(POINT);
Shape^ q = new Shape(POINT) ' {x=>1, y=>2};
Shape^ r = new Shape ' (q^);

Implementation of pointer types

A pointer type ^ is secured by a 'tombstone' mechanism: each pointer points to an internal structure called a Tombstone, which contains the address of the real object allocated on the heap as well as a reference counter. This mechanism, handled in a thread-safe way, prevents any operation that could corrupt memory.

If the system runs short of memory during a new, the program stops with a fatal error, exactly as when the stack overflows after too many recursive calls. It's up to you to manage your memory consumption. As in C, each memory block allocated with new must be freed after use with the instruction free.

free p;
free q;

Using free on an object still referenced by any thread or already freed earlier by free will cause a fatal error.

On the other hand, the language will not notify you if you forget to call free, because that doesn't corrupt memory.
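The following C sketch gives a rough idea of how a tombstone can detect both error cases. It is not the Safe-C runtime: all names are hypothetical, thread safety is left out, and the text above does not say exactly what the counter counts, so this sketch assumes it counts accesses currently in progress. Where Safe-C would stop with a fatal error, the sketch returns a status code or NULL.

```c
#include <stdlib.h>

typedef struct {
    void *object;      /* address of the heap object, NULL once freed */
    int   references;  /* accesses currently in progress (assumption) */
} tombstone;

typedef enum {
    FREE_OK,
    FREE_STILL_REFERENCED,
    FREE_ALREADY_FREED
} free_status;

/* Rough equivalent of `new`: a zero-filled object plus its tombstone. */
tombstone *tombstone_new(size_t size)
{
    tombstone *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->object = calloc(1, size);
    if (t->object == NULL) {
        free(t);
        return NULL;
    }
    t->references = 0;
    return t;
}

/* Starts an access: returns the object, or NULL if it was freed. */
void *tombstone_acquire(tombstone *t)
{
    if (t->object == NULL)
        return NULL;
    t->references++;
    return t->object;
}

/* Ends an access started with tombstone_acquire. */
void tombstone_release(tombstone *t)
{
    if (t->references > 0)
        t->references--;
}

/* Rough equivalent of `free`, reporting the two error cases. */
free_status tombstone_free(tombstone *t)
{
    if (t->object == NULL)
        return FREE_ALREADY_FREED;
    if (t->references > 0)
        return FREE_STILL_REFERENCED;
    free(t->object);
    t->object = NULL;   /* the tombstone stays, so stale pointers are detected */
    return FREE_OK;
}
```

The tombstone itself is deliberately never released in this sketch; keeping it alive is what lets a stale pointer be recognised instead of reading freed memory.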

Pointers to functions

Pointers to functions exist as in C, without surprises. Here's an example:

void treat_node (Shape s);         // function declaration

typedef void TREAT (Shape s);      // function pointer type

void treatment (Shape s)
{
  TREAT treat;                     // function pointer variable

  treat = null;
  treat = treat_node;              // parameter modes and types must match
  if (treat != null)
    treat (s);
}

Unsafe pointers

To interface the libraries with the operating system, the old C pointers are available in Safe-C. For example, the operator & takes an object's address, an unsafe pointer can be indexed as in p[i] or used to access a field as in p->field, and the operators ++ and -- operate on unsafe pointers.
All this is, however, only available in an unsafe section:

#begin unsafe
  const string filename = "Test\0";
  char *p = &filename;
  p++;
#end unsafe

Threads

The operator run allows you to start a thread very easily.

void my_thread ()
{
}

void main()
{
  int rc;

  rc = run my_thread ();   // starts a thread (rc: 0=OK, -1=error)
}

The function my_thread can have at most one parameter.

Opaque types

Opaque types provide a very simple form of class, in which the fields of a structure are accessible only in the .c file corresponding to the .h file where the opaque type is declared. Furthermore, all operations that would copy (clone) a value of the opaque type are disallowed.

// drawing.h
struct DRAW_CONTEXT;   // opaque type

void init (out DRAW_CONTEXT d);
void circle (ref DRAW_CONTEXT d, int x, int y, int radius);

// drawing.c
struct DRAW_CONTEXT    // full struct type
{
  int x, y, dx, dy;
  IMAGE^ image;
}

public void init (out DRAW_CONTEXT d)
{
  // ..
}

public void circle (ref DRAW_CONTEXT d, int x, int y, int radius)
{
  // ..
}

// main.c
use drawing;

void main()
{
  DRAW_CONTEXT a, b;

  init (out a);
  b = a;   // ERROR : assignment not allowed for limited types
}

Generic packages

Safe-C lets you declare generic packages, so you can write algorithms that are instantiated for a given type. This has the same effect as C's macros, except that the compiler doesn't just mechanically replace the generic by the actual type; the whole package is syntactically checked.

Also, non-generic packages can be declared, as well as nested packages.

Here's an example of a bubble sort instantiated for the type int:

// bubble.h
generic <ELEMENT>                       // generic type ELEMENT
  int compare (ELEMENT a, ELEMENT b);   // return -1 if a<b, 0 if a==b, +1 if a>b
package BubbleSort
  void sort (ref ELEMENT table[]);
end BubbleSort;

// bubble.c
package body BubbleSort

  public void sort (ref ELEMENT table[])
  {
    int i, j;
    ELEMENT temp;

    for (i=1; i<table'length; i++)
    {
      for (j=i; j>0; j--)
      {
        if (compare (table[j-1], table[j]) <= 0)
          break;
        temp = table[j-1];
        table[j-1] = table[j];
        table[j] = temp;
      }
    }
  }

end BubbleSort;

int compare_int (int a, int b)
{
  if (a < b)
    return -1;
  if (a > b)
    return +1;
  return 0;
}

package Sort_int = new BubbleSort (ELEMENT => int, compare => compare_int);

void main()
{
  int table[5] = {2, 19, 3, 9, 4};

  sort (ref table);   // must be written Sort_int.sort if ambiguous
}
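For comparison, this is roughly what the macro-based approach looks like in C. The macro name DEFINE_BUBBLE_SORT and the generated function sort_int are hypothetical; the sketch only illustrates the point made above, namely that the C preprocessor substitutes text and checks nothing until the macro is expanded for a concrete type.

```c
#include <stddef.h>

/* Textual "generic": the body is only compiled once it is expanded
   with a concrete ELEMENT type and COMPARE function. */
#define DEFINE_BUBBLE_SORT(NAME, ELEMENT, COMPARE)            \
    void NAME(ELEMENT *table, size_t length)                  \
    {                                                         \
        for (size_t i = 1; i < length; i++) {                 \
            for (size_t j = i; j > 0; j--) {                  \
                if (COMPARE(table[j - 1], table[j]) <= 0)     \
                    break;                                    \
                ELEMENT temp = table[j - 1];                  \
                table[j - 1] = table[j];                      \
                table[j] = temp;                              \
            }                                                 \
        }                                                     \
    }

/* Returns -1 if a<b, 0 if a==b, +1 if a>b. */
int compare_int(int a, int b)
{
    if (a < b)
        return -1;
    if (a > b)
        return +1;
    return 0;
}

/* "Instantiation" for int: expands to a function named sort_int. */
DEFINE_BUBBLE_SORT(sort_int, int, compare_int)
```

A mistake inside the macro body would only be reported at the expansion site, and only for the types actually used, which is the weakness that a checked generic package avoids.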

A few more things

To close this chapter, here are a few more short notes:

the 16-bit type wchar means Safe-C supports UTF-16, so you can write Chinese or Japanese characters. This works even in the source code, so you can write Japanese string constants as L"";
the instruction assert b; can be used to check an assertion during compilation or at runtime;
the instruction abort; stops the program with a fatal error;
the instruction sleep n; suspends a thread for a specified time. sleep takes an argument of type int or float, in seconds;
the instruction _unused v; specifies that a variable is unused, to avoid a compiler warning;
in case of a fatal error, the library unit 'exception' allows you to generate a file called CRASH-REPORT.TXT that helps the programmer locate the error.

That's it, you now know the most important parts of the Safe-C programming language!

Everything else (operators, instructions) should be familiar to you if you already know C.