xavier roche's homework

random thoughts and sometimes pieces of code

A Basic Glance at the Virtual Table

Virtual Functions, vtable And Other Monsters

C++ inheritance, virtual functions, and virtual tables are sometimes seen as obscure beasts, something most people will not mess with.

Casting an object to another one or calling a virtual function are probably two of the most common operations you can think of in C++, so it is a shame that such simple operations are not fully understood by everyone.

I have been programming in C for many years, and one of the benefits of C is that there is no real magic: when you define a structure, you can see the layout in memory, you can feel the pointers being padded, and every other operation, such as casting a pointer to another type (something you should only do with great care, by the way), is just re-interpreting memory addresses or shifting them a bit. No magic. No surprises. Accessing the second member of a pointer array ? This is just basic pointer arithmetic (an indirect move with shift and index, something that even the venerable 68000 processor had, if my memory is correct).

Some people go straight to C++ (or even Java) when they start learning programming – which is in my opinion a bad thing – and tend to lose sight of what’s really happening behind the curtains (ie. inside the memory).

But things are generally much simpler than you think.

Back To Basics

Let’s first have a look at a very simple object. A C++ one:

class A {
private:
  int id;

public:
  A(): id(42) {
    printf("yay, my address is %p, and my id is %d!\n", this, this->id);
  }
};

This class contains only a constructor, and has one member (an integer). These two things are not of the same nature: variable members and functions are of different kinds (yes, this is a really smart and insightful remark, I know).

To make things clearer, we can view this class as a C structure, and its functions as global functions taking a (hidden) “this” pointer as first argument (the pointer to the structure itself). The instance members can be accessed implicitly, without the this-> pointer indirection, by the way: C++ will add the missing bits.

The code above can be rewritten in pure C as follows:

// this is the actual "class A" instance data layout!
struct A {
  int id;
};

// this is the class A equivalent constructor (a global function)
void A_A(struct A* this) {
  this->id = 42;
  printf("yay, my address is %p, and my id is %d!\n", this, this->id);
}

… and, rather than just declaring:

  A a;

… you’ll need to call the constructor manually:

  struct A a;
  A_A(&a);  // call constructor manually!

Here it is! C++ in plain C! Of course a C++ compiler will hide everything behind the curtains when using regular classes, but the actual compiled code will be more or less the same.

If you add more members and/or more methods to this class, the principle is the same: extend the structure layout, and/or add more global functions that take a hidden this first argument.

Quiz: what is the size of a class A object instance ? The answer is pretty simple: the class only contains one member (id), which is an integer (32-bit here) ; the sizeof(A) is therefore 4 (bytes). No magic.

Quiz: what will be the size of a class A object instance if a regular member function is added ? The answer is pretty simple, too: its size won’t change, as it still contains only one member – only an additional global function will be added to the code.

What About Inheritance ? This Seems a Bit More Complicated!

No, no, not at all.

Let’s take a simple (yet representative) example: our previous A class, another B class with a single “age” member, and at last a final class C with a member “mode”, but inheriting from A and B.

class A {
public:
  int id;

  A(): id(42) {
    printf("A: yay, my address is %p, my size is %zu, and my id is %d!\n",
           this, sizeof(*this), this->id);
  }
};

class B {
public:
  int age;

  B(): age(7) {
    printf("B: yay, my address is %p, my size is %zu, and my age is %d!\n",
           this, sizeof(*this), this->age);
  }
};

class C: public A, public B {
public:
  int mode;

  C(): mode(-1) {
    printf("C: yay, my address is %p, my size is %zu, my id, age and mode are %d, %d, %d!\n",
           this, sizeof(*this), this->id, this->age, this->mode);
  }
};

Instantiating an object from C (C c;) will print something like:

A: yay, my address is 0x73620f9f5c20, my size is 4, and my id is 42!
B: yay, my address is 0x73620f9f5c24, my size is 4, and my age is 7!
C: yay, my address is 0x73620f9f5c20, my size is 12, my id, age and mode are 42, 7, -1!

The class C constructor first calls the inherited classes’ constructors (class A’s and class B’s), and therefore prints their information first. Here again, C++ hides everything behind the curtains: the calls to base class constructors are generated automatically, before your actual class C constructor code.

You can note that the addresses seen by the constructors of A and C are identical, but not the one seen by B. How can it be ?

Let’s rewrite everything in plain C:

struct A {
  int id;
};

// class A constructor
void A_A(struct A* this) {
  this->id = 42;
  printf("A: yay, my address is %p, my size is %zu, and my id is %d!\n",
         this, sizeof(*this), this->id);
}

struct B {
  int age;
};

// class B constructor
void B_B(struct B *this) {
  this->age = 7;
  printf("B: yay, my address is %p, my size is %zu, and my age is %d!\n",
         this, sizeof(*this), this->age);
}

struct C {
  struct A a;
  struct B b;
  int mode;
};

void C_C(struct C *this) {
  // manually call base classes constructors first
  A_A(&this->a);
  B_B(&this->b);
  // then, user code
  this->mode = -1;
  printf("C: yay, my address is %p, my size is %zu, my id, age and mode are %d, %d, %d!\n",
         this, sizeof(*this), this->a.id, this->b.age, this->mode);
}

The struct C layout is pretty straightforward: the first element holds the struct A members (the reason why the constructors of class A and class C saw the same address), the second holds the member(s) of class B, and finally come the member(s) specific to C.

In both C++ and C variants,

  • the size of C is 12 bytes (the three integers)
  • the location of A within C is at the beginning (the first four bytes)
  • the location of B within C is after the area dedicated to A (the next four bytes)
  • the location of members specific to C then follows

Casting will provide the same behavior, by the way:

  • static_cast<A*>(&c) == &(&c)->a == 0x73620f9f5c20
  • static_cast<B*>(&c) == &(&c)->b == 0x73620f9f5c24
  • static_cast<C*>(&c) == &c == 0x73620f9f5c20

Okay, But Virtual Functions ? Can You Do That In Plain C ?

Well, let’s consider a simple example: simply add a virtual function print to our previous sample, aiming to print the object class name and parameters. The idea is that even if we cast the high-level class C instance into A* or B*, we’ll see the same behavior.

class A {
public:
  int id;

  A(): id(42) {
    printf("A: yay, my address is %p, my size is %zu, and my id is %d!\n",
           this, sizeof(*this), this->id);
  }

  virtual void print() {
    printf("I am A(%d)\n", id);
  }
};

class B {
public:
  int age;

  B(): age(7) {
    printf("B: yay, my address is %p, my size is %zu, and my age is %d!\n",
           this, sizeof(*this), this->age);
  }

  virtual void print() {
    printf("I am B(%d)\n", age);
  }
};

class C: public A, public B {
public:
  int mode;

  C(): mode(-1) {
    printf("C: yay, my address is %p, my size is %zu, my id, age and mode are %d, %d, %d!\n",
           this, sizeof(*this), this->id, this->age, this->mode);
  }

  virtual void print() {
    printf("I am C(%d, %d, %d)\n", id, age, mode);
  }
};

If we create an instance of C called c (C c;), the following calls will produce exactly the same output:

  • c.print();
  • static_cast<A*>(&c)->print();
  • static_cast<B*>(&c)->print();
  • static_cast<C*>(&c)->print();

Where is the magic ? Let’s first have a look at the output, when creating this object:

A: yay, my address is 0x726fa25b7c00, my size is 16, and my id is 42!
B: yay, my address is 0x726fa25b7c10, my size is 16, and my age is 7!
C: yay, my address is 0x726fa25b7c00, my size is 32, my id, age and mode are 42, 7, -1!

We can see that A and B are no longer 4 bytes, but 16. And C is 32 bytes! Somehow, the C++ compiler added some bytes to our objects without telling us!

Let’s have a look at each object instance’s actual contents within each class, by dumping the data as pointers in the constructors, to figure out what has been added inside our beloved structures:

// simple dump function
static void print_object(const char *name, void *this_, size_t size) {
  void **ugly = reinterpret_cast<void**>(this_);
  size_t i;
  printf("created %s at address %p of size %zu\n", name, this_, size);
  for(i = 0 ; i < size / sizeof(void*) ; i++) {
    printf("  pointer[%zu] == %p\n", i, ugly[i]);
  }
}

And using print_object(__FUNCTION__, this, sizeof(*this)); inside each constructor.

Remember that the constructors are called as follows when creating an instance of C:

  • base class A constructor is called magically (this points to the beginning of the allocated C structure)
  • base class B constructor is called magically (this starts at 16 bytes from the beginning of the allocated C structure)
  • top level C constructor user code is executed (this points to the beginning of the allocated C structure)

Here’s the result:

created A at address 0x7082f962ccb0 of size 16
  pointer[0] == 0x400f20
  pointer[1] == 0x2a
created B at address 0x7082f962ccc0 of size 16
  pointer[0] == 0x400f00
  pointer[1] == 0x7
created C at address 0x7082f962ccb0 of size 32
  pointer[0] == 0x400ed0
  pointer[1] == 0x2a
  pointer[2] == 0x400ee8
  pointer[3] == 0xffffffff00000007

We can see that:

  • A contains an unknown additional pointer (0x400f20), and the id variable (0x2a == 42) padded to 8 bytes
  • B contains another unknown additional pointer (0x400f00), and the age variable (7) padded to 8 bytes
  • C embeds A and B, plus the mode variable (-1): these are seen as 0x2a, and 0xffffffff00000007 (00000007 for the age variable, and ffffffff for the mode variable: -1 is 0xffffffff in the two’s complement world) – and the two additional pointers

So it seems that the reason for the structure size change is that a hidden pointer member (8 bytes in the 64-bit world) has been added to the class instance – and the compiler padded the structure to 16 bytes, because you want 8-byte alignment when dealing with arrays: in an array of two objects, the second object’s hidden pointer has to be 8-byte aligned, forcing the structure size to be a multiple of 8 bytes. The 4 padding bytes are unused, but needed to maintain the alignment.

Hey, wait! The two pointers are different when printed inside the C constructor! Yes, good catch: the additional A pointer, which was 0x400f20 inside the class A constructor, has been changed into 0x400ed0 inside the class C constructor, and the additional B pointer, which was 0x400f00 inside the class B constructor, has been changed into 0x400ee8 inside the class C constructor.

Oh, and a last note: C is only 32 bytes.

  • there is no additional hidden pointer for C!
  • the compiler apparently used the padding area of class B for the mode variable (smart, isn’t it ?)

Okay, so two magical pointers have been added because we defined a virtual function. What does this mean ?

Let’s go back to basics: how would you implement this class hierarchy in pure C ?

Here’s what you may do:

struct A {
  int id;
};

// class A constructor
void A_A(struct A *this) {
  this->id = 42;
  print_object(__FUNCTION__, this, sizeof(*this));
}

void A_print(struct A *this) {
  printf("I am A(%d)\n", this->id);
}

struct B {
  int age;
};

// class B constructor
void B_B(struct B *this) {
  this->age = 7;
  print_object(__FUNCTION__, this, sizeof(*this));
}

void B_print(struct B *this) {
  printf("I am B(%d)\n", this->age);
}

struct C {
  struct A a;
  struct B b;
  int mode;
};

void C_C(struct C *this) {
  // manually call base classes constructors first
  A_A(&this->a);
  B_B(&this->b);
  // then, user code
  this->mode = -1;
  print_object(__FUNCTION__, this, sizeof(*this));
}

void C_print(struct C *this) {
  printf("I am C(%d, %d, %d)\n", this->a.id, this->b.age, this->mode);
}

Okay, this looks more or less like what we had previously. But what about the virtual function ? We have to be sure that we call the right function, even if we only have an A* or a B* pointer (and we don’t know that there is actually a C class object behind).

A solution can be to add a function pointer for every virtual function in each class, and initialize it inside the constructor!

Something like this:

struct A {
  void (*print)(struct A *this);
  int id;
};

void A_print(struct A *this) {
  printf("I am A(%d)\n", this->id);
}

// class A constructor
void A_A(struct A *this) {
  this->print = A_print;
  this->id = 42;
  print_object(__FUNCTION__, this, sizeof(*this));
}

struct B {
  void (*print)(struct B *this);
  int age;
};

void B_print(struct B *this);  // forward declaration (defined below)

// class B constructor
void B_B(struct B *this) {
  this->print = B_print;
  this->age = 7;
  print_object(__FUNCTION__, this, sizeof(*this));
}

void B_print(struct B *this) {
  printf("I am B(%d)\n", this->age);
}

struct C {
  struct A a;
  struct B b;
  void (*print)(struct C *this);
  int mode;
};

void C_print(struct C *this);  // forward declaration (defined below)

void C_C(struct C *this) {
  // manually call base classes constructors first
  A_A(&this->a);
  B_B(&this->b);
  // then, user code
  this->print = C_print;
  this->a.print = C_print;  // we patch our own A->print function !!
  this->b.print = C_print;  // and we need to do it for B->print too!!
  this->mode = -1;
  print_object(__FUNCTION__, this, sizeof(*this));
}

void C_print(struct C *this) {
  printf("I am C(%d, %d, %d)\n", this->a.id, this->b.age, this->mode);
}

What did we do here ?

  • A and B constructors initialize their versions of print
  • C constructor initializes its own version of print after the A and B constructors, and overrides the ones inside A and B

Neat, isn’t it ?

Now, whatever pointer you have, these three lines will produce identical behavior:

  • a->print(a);
  • b->print(b);
  • c->print(c);

Hey, by the way, we do not need an additional pointer for C, because we already have the correct version in two locations: a.print and b.print. Let’s remove this useless pointer in C (and the useless initialization), and use:

  • a->print(a);
  • b->print(b);
  • c->a.print(c);

There is a little problem, however, when using a C instance cast to A* or B*. When you call a->print(a), the a pointer is of type A*, and therefore the assignment this->a.print = C_print will produce a warning. Similarly, when you call b->print(b), the this pointer will have the type B*: not only is it of the wrong type, but it has a different address, as c->b is located sixteen bytes after the beginning of C.

We need specialized versions with correct casts!

  • casting from A* to C* is straightforward: we know that the address is the same, and the type can be safely cast
  • casting from B* to C* requires adjusting the address to find the beginning of C, plus a cast (B is in the middle of C, remember)

Here we go:

void C_print(struct C *this) {
  printf("I am C(%d, %d, %d)\n", this->a.id, this->b.age, this->mode);
}

void CA_print(struct A *this) {
  C_print((struct C*) this);
}

void CB_print(struct B *this) {
  // we know that this points after the struct A layout,
  // so let's fix the pointer by ourselves!
  // (this hand-rolled computation is what the standard
  // offsetof(struct C, b) macro would give us)
  size_t offset = (size_t) &((struct C*) NULL)->b;
  C_print((struct C*) ( ( (char*) this ) - offset ) );
}

void C_C(struct C *this) {
  // manually call base classes constructors first
  A_A(&this->a);
  B_B(&this->b);
  // then, override functions
  this->a.print = CA_print;  // we patch our own function !!
  this->b.print = CB_print;  // and we need to do it for B too!!
  // this->print = C_print;  // useless
  // then, user code
  this->mode = -1;
  print_object(__FUNCTION__, this, sizeof(*this));
}

With these adjustments, things are smooth:

  • c.a.print(&c.a) ==> I am C(42, 7, -1)
  • c.b.print(&c.b) ==> I am C(42, 7, -1)
  • c.print(&c) ==> I am C(42, 7, -1)

Perfect, isn’t it ?

Not yet: what if we add more virtual functions to our structures ? This will increase the object size, and require more pointer initializations in constructors! What a waste of space and time: we are assigning the same pointers each time. Could we prepare the set of pointers for each class in a static, global structure, and just initialize a pointer to this static data inside the constructors ? Neat, isn’t it ?

Here’s the modified version – let’s call the global structures containing the specialized function pointers vtbl, as in virtual table, because this is actually what they are: tables of virtual functions!

// type for global virtual function table structure A
// this structure enumerates all virtual functions of A
struct A;
struct table_A {
  void (*print)(struct A *this);
};

struct A {
  const struct table_A *vtbl;
  int id;
};

void A_print(struct A *this) {
  printf("I am A(%d)\n", this->id);
}

// the global, static table data for A
static const struct table_A table_A_for_A = { A_print };

void A_A(struct A *this) {
  this->vtbl = &table_A_for_A;
  this->id = 42;
  print_object(__FUNCTION__, this, sizeof(*this));
}

// type for global virtual function table structure B
struct B;
struct table_B {
  void (*print)(struct B *this);
};

struct B {
  const struct table_B *vtbl;
  int age;
};

void B_print(struct B *this) {
  printf("I am B(%d)\n", this->age);
}

// the global, static table data for B
static const struct table_B table_B_for_B = { B_print };

void B_B(struct B *this) {
  this->vtbl = &table_B_for_B;
  this->age = 7;
  print_object(__FUNCTION__, this, sizeof(*this));
}

struct C {
  struct A a;
  struct B b;
  int mode;
};

void C_print(struct C *this) {
  printf("I am C(%d, %d, %d)\n", this->a.id, this->b.age, this->mode);
}

void CA_print(struct A *this) {
  C_print((struct C*) this);
}

void CB_print(struct B *this) {
  // we know that this points after the struct A layout,
  // so let's fix the pointer by ourselves!
  // (again, this is what offsetof(struct C, b) would give us)
  size_t offset = (size_t) &((struct C*) NULL)->b;
  C_print((struct C*) ( ( (char*) this ) - offset ) );
}

// the global, static tables data for A and B structures, for C
static const struct table_A table_A_for_C = { CA_print };
static const struct table_B table_B_for_C = { CB_print };

void C_C(struct C *this) {
  // manually call base classes constructors first
  A_A(&this->a);
  B_B(&this->b);
  // Override the virtual table pointers that have just been initialized by
  // the A and B constructors with our own versions. This allows overriding
  // all possible virtual functions of the base classes!
  this->a.vtbl = &table_A_for_C;
  this->b.vtbl = &table_B_for_C;
  this->mode = -1;
  print_object(__FUNCTION__, this, sizeof(*this));
}

Neat, isn’t it ?

Well, I’m telling you a secret: this is more or less what the C++ compiler does (yes, behind the curtains) when implementing virtual functions. The virtual function tables are called virtual tables, and are static structures inside the compiled code. And inside each corresponding class constructor, additional code is inserted between the base class initialization calls and the user code, to set all virtual table pointers to the tables corresponding to the class type.

Some additional remarks on this design:

  • if you call a virtual function inside a base class constructor, you will not call the top-level function version, because the virtual table pointer still points to the base class version (the derived class tables are installed only after the base class constructor returns)

  • you need one static structure containing all necessary stuff for a given class

  • you can put additional information in these virtual table structures, which can be useful!

    • the class name
    • type information functions (what is this object’s real class ?), for example by implementing a hidden getClassType() virtual function
    • dynamic cast support

… and this is exactly what most compilers do, actually: Runtime Type Information data is generally included as soon as a virtual table needs to be created for a given class (ie. so that you can use dynamic_cast<>())

Now you know what’s going on when you define a virtual function: rather than calling the function directly, an indirection through a virtual table is done.

This has some drawbacks:

  • a small impact on call performance, because the actual function pointer has to be read from a static table

  • a potentially huge impact on performance, because the compiler can no longer inline the calls

A Deeper Look

Let’s dig a little bit more, using GCC version of the virtual tables.

If you run your program under GDB and examine an instance of class C, you’ll see:

(gdb) p c
$1 = {<A> = {_vptr.A = 0x400e70 <vtable for C+16>, id = 42}, <B> = {_vptr.B = 0x400e88 <vtable for C+40>, age = 7}, mode = -1}

Note the _vptr.A = 0x400e70 and _vptr.B = 0x400e88 : these are the virtual tables.

What is behind these pointers ? Recent GDB releases allow you to dig a bit into an object’s virtual table:

(gdb) info vtbl c
vtable for 'C' @ 0x400e70 (subobject @ 0x775bfbf98600):
[0]: 0x400b00 <C::print()>

vtable for 'B' @ 0x400e88 (subobject @ 0x775bfbf98610):
[0]: 0x400b34 <non-virtual thunk to C::print()>

Otherwise, you can easily guess that each vtable starts with the same kind of function array we created for our C version:

(gdb) p ((void**)0x400e70)[0]
$5 = (void *) 0x400b00 <C::print()>

(gdb) p ((void**)0x400e88)[0]
$9 = (void *) 0x400b34 <non-virtual thunk to C::print()>

Easter Egg

Now that you understand a bit more the concept of vtables, let’s see a dirty-but-fun hack (valid for GCC, possibly for Visual C++ too): how to override a specific object’s virtual function table, and rewire virtual functions at runtime.

// Our patched version of C::print (or A::print)
static void our_function(C *thiz) {
  printf("I am now a new C(%d, %d, %d)!\n", thiz->id, thiz->age, thiz->mode);
}

// Our patched version of B::print
static void our_function_from_B(B *thiz) {
  // This is the same boring arithmetic as before, except that
  // we use a fake C instance located at an arbitrary (but non-NULL) address
  // to compute the offset. Note that the fake instance address will
  // never be dereferenced, so do not fear illegal accesses!
  C *dummy = reinterpret_cast<C*>(0x42);
  B *dummy_B = static_cast<B*>(dummy);
  char *dummy_ptr = reinterpret_cast<char*>(dummy);
  char *dummy_B_ptr = reinterpret_cast<char*>(dummy_B);
  size_t offset = dummy_B_ptr - dummy_ptr;
  char *ptr = reinterpret_cast<char*>(thiz);
  C *thiz_C = reinterpret_cast<C*>(ptr - offset);
  our_function(thiz_C);
}

int main(int argc, char **argv) {
  // create an instance of C
  C c;

  // Hack the c instance and override its print() function!
  void *our_vtbl[1] = { reinterpret_cast<void*>(our_function) };
  void *our_vtbl2[1] = { reinterpret_cast<void*>(our_function_from_B) };
  // patch for A* and C*
  *reinterpret_cast<void**>(static_cast<A*>(&c)) = our_vtbl;
  // patch for B*
  *reinterpret_cast<void**>(static_cast<B*>(&c)) = our_vtbl2;

  // get the base pointers (note: GCC may devirtualize direct calls to c.print())
  A *pa = &c;
  B *pb = &c;
  C *pc = &c;

  // and call print() ... what will be printed ?
  pa->print();
  pb->print();
  pc->print();

  return 0;
}

And yes, your object was hacked:

I am now a new C(42, 7, -1)!
I am now a new C(42, 7, -1)!
I am now a new C(42, 7, -1)!

Neat, isn’t it ? (yes, yes, not really portable though)

TL;DR: remember that C++ used to be enhanced C a long time ago, and most basic features can be easily implemented and understood through plain C code!

A Story of Realloc (and Laziness)

A Simple Macro

It began with a simple macro (approximate code):

#define ADD_BYTE(C) do {            \
  if (offset == capa) {             \
    if (capa < 16) {                \
      capa = 16;                    \
    } else {                        \
      capa <<= 1;                   \
    }                               \
    buffer = realloc(buffer, capa); \
    assert(buffer != NULL);         \
  }                                 \
  buffer[offset++] = (C);           \
} while(0)

For the C-illiterate, this simple macro just adds the byte C to the dynamically allocated buffer buffer, whose capacity (in bytes) is capa. The next position to be written is identified by offset. The capacity is doubled each time the buffer is filled (starting with a minimum of 16 bytes).

We’re adding bytes to a dynamic buffer – one of the most common things you can do in a program (it is used for strings, arrays, etc.)

The problem is to decide whether this reallocation strategy is good enough. If you look at the complexity, assuming realloc is O(N), you quickly conclude that adding a single byte is amortized O(1), which is good.

But what is the worst-case complexity, when reallocating a buffer ?

Guy: Are you sure this is the right strategy ? It looks like you’d have a big performance issue when extending a large array – say, one gigabyte. Imagine the impact, especially if the buffer has to be moved from the swap.

Me: Hummm, I never really thought about this, but I think it’s fine. I’m pretty sure the system can handle that gently, actually.

Guy: I think a linked structure might be a better idea – maybe with the same exponential strategy, for example

Me: I’m pretty sure this is a waste of time

Guy: Okay, show me

Sheesh

Okay, now I need to back up my arguments. Like all programmers, I am lazy. I should say: as all programmers should be, I am lazy. Laziness is an incentive to be smarter and more efficient. You are too lazy to execute repetitive or boring tasks, so you need something smarter, something faster. This is sometimes called laziness. I prefer to call it efficiency.

Having a linked list of blocks to store a buffer would be a bit cumbersome. I didn’t say impossible – Impossible n’est pas français (“impossible is not French”), as we say sometimes (well, it did not end well for Napoléon, but never mind) – but cumbersome. Copying a sub-array or storing it on disk would require some work: we would probably need an index of all the arrays’ base pointers, take the base-2 logarithm of the offset to find the right block, and do some other boring stuff. Sheesh, I’m already tired.

I have the feeling that being lazy is the right thing to do here. Because for small memory blocks, we basically do not care, and for large blocks, the system’s virtual memory will gently take care of the issue.

Let’s have a look.

What is realloc(), By The Way ?

It is a regular function, standardized by POSIX, and located in the C library. On Linux, you can find it inside the libc.so library, with its siblings malloc and free:

nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep -E "T (malloc|realloc|free)$"
000000000007e450 T free
000000000007e030 T malloc
000000000007e4e0 T realloc

For the curious, “T” stands for “text symbol” (the uppercase means that the symbol is public and visible) ; the text segment of a program being the code, the data segment being initialized data (variables), and the bss segment being uninitialized (well, actually initialized to zero) data (variables).

Allocating memory, reallocating memory and freeing memory is done through an allocator (thank you Captain Obvious!). There are plenty of them (one of the most common being the buddy allocator).

We can implement our own trivial allocator using the venerable sbrk call, which basically expands the end of the data segment to get more space:

#include <sys/types.h>
#include <unistd.h>
#include <string.h>

void *malloc(size_t size) {
  void *ptr = sbrk(size);
  // sbrk returns (void *) -1 on failure, not NULL
  return ptr != (void *) -1 ? ptr : NULL;
}

void free(void *ptr) {
  /* who cares ? */
}

/* Note: meant to grow only (yes, I am lazy) */
void *realloc(void *ptr, size_t size) {
  void *nptr = malloc(size);
  if (nptr == NULL) {
    return NULL;
  }
  // 'old_size' is supposed to be known :)
  // (for example put in a header before the actual block)
  memcpy(nptr, ptr, old_size);
  free(ptr);
  return nptr;
}

[ Note: as someone noted on Hacker News, the old_size value needs to be known here ; for example by storing it in a header before the returned address. And no, we shouldn’t free the original block upon realloc failure :) ]

A real allocator would be a bit more clever, of course, with complex data structures aimed to limit the number of actual memcpy calls.

We need to dig a bit inside the Glibc used on Linux to figure out what’s happening for large blocks.

Digging In The Glibc

Just download a recent glibc release, and have a look at the source tree.

There is an interesting directory out there: the malloc directory. Just pick the malloc.c file and open it in an editor.

/*
  The main properties of the algorithms are:
  * For large (>= 512 bytes) requests, it is a pure best-fit allocator,
    with ties normally decided via FIFO (i.e. least recently used).
  * For small (<= 64 bytes by default) requests, it is a caching
    allocator, that maintains pools of quickly recycled chunks.
  * In between, and for combinations of large and small requests, it does
    the best it can trying to meet both goals at once.
  * For very large requests (>= 128KB by default), it relies on system
    memory mapping facilities, if supported.

  For a longer but slightly out of date high-level description, see
     http://gee.cs.oswego.edu/dl/html/malloc.html
*/

The interesting part (for us) is: For very large requests (>= 128KB by default), it relies on system memory mapping facilities, if supported..

The 128KB by default can actually be controlled by the mallopt function:

  M_MMAP_THRESHOLD
         For allocations greater than or equal to the limit specified
         (in bytes) by M_MMAP_THRESHOLD that can't be satisfied from
         the free list, the memory-allocation functions employ mmap(2)
         instead of increasing the program break using sbrk(2).

Okay, so as I said previously, the system’s virtual memory will handle everything when dealing with large blocks.

Basically, it means that:

  • malloc will boil down to an anonymous mmap
  • free will boil down to munmap
  • realloc will boil down to mremap (Darn, not yet in POSIX ? Guys ?)

Okay, so we need to dig a little deeper inside mremap.

A man mremap will tell us that:

  mremap()  uses  the  Linux page table scheme.  mremap() changes the mapping between virtual addresses and memory pages.  This can be
  used to implement a very efficient realloc(3).

Humm, very efficient realloc. How efficient ?

First, mmap, munmap, and mremap do have definitions in the glibc. Well, pretty small entry points, actually:

nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep -E "(mmap|munmap|mremap)$"
00000000000e4350 W mmap
00000000000e9080 W mremap
00000000000e4380 W munmap

Note that the default entry points are weak symbols in this case, which basically means that they can be overridden by someone else during dynamic linking ; eg. in this case:

ldd /tmp/sample
        linux-vdso.so.1 (0x00007584a8aaa000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007584a86e6000)
        /lib64/ld-linux-x86-64.so.2 (0x00007584a8aac000)

… the symbols may be overridden by linux-vdso.so.1, which is a magical library mapped into all programs on Linux to speed up some calls involving system calls.

Anyway, our symbols in the glibc are only thunks to kernel system calls (aka “syscalls”), whether they live in the glibc or inside the vdso ; see the sysdeps/unix/sysv/linux/mmap64.c default implementations, for example:

void *
__mmap64 (void *addr, size_t len, int prot, int flags, int fd, off64_t offset)
{
  // removed arguments check
  void *result;
  result = (void *)
    INLINE_SYSCALL (mmap2, 6, addr,
                    len, prot, flags, fd,
                    (off_t) (offset >> page_shift));
  return result;
}

Okay, so our initial question is not anymore glibc-related, but involves the Linux kernel.

Digging In The Kernel

Now download a recent kernel version and let’s have a quick look at what mremap involves.

If you take a look at the The Linux Kernel Howto, you see that there is an interesting directory for us:

mm This directory contains all of the memory management code. The architecture specific memory management code lives down in arch/*/mm/, for example arch/i386/mm/fault.c.

Ah, nice, just what we needed to check!

The mm/mremap.c file seems to be of some interest. And inside the file, you’ll discover the syscall entry point of the mremap function. It’s just here:

/*
 * Expand (or shrink) an existing mapping, potentially moving it at the
 * same time (controlled by the MREMAP_MAYMOVE flag and available VM space)
 *
 * MREMAP_FIXED option added 5-Dec-1999 by Benjamin LaHaise
 * This option implies MREMAP_MAYMOVE.
 */
SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
                unsigned long, new_len, unsigned long, flags,
                unsigned long, new_addr)
{

This is basically where you enter the kernel when invoking the corresponding syscall in your user code (through the glibc or through the corresponding vdso thunk).

If you read this function’s code, you first see various argument checks, and trivial cases being handled (especially shrinking a mapped memory block, which is just a matter of freeing the pages at the tail).

The kernel then attempts to expand the mapped area by growing it in place (as an allocator in a C library would do with sbrk), and returns successfully if expanding the area was possible.

All these trivial cases are O(1) (even if entering the kernel has a cost – a lower one now that software interrupts aren’t involved anymore, but this is still costly).

What about the worst case ?

        /*
         * We weren't able to just expand or shrink the area,
         * we need to create a new one and move it..
         */
        ret = -ENOMEM;
        if (flags & MREMAP_MAYMOVE) {
                unsigned long map_flags = 0;
                if (vma->vm_flags & VM_MAYSHARE)
                        map_flags |= MAP_SHARED;

                new_addr = get_unmapped_area(vma->vm_file, 0, new_len,
                                        vma->vm_pgoff +
                                        ((addr - vma->vm_start) >> PAGE_SHIFT),
                                        map_flags);
                if (new_addr & ~PAGE_MASK) {
                        ret = new_addr;
                        goto out;
                }

                map_flags = vma->vm_flags;
                ret = move_vma(vma, addr, old_len, new_len, new_addr);
                if (!(ret & ~PAGE_MASK)) {
                        track_exec_limit(current->mm, addr, addr + old_len, 0UL);
                        track_exec_limit(current->mm, new_addr, new_addr + new_len, map_flags);
                }
        }

At first glance, the kernel is doing what we did in our trivial allocator:

  • allocate a new area
  • copy the previous area to the new one, and free the previous one (ie. move)

But let’s have a closer look.

The move_vma call follows:

static unsigned long move_vma(struct vm_area_struct *vma,
                unsigned long old_addr, unsigned long old_len,
                unsigned long new_len, unsigned long new_addr)
{
...
        new_pgoff = vma->vm_pgoff + ((old_addr - vma->vm_start) >> PAGE_SHIFT);
        new_vma = copy_vma(&vma, new_addr, new_len, new_pgoff,
                           &need_rmap_locks);
        if (!new_vma)
                return -ENOMEM;

        moved_len = move_page_tables(vma, old_addr, new_vma, new_addr, old_len,
                                     need_rmap_locks);

There are two interesting calls here:

  • the copy_vma call
  • the move_page_tables call

The copy_vma function is inside the sibling mm/mmap.c file, and basically moves the internal kernel structure related to the mmap’ed block, which is a vm_area_struct structure.

You can find its definition in include/linux/mm_types.h ; this small structure contains all information on the region: the start and end address, the mmap’ed file on disk if any, etc.

So moving this one will be cheap, O(1).

What about move_page_tables ?

The interesting part seems to lie within this loop:

...
        for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
...
                move_ptes(vma, old_pmd, old_addr, old_addr + extent,
                          new_vma, new_pmd, new_addr, need_rmap_locks);
...

The code is a bit complicated (and a little short on comments, too), but it is mostly basic arithmetic and accounting.

The move_ptes function contains itself the deepest inner loop:

...
        for (; old_addr < old_end; old_pte++, old_addr += PAGE_SIZE,
                                   new_pte++, new_addr += PAGE_SIZE) {
                if (pte_none(*old_pte))
                        continue;
                pte = ptep_get_and_clear(mm, old_addr, old_pte);
                pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);

...
                set_pte_at(mm, new_addr, new_pte, pte);
        }
...

What the hell are we doing in this loop ? We are basically moving the PTEs – the Page Table Entries – of the mmap’ed region somewhere else. These are essentially long integers, one assigned to each page.

We’re pretty much done here: the kernel never actually touched a single byte of memory inside our allocated block, and just swapped a few integers to relocate the entire region elsewhere.

We may argue that we are still O(N) technically, but

  • instead of copying entire pages of 4KB (or 2MB for huge pages), we are swapping integers inside kernel structures
  • we are touching hot structures in the kernel, not cold memory or bytes swapped out to disk

Therefore, we are O(N), but with a huge divisor.

Oh, by the way, did you know that O(N) was in many cases misleading ?

In this case, N can be at most 2^48 (the maximum virtual address space size). Most machines only deal with a few gigabytes of memory – at most 64GB for most architectures, that is, 2^36 bytes. A huge page is 2^21 bytes, leaving at most 2^15 operations for a mremap (yes, that’s only 32768 operations).

Well, the worst case scenario is cheap in all cases, and this is what I suspected from the beginning.

Further reading: do not hesitate to have a look at Mel Gorman’s book Understanding the Linux Virtual Memory Manager.

TL;DR: do not worry, be lazy, and trust realloc.

Security Is Painful

Where Do You Need Security ?

Well, pretty much everywhere. As an open-source programmer maintaining a website, I need:

  1. A GPG key to sign sources for Debian
  2. An appropriate certificate to sign my code on Windows
  3. Appropriate certificates for the https versions of the website, so that my users can use a secured version – involving multiple sub-domains, with dedicated certificates
  4. An appropriate certificate for the smtp server with starttls, allowing a somewhat more secure channel when sending and receiving emails (oh hai NSA!)
  5. The DNSSEC features, securing my DNS naming system a bit

You’re not obliged to secure everything, of course, but most of the stuff above is pretty standard nowadays, and it’s hard not to fulfill all these requirements.

A GPG key to sign sources on Linux

Debian involves thousands of volunteers across the world collaborating to build a complete operating system, and one key aspect is trust among developers. This means that each developer needs to pass certain validation steps to be accepted, and this includes demonstrating your technical skills and your ability to fit the organization’s spirit. Another requirement is to identify who is responsible for packaging a program (in my case, httrack), as Debian needs to clearly know who uploaded what.

This seems a bit odd to some people: in a virtual world where anonymity is often considered a requirement, and a right – especially in the open-source community – Debian asks people to identify themselves, and identify the work they do.

But the reason behind it is trust and accountability: Debian (and all derivative distributions based on its code base) is used by people who would have a hard time using untrusted binaries. What if a backdoor is injected somewhere, for example ? How can we be sure that uploaded packages haven’t been compromised somehow ? Trust is not an option when the privacy and security of users are involved (especially in countries where what you write on twitter can land you in jail)

What does it mean for a developer like me ?

  • I had to create a GPG key through gpg --gen-key (well, I already had one)
  • I asked to meet three existing Debian developers in real life (and gladly paid them a beer! :p) so that they could verify my identity (ie. at least one identity card), and sign my GPG key
  • I need to sign produced sources each time I am doing a release, before uploading them to Debian – uploads won’t be accepted otherwise

Oh, by the way, I’ll need to update my key – Debian will soon require more secure keys and discard the old 1024-bit ones. This will involve key transition steps, including meeting folks from Debian again to re-sign my key.

Appropriate certificates to sign the Windows installer version

If you do not want your users to have the creepy “The publisher could not be verified, are you sure you want to run this software ?” alert message when attempting to install your program, you need to sign at least the installer.

Microsoft provides a handy utility called signtool.exe to sign executables to be distributed:

"C:\Program Files\Microsoft SDKs\Windows\v6.0A\Bin\signtool.exe" sign /f "certificate.pfx" /t http://timestamp.verisign.com/scripts/timstamp.dll myprogram.exe

The only problem is that we need an official certificate to do that. Want a free PKI like the one we have on Debian ? NOPE guys, you need those expensive little strings of numbers – basically, a simple certificate will cost you a thousand dollars. Per year. Yep. Per year.

Fortunately for me, Certum is nice enough to provide free certificates to open-source developers. Thank you, guys!

We only have to create a certificate signing request:

openssl req -new -newkey rsa:2048 -keyout signing.pem -out signing.pem

And copy/paste the certificate request part to the provider.

We then need to convert the produced PEM certificate into a usable PFX one for signtool.exe:

openssl pkcs12 -export -out certificate.pfx -inkey signing.pem -in certificate.pem

Appropriate certificates for https

If you have an online store, or are selling something, having a secured version of your website is just an absolute requirement.

But, increasingly, https is becoming the de-facto standard pretty much everywhere (especially after the PRISM scandal, and especially considering that nothing will change on the government’s side). The upcoming HTTP/2.0 protocol may even make https a requirement for all sites.

Having a secured website unfortunately depends on the broken and expensive HTTPS public key infrastructure.

Fortunately enough, the guys at StartSSL also offer free SSL certificates for everyone. These are “low” level certificates, but this is perfectly fine for me (no, I don’t want the golden-platinum certificate). Thank you too, guys!

Things are a bit complicated when actually setting up Apache.

First, each virtual host needs its certificate:

SSLCertificateFile /etc/apache2/ssl/certs/my-server.pem

We need to get the StartSSL certificate chain locally:

wget http://www.startssl.com/certs/sub.class1.server.ca.pem
wget http://www.startssl.com/certs/ca.pem -O startssl-ca.pem
wget --no-check-certificate https://www.startssl.com/certs/ca-bundle.pem -O startssl-ca-bundle.pem

Then, spend a few hours trying to figure out how to mitigate the various known attacks against SSL. Darn. Sheesh. Please give me some feedback if you think the setup below can be improved :)

# Enable SSL
SSLEngine on

# Disable compression (CRIME attack)
# "Available in httpd 2.2.24 and later"
# SSLCompression off

# Enable only secure protocols: TLSv1, but not SSLv2/SSLv3
SSLProtocol all -SSLv2 -SSLv3

# Perfect forward secrecy + BEAST mitigation
# See <https://community.qualys.com/blogs/securitylabs/2013/08/05/configuring-apache-nginx-and-openssl-for-forward-secrecy>
SSLCipherSuite EECDH+ECDSA+AESGCM:EECDH+aRSA+AESGCM:EECDH+ECDSA+SHA384:EECDH+ECDSA+SHA256:EECDH+aRSA+SHA384:EECDH+aRSA+SHA256:EECDH+aRSA+RC4:EECDH:EDH+aRSA:RC4:!aNULL:!eNULL:!LOW:!3DES:!MD5:!EXP:!PSK:!SRP:!DSS:!ADH:!SSLv2:+HIGH:-MEDIUM

# Pay attention to the order ciphers are specified
SSLHonorCipherOrder On
SetEnvIf User-Agent ".*MSIE.*" nokeepalive ssl-unclean-shutdown

# Certificate global settings
SSLVerifyClient none
SSLCACertificateFile /etc/apache2/ssl/certs/startssl-ca.pem
SSLCertificateChainFile /etc/apache2/ssl/certs/sub.class1.server.ca.pem

You can check whether your setup is fine, using openssl as a command-line tool:

1
openssl s_client -CApath /etc/ssl/cert -showcerts -connect www.example.com:443

And you can also use online tools to check the ciphers, options, etc. in use… if you manage to get 100% on all tests, congratulations!

SMTP with certificates

Yes, you also need to pay for a certificate for that. The SSL PKI is definitely broken and mostly a scam. Haha.

Fortunately enough, we may use a free certificate for that, too :)

Note that securing the SMTP transactions is not considered sufficient when exchanging sensitive information – use GPG, guys!

For sendmail, I ended up with something like this:

define(`CERT_DIR', `/etc/mail/certs')
define(`confCACERT_PATH', `CERT_DIR')
define(`confCACERT', `CERT_DIR/CAcert.pem')
define(`confSERVER_CERT', `CERT_DIR/MYcert.pem')
define(`confSERVER_KEY', `CERT_DIR/MYkey.pem')
define(`confCLIENT_CERT', `CERT_DIR/MYcert.pem')
define(`confCLIENT_KEY', `CERT_DIR/MYkey.pem')
TRUST_AUTH_MECH(`LOGIN PLAIN CRAM-MD5 DIGEST-MD5')

dnl Do not choke on missing/invalid client cert
define(`confTLS_SRV_OPTIONS', `V')

LOCAL_CONFIG
O CipherList=ALL

It took me a while to understand that the missing define(`confTLS_SRV_OPTIONS', `V') line was the culprit behind some rejected mails…

One last check to ensure your server is correctly set up:

openssl s_client -CApath /etc/ssl/cert -showcerts -connect smtp.example.com:25 -starttls smtp

Securing the DNS with DNSSEC

At last, securing the DNS is becoming increasingly important, too, especially when governments attempt to hijack it. DNSSEC is unfortunately not yet mainstream, and operating systems cannot enforce its use yet (which means that evil dictatorships can still intercept public DNS and corrupt the replies).

DNSSEC seems a bit cumbersome to implement at first glance, but at least the system relies on the existing DNS infrastructure, and will not cost you additional fees. You’ll find numerous useful FAQs and blogs explaining how to deploy the thing.

You first need to adjust your own DNS server settings a bit to handle DNSSEC. With BIND 9, this can be done through built-in features:

dnssec-enable yes;
dnssec-validation auto;
dnssec-lookaside auto;
managed-keys-directory "/var/cache/bind";

Then, you need a Key Signing Key (aka “KSK”) for each domain you own. This key, to be renewed every two years or so, is meant to sign other keys. The idea is that this key is declared to the upstream registrar, through your own domain administration system, and the registrar signs it to certify that it is legit. This builds a chain of trust up to the root DNSSEC authority, whose certificate is known by DNS resolvers. With this key, you can then create a shorter-lifespan key, the Zone Signing Key (aka “ZSK”), meant to sign the various entries in your DNS zone. This shorter-lifespan key is put in your DNS zone among the other entries, as a DNSKEY record (the record contains the public key side).

Here comes the KSK:

dnssec-keygen -r /dev/urandom -a RSASHA256 -b 4096 -n ZONE -f KSK httrack.com

And, at last, you need to regularly create (every two or three months) a new ZSK:

dnssec-keygen -K dnssec/zsk -r /dev/urandom -a RSASHA256 -b 2048 -n ZONE httrack.com

And do not forget to sign your zone each time you need to modify it, using the ZSK key:

dnssec-signzone -K dnssec/zsk -e +3024000 httrack.com
service bind9 reload

The .signed versions of the DNS zones will have to be used in place of the previous unsigned ones (named.conf), of course.

The whole setup is not totally straightforward, and will probably cause headaches to newcomers. Using online tools to verify the signed DNS records or to check the DNSSEC setup will be extremely helpful.

TL;DR: Security is sometimes painful, but this must not stop you. Your users are trusting you!

What Books Really Helped You ?

Helped You ?

There are countless books in computer science (and related domains), but it is sometimes hard to pick the right one – the one which provides easy access to a domain while at the same time helping you embrace it.

Algorithms is a typical case, but not the only one: a good professor will not teach you more advanced algorithms, or more efficient ones. A good professor will make complicated things less complicated. An excellent one will turn them into simple things. A bad one will not only make things even more complicated, but will also hinder any further reflection, because he’ll convince you that this is not something for you (ie. that you are probably too limited for it).

Books are more or less like professors: there are countless badly written ones, and in the end very few excellent ones.

The motivation behind reading books can vary:

  • beginners wanting to teach themselves
  • computer science students who want a good complement
  • experienced programmers who want to refresh their memory a little bit
  • experienced programmers who are willing to be interviewed for a new job (Google et al. are famous for asking algorithm-related questions)

Let’s review what I enjoyed reading in the past few years.

My Personal List

Yes, this list is personal, non-exhaustive, with probably some questionable choices. Some people may want to complete it, or will disagree with several ones, but hey, I think it’s a good start anyway.

  • The C programming language by Brian Kernighan and Dennis Ritchie. Probably the most controversial one, but this book was the first C-language-related book I read, at a time when the Internet was not mainstream, and it was a great pleasure to switch from a gfa-basic/assembly developer to a C developer in only a few weeks. I would probably not recommend the book nowadays, but I am really grateful for the (possibly imperfect) first C “professor” I had, and this book is one of the main reasons why I am writing C today.

  • Applied Cryptography by Bruce Schneier was also my first cryptography-related book and even if I’m still (very) far from being an expert in this field, it really helped me a lot in understanding many of the concepts behind it. I would really recommend this book to anyone interested in cryptography (and I would say that everybody should be interested in cryptography, and not only programmers)

  • Algorithms by Robert Sedgewick and Kevin Wayne is without hesitation the best algorithm book I have ever read, and I would gladly recommend it to anyone – and by anyone I really mean anyone: beginners, students, seasoned developers… I was at first a bit disappointed by the rather basic level of the first chapter (quickly reviewing fundamental concepts, including for loops, etc.), but this is a key point of this book: the progression is uniform, and before you even realize it, more complex things appear. I studied algorithms during my college years, and I must admit that some of them totally flew over my head – including Dijkstra’s algorithm, something I painfully implemented for the first time when I was still a student (and I did not really understand the underlying concepts at that time, I must admit – it was brute-force coding for me). In Algorithms, Dijkstra does not appear out of the void, but within a logical difficulty progression – and this book taught me how simple Dijkstra’s algorithm really is. Yes, really simple: a directed graph search through a priority queue, with edge relaxation – after reading a few chapters, I guarantee that this is a really simple thing to do. More generally, very intimidating algorithms many developers are afraid to check out are explained in such a way that not only do your fears disappear, but you gain the confidence to explore further domains (did you know that red-black trees are not as crazy as they seem ?). Ah, and a last note: the algorithms in this book are written in Java, but they are easily adapted to other languages (in my case, C and C++)

  • Hacker’s Delight by Henry S. Warren is a delicious book for bit-fiddling fans and, more generally, hackers (and by hackers I do not mean script kiddies trying to deface the NSA website, but hackers). Have you ever wondered how to count the number of leading zeros in an integer without spending your time in a loop ? Or how to divide an integer without actually using the (slow) division operation, but only multiplication ? This book is not only a collection of good recipes, it also explains the maths behind them. I strongly recommend it to all advanced programmers, but also to anyone wanting to dive into the numbers behind our computers.

  • Programming Pearls by Jon Bentley is a really enjoyable book on algorithms – and beyond algorithms – with personal and very insightful war stories, entertaining remarks, and more. I really recommend this little gem, which will teach you, for example, how simple and powerful a Markov chain can be. Oh, and Jon Bentley is also the author of the excellent long common strings data compression (which can be viewed as a “low frequency” compression algorithm, compared to classical “high frequency” algorithms such as zlib – which basically means that it can be piped into a secondary compression, with a combined ratio)

  • IPv6 Théorie et Pratique by the G6 association is a French (yes, sorry guys) book which taught me everything about IPv6 at a time when IPv6 was really not mainstream (well, it is still not mainstream today, cough cough cough…). It is very unfortunate that O’Reilly France disappeared, but at least this book is still online, and I would recommend it too.

What About Videos ?

Oh yes, some videos can also be extremely helpful.

  • For a very good introduction to cryptography aimed at non-experts, I would strongly recommend the Gambling with Secrets series, and the RSA Encryption one. These small entertaining videos are really great (even for non-developers)

  • An excellent algorithms course by Robert Sedgewick and Kevin Wayne, which is a good complement to the Algorithms book. Robert Sedgewick is an excellent professor, and the course videos are intended for anyone (including beginners). Do not hesitate to enroll in this course – one more time, it is really worth it.

TL;DR: No, really, sometimes reading is the shortest path!