xavier roche's homework

random thoughts and sometimes pieces of code

I Found a Bug in Strncat()

Yes, A Bug

This code won’t behave correctly (ie. partial copy) on many glibc versions on Linux:

1
strncat(dest, source, SIZE_MAX);

This is a corner case, but this is a bug.

(hint: the correct behavior is to do strcat(dest, source) equivalent)

A Secure Version Of Strcat

It began with code cleanup.

In httrack, there is a lot of legacy (old, dirty) code laying around. Rewriting everything from scratch would probably be a sane decision, but I just don’t have the time (and the courage) to do that, so I’m cleaning up bits and bits when possible, attempting to improve the code quality when possible.

One of the terrible design decision was to use C strings everywhere, and associated strcpy, strcat etc. primitives. It was an unfortunate decision, because of the lack of flexibility (immutable capacity), and security issues (buffer overflows that might be introduced).

In this hypothetical case, we’re adding a string to a structure foo:

1
2
3
4
5
6
7
8
struct foo {
  char filename[1024];
...
};

void append_filename(struct foo *foo, const char *filename) {
  strcat(foo->filename, filename);
}

This code will overflow if the passed string added to the existing string overflows the filename buffer.

The strlcat version exists on BSD, and is a reasonable solution to mitigate this problem. Unfortunately, some people do not like BSD, and decided that these functions will never be part of the glibc. (insert flame here)

Besides, switching to strlcat and its friends would involve checking every single occurrence of strcat, strcpy. etc., probably introducing new bugs!

I then decided to use a modified version of the dangerous libc primitives, using compile-time type checking:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
/**
 * Check whether 'VAR' is of type char[].
 */
#if (defined(__GNUC__) && !defined(__cplusplus))
/* Note: char[] and const char[] are compatible */
#define HTS_IS_CHAR_BUFFER(VAR) ( __builtin_types_compatible_p ( typeof (VAR), char[] ) )
#else
/* Note: a bit lame as char[8] won't be seen. */
#define HTS_IS_CHAR_BUFFER(VAR) ( sizeof(VAR) != sizeof(char*) )
#endif
#define HTS_IS_NOT_CHAR_BUFFER(VAR) ( ! HTS_IS_CHAR_BUFFER(VAR) )

/**
 * Append at most N characters from "B" to "A".
 * If "A" is a char[] variable then the size is assumed to be the capacity of this array.
 */
#define strncatbuff(A, B, N) \
  ( HTS_IS_NOT_CHAR_BUFFER(A) \
  ? strncat(A, B, N) \
  : strncat_safe_(A, sizeof(A), B, \
  HTS_IS_NOT_CHAR_BUFFER(B) ? (size_t) -1 : sizeof(B), N, \
  "overflow while appending '" #B "' to '"#A"'", __FILE__, __LINE__) )

The idea was to sniff at compile-time the arguments passed to the modified strncat version: when an opaque char* is used, there’s nothing you can do. But when passing a regular char[...] array, a secure version can be used. This is not perfect (especially because with non-GCC compilers, char[8] are seen as char*), but at least it allows you to quickly mitigate many potential buffer overflows without touching existing code.

But at one point I did this quite daring thing to create the secure strcat version:

1
#define strcatbuff(A, B) strncatbuff(A, B, (size_t) -1)

The (size_t) -1 value was simply passed as the boundary of strncatbuff – at some point, we would be doing strncat(dest, source, (size_t) -1), and this seemed fine to me.

Soon after, bugs started to appear (strings were copied partially). It took me a bit of time (with the kind help of the bug reporter) to figure out that my daring macro was involved in this bug. This took me a while because the bug only appeared on Linux, on some glibc releases, and quite randomly.

I managed to have a reproducible minimalistic case:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
/* strncat_size_t-1_testcase. */

/* gcc -Wall strncat_size_t-1_testcase.c -o strncat_size_t-1_testcase */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* strncat_size_t-1_testcase "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor" */
int main(int argc, char **argv) {
  const size_t size = strlen(argv[1]);
  char *const buffer = malloc(size + 1);
  int success;
  buffer[0] = '\0';
  strncat(buffer, argv[1], SIZE_MAX);
  success = strlen(buffer) == size;
  if (!success) {
    fprintf(stderr, "** result: '%s'\n", buffer);
  }
  assert(success);
  return success ? EXIT_SUCCESS : EXIT_FAILURE;
}

If you build this program, and use it:

1
2
gcc -Wall strncat_size_t-1_testcase.c -o strncat_size_t-1_testcase
./strncat_size_t-1_testcase "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor"

… then you’ll get an error. Or maybe you’ll get one. Randomly.

This Is Not A glibc Bug, But A Misuse Of Strncat

The only way to be sure was to carefully read the strncat standard.

The strncat() function shall append not more than n bytes (a null byte and bytes that follow it are not appended) from the array pointed to by s2 to the end of the string pointed to by s1. The initial byte of s2 overwrites the null byte at the end of s1. A terminating null byte is always appended to the result. If copying takes place between objects that overlap, the behavior is undefined.

With the help of specialists in comp.unix.programmer, several points must be noticed:

  • strncat is not a secure version of strcat (strlcat is), and the third argument is not the destination capacity
  • using (size_t) -1 as third argument is actually using SIZE_MAX
  • the third argument is an additional limit and can have any value

Therefore, strncat(..., ..., SIZE_MAX) should behave as strcat(..., ...).

Yes, yes, this is a corner case. But this is a legit one.

Where Is The Bug ?

The reason why not all glibc releases were impacted is simple: this is an optimization bug. (insert laughs from experienced developers)

The original strncat source is not used on x86-64 architectures: the optimized SSE2 version is used in place.

The code might have been introduced in 2013 in this commit which was supposed to introduce a “Faster strlen on x64” – but some existing files seems to have been merged at that time.

Unfortunately, the assembly source is uncommented, and I did not have the courage to dig deeper. But this source is buggy.

Funny Remark: By the way, I’m not even convinced that the assembly version is faster at all: code executing speed is probably irrelevant compared to L1/L2/L3/memory bandwidth issues.

What’S Next ?

First I obviously stopped using SIZE_MAX with strncat, and dispatch to strcat accordingly. This was a daring thing to use this corner case value anyway, prone to … corner-case bugs.

The second step was to fill a bug report in the glibc.

TL;DR: Beware when using corner cases, even when perfectly conforming to the standard. And never “optimize” code by rewriting obvious functions into a 300-lines of undocumented assembly!

Coucal, Cuckoo-hashing-based Hashtable With Stash Area C Library

Greater Coucal Centropus sinensis

Coucal ?

According to Wikipedia,

A coucal is one of about 30 species of birds in the cuckoo family. All of them belong in the subfamily Centropodinae and the genus Centropus. Unlike many Old World cuckoos, coucals are not brood parasites.

If you read my previous blog entry on HTTrack’s new hashtables, you probably already know that cuckoo hashing is a rather new (2001) hashing technique to be used for efficient hashtables.

The idea is to have a smarter open-addressing strategy based on kicking an existing entry if a collision occurs. This is why the name Cuckoo hashtable was chosen.

Note that the name is a bit misleading: a typical cuckoo will not only kick an existing foe, but will consequently kill it (nature, you scary!). Here, we only relocate it, reason why I choose the Coucal, a nicer cuckoo member who does not destroy existing entries – I mean, existing eggs!

This implementation has the following features:

  • guaranteed constant time (Θ(1)) lookup
  • guaranteed constant time (Θ(1)) delete or replace
  • average constant time (O(1)) insert
  • one large memory chunk for table (and one for the key pool)
  • simple enumeration

This implementation has been tested since July 2013 in HTTrack, and so far, no bugs were reported (the only issues spotted were bugs in upstream code that was corrupting existing keys)

Please have a look at this library implementation: https://github.com/xroche/coucal

TL;DR: cuckoo hashtables are both efficient and smart!

C Corner Cases (and Funny Things)

Enter The void

void is a very strange animal. Basically void means nothing, emptiness. A function “returning” void actually does not return anything. On the machine code side, register(s) aimed to hold return values are simply unused in such case.

You won’t be surprised to hear that the size of void is undefined for this reason ; ie. sizeof(void) is not supposed to exist, even if some compilers have rather dirty extensions that allow to set this size to 1 to do pointer arithmetics.

But what about void* ? A pointer to something that does not exist, what’s that ?

Well, this is the first trap: void* is not really related to void. The void* type is used to store addresses of unspecified types ; ie. you can cast from/to the void* type, for example when passing generic functions (such as read(), write(), memset()..)

When I say cast, I don’t really mean cast, actually. You can view the pointer to void as a kind of a supertype, and aliasing rules do not apply per se: casting int* to void* is legit and vice-versa. But casting the resulting void* pointer to long*, for example, is not, because aliasing rules would be violated by indirectly casting from int* to long*. Yes, aliasing rules are a whole lot of fun. We’ll see that later.

Functions Are Not Data

Here comes another trap: you can cast anything from/to void*, except pointer to functions. Yep, in C, pointers to data and functions are not necessarily if the same size. Basically this means that the address space dedicated to data (data/bss segments) and the one dedicated to code (text segment) are not necessarily of the same size (you could have a 32-bit code pointers and 64-bit data pointers for example). Of course this is almost always the case, because the universe would otherwise collapse, and standard functions such as dlsym would not work anymore.

To quote the holy standard:

The ISO C standard does not require that pointers to functions can be cast back and forth to pointers to data. Indeed, the ISO C standard does not require that an object of type void * can hold a pointer to a function. Implementations supporting the XSI extension, however, do require that an object of type void * can hold a pointer to a function.

Function Pointer Definition

Oh, yes.

1
  int (*function)(char* s, int c) = my_function;

Function definitions are sometimes see as tricky by newcomers. They are NOT. Simply replace (*function) on the left member (the parenthesis are there for operator priority reasons) by a function name, and you just have the classic function declaration.

Is char Nice ?

In comparison to void* and function pointers, char is the nice guy:

  • its size is 1 by definition, which makes the pointer variant suitable for pointer arithmetics
  • it can be also cast from and to any other type without violating aliasing rules (still per se, re-casting to a third incompatible type would still violate aliasing rules)
  • it can be dereferenced after a cast (ie. you can actually reads the bytes within an int* safely)

But char Is Generally Signed

Yes, generally, because this is not specified in the standard. This is actually annoying, because when handling strings, you may end up with negative characters for ASCII > 127, and negative values can be promoted to integers.

Here’s a buggy example:

1
2
3
4
5
6
7
/** Return the next character within a \0-terminated string, or EOF. **/
int my_read_char(const char *buffer, size_t *offs) {
  if (buffer[*offs] != '\0') {
    return buffer[*offs++];  /* here's the trap */
  } else {
    return EOF;
  }

This function will return negative values for ASCII > 127, and especially the value -1 for ASCII 255 (0xFF), which is also the value of EOF.

You need to explicitly cast to a unsigned char to have a working version:

1
2
3
4
5
6
7
/** Return the next character within a \0-terminated string, or EOF. **/
int my_read_char(const char *buffer, size_t *offs) {
  if (buffer[*offs] != '\0') {
    return (unsigned char) buffer[*offs++];
  } else {
    return EOF;
  }

You May Use T* for const T*, But Not T** for const T**

The typical example is that:

  • a char* string may be implicitly cast into const char*:
1
2
  char *foo = strdup("Hello, world!");
  const char *bar = foo;  /* No need for a dirty cast. */

.. but a char** array of string may NOT be implicitly cast into const char**:

1
2
3
  char *foo[2];
  foo[0] = foo[1] = strdup("Hello, world!");
  const char **bar = foo;  /* Warning. */

The compiler will complaint:

1.c:6:22: warning: initialization from incompatible pointer type [enabled by default] const char bar = foo; / Warning. /

The reason behind of the same reason why, in Java, you can not cast List<String> into List<Object>: a function taking the supertype array version may add an incompatible type without being noticed.

Here’s an example:

1
2
3
4
5
6
7
8
static void replace_the_first_element(const char **array) {
  static const char *hello = "hello, world!";
  array[0] = hello;  /* replace first element by a constant string */
}
...
  char *foo[2];
  foo[0] = foo[1] = strdup("Hello, world!");
  replace_the_first_element(foo);

In this example, without a cast warning, the function replace_the_first_element would silently replace the first element of foo by a non-constant string, and the caller would have a constant string within the array when the function returns. The code is strictly equivalent to:

1
2
3
  static const char *hello = "hello, world!";
  char *foo[2];
  foo[0] = hello;  /* now you can see the problem */

I would say that we should be able to silently cast T** into const T*const* (an array of constant arrays of T), but anyway …

An Array Is Not Always An Array

Quizz time: what the test function below is supposed to print ? (don’t cheat!)

1
2
3
4
5
static void test(char foo[42]) {
  char bar[42];
  printf("foo: %zu bytes\n", sizeof(foo));
  printf("bar: %zu bytes\n", sizeof(bar));
}

Well, you might be surprised to see:

1
2
foo: 8 bytes
bar: 42 bytes

Yes, an array of type T in a function argument list is strictly equivalent of declaring the type as T*. This is why nobody uses this confusing syntax within function argument list.

To quote the standard:

“A declaration of a parameter as ‘array of type’ shall be adjusted to ‘qualified pointer to type’, where the type qualifiers (if any) are those specified within the [ and ] of the array type derivation.”

This also imply that the passed object is passed by reference, not value.

… and this is totally inconsistent with the handling of structures as function argument, or as return type. Yes, yes, historical mistakes.

Switch And Case Are Hacks

switch can be seen as a dispatch function, and case as label. Each switch must have its case, but you are free to interleave loops of you want, such as in:

1
2
3
4
5
6
7
switch (step) {
  while (1) {
    case step_A:
    case step_B:
      ...
   }
}

See my previous An Unusual Case of Case (/switch) entry for more fun!

Pointer To Arrays

Let’s say I have a char foo[42] and I want a pointer to this array in bar. I can write:

  • bar = foo
  • bar = &foo
  • bar = &foo[0]

What shall I use ? The first and third pointers are identical: they are char* ones, the one you generally want to use: the pointer points to a char area of undefined size. The second one is a pointer to the array of 42 chars itself. You may then use:

  • char *bar = foo;
  • char (*bar)[42] = &foo;
  • char *bar = &foo[0];

The pointer to the array of 42 chars syntax is a bit weird for newcomers, but it is logical: *bar is an array of 42 chars, so you just have to replace foo by (*bar) in the standard definition to have the correct syntax (the parenthesis are needed for operators priority reasons, otherwise you’ll declare an array of 42 pointer to char). (This trick is also helpful to understand the apparent obscure pointer to function syntax.)

The advantage of the second definition is that you still have a pointer to an array of 42 char: sizeof(*bar) is 42, *bar is of type char[] (suitable for specific compiler optimizations and safe string operations)

Empty Structure

1
struct { } foo;

this code is undefined in C: an empty structure is not officially supported. Some compilers such as GCC support them, and define the size of the structure as 0.

It is supported in C++ (because many classes do not have any member within) and in such case the sizeof of such empty structure is … 1, because C++ needs a different pointer for every instance of a given class. Oh, and because you need to do pointer arithmetics also, and having a null size would cause issues.

Strings Are char Arrays, And Are Not Const (But They Really Are)

Another corner case: string literals, such as "hello world" are actually arrays of char, with a terminating \0. The two definitions are equivalent:

1
char *foo = "hi";
1
2
static const char hi[] = { 'h', 'i', '\0' };
char *foo = (char*) hi;

Note that sizeof("hi") is equal to 3 (an array of 3 chars).

Oh, did you notice the (char*) in the second example ? Isn’t it a bit dirty ?

  • yes, it is
  • in C, string literals are unfortunately of type “array of char” and not “array of const char”
  • in reality, the type is really “array of const char”: if you dare to write inside a string literal, chances are you’ll segfault, because the string literal is inside a global read-only data segment

Parenthesis Have Sometimes Different Meanings

Let’s have a look at two code samples:

1
2
int i = 0, j = 0;
foo(i++, j+=i, j++);
1
2
int i = 0, j = 0;
bar((i++, j+=i, j++));

The first sample calls a function named foo taking three integers as arguments. Is has an undefined behavior: side-effects of i and j will be executed at an undefined time, and might even be optimized. You basically don’t have a clue of what foo will receive as arguments.

The second sample calls a function named bar taking one integer argument. It has a perfectly defined behavior. This integer is the result of the comma operator i++, j+=i, j++, which can be rewritten as the pseudo-C++-code:

1
2
3
4
5
6
7
8
9
static int my_comma(int &i, int &j) {
  i++;
  j+=i;
  return j++;
}
...

int i = 0, j = 0;
bar(my_comma(i, j));

The comma operator evaluates each expression starting from the leftmost member to the rightmost member, and yields the last member as value. All other values are discarded (each expression should yield void, actually). And between each member evaluation, a sequence point is committed: side-effects of each expression are committed for the next expression. This is why this expression has a perfectly defined behavior.

Logical And/Or Can Replace If

Two pretty nice feature of the “boolean and” (&&) and the “boolean or” (||) operators is that they:

  • commit side-effects between each operation: if (foo[i++] != '\0' && bar[i++] != '\0') is perfectly defined, as the post-increment of i will be committed before the evaluation of the right expression (ie. a sequence point exists between left and right operator evaluation)
  • these operators are short-circuit operators, which means that there is a guarantee that the right member is only evaluated if the result of the left member evaluation could not provide the result of the expression ; which allows to write something like if (foo != NULL && *foo != '\0') without fearing of a NULL pointer dereferencing

Beware of char buffer[1024] = ""

Not only is it a bit dirty to have large buffers on the stack (I must admit that some legacy code in httrack is filled with that, cough cough), but the C standard ensure that when initializing a structure or an array, missing elements shall be initialized to zero.

This basically means here that the first element of the array is explicitly initialized to zero (this is the terminating \0 of the empty string), and the 1023 other elements will be implicitly initialized to zero, too. The performance impact is the same as memset(buffer, 0, sizeof(buffer)).

Use at least an explicit buffer[0] = '\0' without any initializer when declaring the buffer.

The Bogus strchr (and Friends) Definitions

Did you notice that ?

1
2
3
char *strchr(const char *s, int c);
char *strstr(const char *haystack, const char *needle);
...

All these function takes a const char* and return a char*. How can a function transform an array of constant chars to a non-constant one ?

Well, obviously this is impossible, and the only reason of this oddity was to offer a somewhat generic function that would work for both char* and const char* strings: accepting const char* as argument will also accept char* versions by silently casting it, and returning char* will also allow to store the result in a const char* for the same reason.

Unfortunately, this may lead to:

1
2
3
4
5
char *pos = strchr(my_const_string, ' ');
if (pos != NULL) {
  *pos++ = '\0';  /* oops, I forgot that the string was constant, and the compiler did not warn me! */
  /* kaboom here */
}

A sane decision would be to split all these string handling functions into const and non-const ones. In C++, you can have specialized versions.

Never Divide By Zero .. Or Divide INT_MIN by -1 After Midnight

Dividing an integer by zero leads to a floating point exception (Yes, this is not a floating point operation, but the error triggers one.).

But dividing the smallest signed integer by -1 also leads to the same floating point exception. This is a bit of a curiosity: when dividing by -1 the smallest integer (-2147483648), the result should be 2147483648, but is too large (by one) to be represented in an integer (the largest integer is 2147483647), and the divide operation then triggers the same exception that the one triggered by the division by zero.

However, multiplying the smallest signed integer by -1 does not trigger anything because… well, because the multiply operation does not raises anything, even when overflowing.

Therefore, the first printf below will gently print a result, but the second one won’t, and the program will abort:

1
2
3
4
5
6
7
8
9
/* Note: using volatile to prevent the div to be optimized out */
volatile int int_min = INT_MIN;
volatile int minus_1 = -1;
int main(int argc, char **argv) {
  printf("%d\n", int_min * minus_1);
  printf("%d\n", int_min / minus_1);

  return EXIT_SUCCESS;
}

By the way, what’s the first result ? Well, -2147483648, because the 32-bit multiply operation of two 32-bit numbers provides a 64-bit result, which is 2147483648 (0x0000000080000000 in hexadecimal), and the lowest part is just truncated to the 32-bit 0x80000000 value, which is… -2147483648 in two’s complement arithmetics. Basically the overflow was silently ignored. Got it ?

Macros Are Not Functions

Macros are often considered evil (and will even cause C++-fans to scream and have foam on their mouth), especially when hiding multiple argument evaluation:

1
#define min(a, b) ( (a) < (b) ? (a) : (b) )

This function will not behave well when used with something like min(++i, ++j): a second increment will be committed in either i or j…

Another difference between a function version and its macro version are the side effects, which are committed before and after calling a function, but not a macro.

Preprocessor Magic

You know that __FILE__ is a magic preprocessor variable holding the current source filename, and __LINE__ the line number at the time of the macro evaluation. This is quite convenient to print debugging:

1
2
3
4
5
6
#define PRINT_CURRENT_FILE_AND_LINE printf("at %s:%d\n", __FILE__, __LINE__)
...
int main(...) {
...
  PRINT_CURRENT_FILE_AND_LINE;  /* print some debug */
...

The __FILE__ and __LINE__ macros will be expanded to the location in the main() function, at expansion time.

You can use the expansion time behavior to convert the __LINE__ numerical token to a string one, for example using a double macro expansion:

1
2
3
4
5
6
7
8
#define L_(TOKEN) #TOKEN
#define L(TOKEN) L_(TOKEN)
#define FILE_AND_LINE __FILE__ ":" L(__LINE__)

int main(int argc, char **argv) {
  printf("we are at: %s\n", FILE_AND_LINE);
  return 0;
}

A call to FILE_AND_LINE will form a string using the three substrings (in C, “foo” “bar” is equivalent to “foobar”). We need an intermediate L_() macro because otherwise __LINE__ would not be expanded, and the resulting string would simply be … "__LINE__".

Aliasing Rules

My headache is killing me, so I’m going to redirect you to the excellent article on cellperformance. But aliasing rules are very, very, very fun, trust me!

More ?

Do not hesitate to comment this article to complete it! There are other C-corner cases I probably didn’t mention…

TL;DR: corner cases are not reserved to C++!

A Basic Glance at the Virtual Table

Virtual Functions, vtable And Other Monsters

C++ inheritance, virtual functions, and virtual tables are sometimes sees as obscure beasts, something most people will not mess with.

Casting an object to another one or calling a virtual function are probably two of the most common operations you can think of in C++, so it is a shame that such simple operations are not fully understood by everyone.

I have been programming in C for many years, and one of the benefits of C is that there is no real magic: when you define a structure, you can see the layout in memory, you can feel the pointers being padded, and every other operations, such as casting a pointer to another one (something you should only do with great care, by the way) are just re-interpreting memory addresses from one type to another one or shifting addresses a bit. No magic. No surprises. Accessing the second member of a pointer array ? This is just basic pointer arithmetic (indirect move with shifting and index, something that even the venerable 68000 processor had if my memory is correct)

Some people go straight in C++ (or even in Java) when they start learning programming – which is in my point of view a bad thing – and tend to lose their focus on what’s really happening behind the curtains (ie. inside the memory)

But things are generally much simpler that you thought.

Back To Basis

Let’s first have a look at a very simple object. A C++ one:

1
2
3
4
5
6
7
8
9
class A {
private:
  int id;

public:
  A(): id(42) {
    printf("yay, my address is %p, and my id is %d!\n", this, this->id);
  }
};

This class contains only a constructor, and has one member (an integer). These two things are not of the same nature: variable members and functions are of different kind (yes, this is a really smart and insightful remark, I know).

To make things clearer, we can view this class as a C-structure, and its functions as global functions taking a (hidden) “this” member as first argument (the pointer to the structure itself). The instance members can be accessed implicitly, without the this-> pointer indirection, by the way: C++ will add the missing bits.

The code above can be rewritten in pure C as follows:

1
2
3
4
5
6
7
8
9
10
// this is the actual "class A" instance data layout!
struct A {
  int id;
};

// this is the class A equivalent constructor (a global function)
void A_A(struct A* this) {
  this->id = 42;
  printf("yay, my address is %p, and my id is %d!\n", this, this->id);
}

… and, rather that just declaring:

1
  A a;

… you’ll need to call the constructor manually:

1
2
  struct A a;
  A_A(&a);  // call constructor manually!

Here it is! C++ in plain C! Of course a C++ compiler will hide everything behind the curtains when using regular classes, but the actual compiled code will be more or less the same.

If you add more members and/or more methods in this class, the principle is the same: extend the structure layout, and/or add more global functions that take an hidden this first argument.

Quizz: what is the size of a class A object instance ? The answer is pretty simple: the class only contains one member (id), which is an integer (32-bit here) ; the sizeof(A) is therefore 4 (bytes). No magic.

Quizz: what will be the size of a class A object instance if a regular member function is added ? The answer is pretty simple, too: its size won’t change, as it still contains only one member – only an additional global function will be added in the code.

What About Inheritance ? This Seems a Bit More Complicated!

No, no, not at all.

Let’s take a simple (yet representative) example: our previous A class, another B class with a single “age” member, and at last a final class C with a member “mode”, but inheriting from A and B.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
class A {
public:
  int id;

  A(): id(42) {
    printf("A: yay, my address is %p, my size is %zu, and my id is %d!\n",
           this, sizeof(*this), this->id);
  }
};

class B {
public:
  int age;

  B(): age(7) {
    printf("B: yay, my address is %p, my size is %zu, and my age is %d!\n",
           this, sizeof(*this), this->age);
  }
};

class C: public A, public B {
public:
  int mode;

  C(): mode(-1) {
    printf("C: yay, my address is %p, my size is %zu, my id, age and mode are %d, %d, %d!\n",
           this, sizeof(*this), this->id, this->age, this->mode);
  }
};

Instantiating an object from C (C c;) will print something like:

1
2
3
A: yay, my address is 0x73620f9f5c20, my size is 4, and my id is 42!
B: yay, my address is 0x73620f9f5c24, my size is 4, and my age is 7!
C: yay, my address is 0x73620f9f5c20, my size is 12, my id, age and mode are 42, 7, -1!

The class C constructor called first the upper inherited classes constructors (class A constructor and class B constructor), and therefore prints their information first. Here again, C++ hides everything behind curtains: the calls to base classes constructors are generated automatically, before your actual class C constructor code.

You can note that the address seen by the constructors of A and C are identical, but not the one seen by B. How can it be ?

Let’s rewrite everything in plain C:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
struct A {
  int id;
};

// class A constructor
void A_A(struct A* this) {
  this->id = 42;
  printf("A: yay, my address is %p, my size is %zu, and my id is %d!\n",
         this, sizeof(*this), this->id);
}

struct B {
  int age;
};

// class B constructor
void B_B(struct B *this) {
  this->age = 7;
  printf("B: yay, my address is %p, my size is %zu, and my age is %d!\n",
         this, sizeof(*this), this->age);
}

struct C {
  struct A a;
  struct B b;
  int mode;
};

void C_C(struct C *this) {
  // manually call base classes constructors first
  A_A(&this->a);
  B_B(&this->b);
  // then, user code
  this->mode = -1;
  printf("C: yay, my address is %p, my size is %zu, my id, age and mode are %d, %d, %d!\n",
         this, sizeof(*this), this->a.id, this->b.age, this->mode);
}

The struct C layout is pretty straightforward: the first element is the struct A members – the reason why constructors from class A and class C did see the same address, the second member is the member(s) of class B, and finally comes the member(s) of C.

In both C++ and C variants,

  • the size of C is 12 bytes (the three integers)
  • the location of A within C is at the beginning (the first four bytes)
  • the location of B within C is after the area dedicated to A (the next four bytes)
  • the location of members specific to C then follows

Casting will provide the same behavior, by the way:

  • static_cast<A*>(&c)) == &(&c)->a == 0x73620f9f5c20
  • static_cast<B*>(&c)) == &(&c)->b == 0x73620f9f5c24
  • static_cast<C*>(&c)) == &c == 0x73620f9f5c20

Okay, But Virtual Functions ? Can You Do That In Plain C ?

Well, let’s consider a simple example: simply add a virtual function print to our previous sample, aiming to print the object class name and parameters. The idea is that even if we cast the high-level class C instance into A* or B*, we’ll see the same behavior.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
class A {
public:
  int id;

  A(): id(42) {
    printf("A: yay, my address is %p, my size is %zu, and my id is %d!\n",
           this, sizeof(*this), this->id);
  }

  virtual void print() {
    printf("I am A(%d)\n", id);
  }
};

class B {
public:
  int age;

  B(): age(7) {
    printf("B: yay, my address is %p, my size is %zu, and my age is %d!\n",
           this, sizeof(*this), this->age);
  }

  virtual void print() {
    printf("I am B(%d)\n", age);
  }
};

class C: public A, public B {
public:
  int mode;

  C(): mode(-1) {
    printf("C: yay, my address is %p, my size is %zu, my id, age and mode are %d, %d, %d!\n",
           this, sizeof(*this), this->id, this->age, this->mode);
  }

  virtual void print() {
    printf("I am C(%d, %d, %d)\n", id, age, mode);
  }
};

If we create an instance of C called c (C c;), the following calls will produce exactly the same output:

  • c.print();
  • static_cast<A*>(&c)–>print();
  • static_cast<B*>(&c)–>print();
  • static_cast<C*>(&c)–>print();

Where is the magic ? Let’s first have a look at the output, when creating this object:

1
2
3
A: yay, my address is 0x726fa25b7c00, my size is 16, and my id is 42!
B: yay, my address is 0x726fa25b7c10, my size is 16, and my age is 7!
C: yay, my address is 0x726fa25b7c00, my size is 32, my id, age and mode are 42, 7, -1!

We can see that A and B are not anymore 4 bytes, but 16. And C is 32 bytes! Somehow, the C++ compiler did add some bytes in our objects without telling us!

Let’s have a look at each object instance actual contents within each classes, by dumping data as pointers in constructors, to figure ou what has been added inside our beloved structures classes:

1
2
3
4
5
6
7
8
9
// simple dump function
static void print_object(const char *name, void *this_, size_t size) {
  void **ugly = reinterpret_cast<void**>(this_);
  size_t i;
  printf("created %s at address %p of size %zu\n", name, this_, size);
  for(i = 0 ; i < size / sizeof(void*) ; i++) {
    printf("  pointer[%zu] == %p\n", i, ugly[i]);
  }
}

And using print_object(__FUNCTION__, this, sizeof(*this)); inside each constructor.

Remember that the constructors are called as follows when creating an instance of C:

  • base class A constructor is called magically (this points to the beginning of the allocated C structure)
  • base class B constructor is called magically (this starts at 16 bytes from the beginning of the allocated C structure)
  • top level C constructor user code is executed (this points to the beginning of the allocated C structure)

Here’s the result:

1
2
3
4
5
6
7
8
9
10
11
created A at address 0x7082f962ccb0 of size 16
  pointer[0] == 0x400f20
  pointer[1] == 0x2a
created B at address 0x7082f962ccc0 of size 16
  pointer[0] == 0x400f00
  pointer[1] == 0x7
created C at address 0x7082f962ccb0 of size 32
  pointer[0] == 0x400ed0
  pointer[1] == 0x2a
  pointer[2] == 0x400ee8
  pointer[3] == 0xffffffff00000007

We can see that:

  • A contains an unknown additional pointer (0x400f20), and the id variable (0x2a == 42) padded to 8 bytes
  • B contains another unknown additional pointer (0x400f00), and the age variable (7) padded to 8 bytes
  • C embeds A and B, and the mode variable (-1): these are seen as 0x2a, and 0xffffffff00000007 (00000007 for the age variable, and ffffffff for the mode variable: -1 is 0xffffffff in two complement’s world) – and the two additional pointers

So it seems that the structure size change reason is that an hidden pointer member (8 bytes in the 64-bit world) has been added to the class instance – and the compiler probably padded the structure to 16 bytes, because you want to have 8-byte alignment when dealing with arrays: if you have an array of two objects, the second object hidden pointer has to be padded to 8 bytes, enforcing the structure to be a multiple of 8 bytes. The 4 bytes used for padding are unused, but useful to maintain the alignment.

Hey, wait! The two pointers are different, when printed inside the C constructor! Yes, good catch: the additional A pointer which was 0x400f20 inside the class A constructor has been changed into 0x400ed0 inside the class C constructor, and the additional B pointer which was 0x400f00 inside the class B constructor has been changed into 0x400ee8 inside the class C constructor.

Oh, and a last note: C is only 32 bytes.

  • there is no additional hidden pointer for C!
  • the compiler apparently used the padding area of class B for the mode variable (smart, isn’t it ?)

Okay, so two magical pointers have been added because we defined a virtual function. What does it mean ?

Let’s go back to basic: how would you implement this class hierarchy in pure C ?

Here’s what you may do:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
struct A {
  int id;
};

// class A constructor
void A_A(struct A *this) {
  this->id = 42;
  print_object(__FUNCTION__, this, sizeof(*this));
}

void A_print(struct A *this) {
  printf("I am A(%d)\n", this->id);
}

struct B {
  int age;
};

// class B constructor
void B_B(struct B *this) {
  this->age = 7;
  print_object(__FUNCTION__, this, sizeof(*this));
}

void B_print(struct B *this) {
  printf("I am B(%d)\n", this->age);
}

struct C {
  struct A a;
  struct B b;
  int mode;
};

void C_C(struct C *this) {
  // manually call base classes constructors first
  A_A(&this->a);
  B_B(&this->b);
  // then, user code
  this->mode = -1;
  print_object(__FUNCTION__, this, sizeof(*this));
}

void C_print(struct C *this) {
  printf("I am C(%d, %d, %d)\n", this->a.id, this->b.age, this->mode);
}

Okay, looks more or less what we had previously. But what about the virtual function ? We have to be sure that we call the right function, even if we only have a A* or a B* pointer (and we don’t know that this is actually a C object class behind).

A solution can be to add a function pointer for every virtual function in each classes, and initialize inside the constructor!

Something like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
struct A {
  void (*print)(struct A *this);
  int id;
};

// class A constructor
void A_A(struct A *this) {
  this->print = A_print;
  this->id = 42;
  print_object(__FUNCTION__, this, sizeof(*this));
}

struct B {
  void (*print)(struct B *this);
  int age;
};

// class B constructor
void B_B(struct B *this) {
  this->print = B_print;
  this->age = 7;
  print_object(__FUNCTION__, this, sizeof(*this));
}

void B_print(struct B *this) {
  printf("I am B(%d)\n", this->age);
}

struct C {
  struct A a;
  struct B b;
  void (*print)(struct C *this);
  int mode;
};

void C_C(struct C *this) {
  // manually call base classes constructors first
  A_A(&this->a);
  B_B(&this->b);
  // then, user code
  this->print = C_print;
  this->a.print = C_print;  // we patch our own A->print function !!
  this->b.print = C_print;  // and we need to do it for B->print too!!
  this->mode = -1;
  print_object(__FUNCTION__, this, sizeof(*this));
}

void C_print(struct C *this) {
  printf("I am C(%d, %d, %d)\n", this->a.id, this->b.age, this->mode);
}

What did we do here ?

  • A and B constructors initialize their versions of print
  • C constructor initialize its own version of print after A and B constructors, and overrides the one inside A and B

Neat, isn’t it ?

Now, whatever pointer you have, these three lines will produce identical behavior:

  • a->print(a);
  • b->print(b);
  • c->print(c);

Hey, by the way, we do not need an additional pointer for C, because we already have the correct version in two locations: a.print and b.print. Let’s remove this useless pointer in C (and the useless initialization), and use:

  • a->print(a);
  • b->print(b);
  • c->a.print(c);

There is a little problem, however, when using a C instance cast in A* or B*. When you call a->print(a), the a pointer is of type A*. And therefore the assignment this->a.print = C_print will produce a warning. Similarly, when you call b->print(b), the this pointer will have a type of B*, and not only it is of the wrong type, but it has a different address, as c->b is located sixteen bytes after the beginning of C.

We need specialized versions with correct casts!

  • casting from A* to C* is straightforward: we know that the address is the same, and the type can be safely cast
  • casting from B* to C* requires to adjust the address to find the beginning of C, and a cast (B is in the middle of C, remember)

Here we go:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
void C_print(struct C *this) {
  printf("I am C(%d, %d, %d)\n", this->a.id, this->b.age, this->mode);
}

void CA_print(struct A *this) {
  C_print((struct C*) this);
}

void CB_print(struct B *this) {
  // we know that this points after the struct A layout
  // so let's fix the pointer by ourselves!
  size_t offset = (size_t) &((struct C*) NULL)->b;
  C_print((struct C*) ( ( (char*) this ) - offset ) );
}

void C_C(struct C *this) {
  // manually call base classes constructors first
  A_A(&this->a);
  B_B(&this->b);
  // then, override functions
  this->a.print = CA_print;  // we patch our own function !!
  this->b.print = CB_print;  // and we need to do it for B too!!
  // this->print = C_print;  // useless
  // then, user code
  this->mode = -1;
  print_object(__FUNCTION__, this, sizeof(*this));
}

With these adjustments, things are smooth:

  • c.a.print(&c.a) ==> I am C(42, 7, -1)
  • c.b.print(&c.b) ==> I am C(42, 7, -1)
  • c.print(&c) ==> I am C(42, 7, -1)

Perfect, isn’t it ?

Not yet: what if we add more virtual functions in our structures ? This will increase the object size, and require more pointer initialization in constructors! What a waste of space and time: we are assigning the same pointers each time. Could we prepare the set of pointers for each classes in a static, global structure, and just initialize a pointer to these static data inside the constructors ? Neat, isn’t it ?

Here’s the modified version – let’s call the global structure tables containing specialized function pointers vtbl, as in virtual table, because this is actually what they are: table for virtual functions!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
// type for global virtual function table structure A
// this structure enumerates all virtual functions of A
struct A;
struct table_A {
  void (*print)(struct A *this);
};

struct A {
  const struct table_A *vtbl;
  int id;
};

void A_print(struct A *this) {
  printf("I am A(%d)\n", this->id);
}

// the global, static table data for A
// this structure enumerates all virtual functions of B
static const struct table_A table_A_for_A = { A_print };

void A_A(struct A *this) {
  this->vtbl = &table_A_for_A;
  this->id = 42;
  print_object(__FUNCTION__, this, sizeof(*this));
}

// type for global virtual function table structure B
struct B;
struct table_B {
  void (*print)(struct B *this);
};

struct B {
  const struct table_B *vtbl;
  int age;
};

void B_print(struct B *this) {
  printf("I am B(%d)\n", this->age);
}

// the global, static table data for B
static const struct table_B table_B_for_B = { B_print };

void B_B(struct B *this) {
  this->vtbl = &table_B_for_B;
  this->age = 7;
  print_object(__FUNCTION__, this, sizeof(*this));
}

struct C {
  struct A a;
  struct B b;
  int mode;
};

void C_print(struct C *this) {
  printf("I am C(%d, %d, %d)\n", this->a.id, this->b.age, this->mode);
}

void CA_print(struct A *this) {
  C_print((struct C*) this);
}

void CB_print(struct B *this) {
  // we know that this points after the struct A layout
  // so let's fix the pointer by ourselves!
  size_t offset = (size_t) &((struct C*) NULL)->b;
  C_print((struct C*) ( ( (char*) this ) - offset ) );
}

// the global, static tables data for A and B structures, for C
static const struct table_A table_A_for_C = { CA_print };
static const struct table_B table_B_for_C = { CB_print };

void C_C(struct C *this) {
  // manually call base classes constructors first
  A_A(&this->a);
  B_B(&this->b);
  // Override virtual function pointers that have just been initialized by
  // A and B constructors with our own version. This allows to override all
  // possible virtual functions in base classes!
  this->a.vtbl = &table_A_for_C;
  this->b.vtbl = &table_B_for_C;
  this->mode = -1;
  print_object(__FUNCTION__, this, sizeof(*this));
}

Neat, isn’t it ?

Well, I’m telling you a secret: this is more or less what the C++ compiler does (yes, behind the curtains) when implementing virtual functions. The virtual function tables are called virtual tables, and are static structures inside the compiled code. And inside each corresponding class constructor, additional code is inserted just between base class initialization calls, and user code, to set all virtual table pointers corresponding to the class type.

Some additional remarks on this design:

  • if you call a virtual function inside a base class constructor, you will not call the top-level function version, because the virtual table pointer has not yet been initialized (it will be initialized after the end of the base class constructor)

  • you need one static structure containing all necessary stuff for a given class

  • you can put additional information on these virtual table structures, which can be useful!

    • the class name
    • type information functions (what is this object’s real class ?), for example by implementing a hidden getClassType() virtual function
    • dynamic cast support

… and this is exactly what most compiler do, actually: Runtime Type Information data is generally included as soon as a virtual table needs to be created for a given class (ie. you can use dynamic_cast<>())

Now you know what’s going on when you define a virtual function: rather than calling the function directly, an indirection through a virtual table is done.

This has some drawbacks:

  • a little function call impact on performances, because you need to read the actual function pointer in a static table

  • a potentially huge impact on performances because you can not inline anymore function calls

A Deeper Look

Let’s dig a little bit more, using GCC version of the virtual tables.

If you run your program under GDB and examine an instance of class C, you’ll see:

1
2
(gdb) p c
$1 = {<A> = {_vptr.A = 0x400e70 <vtable for C+16>, id = 42}, <B> = {_vptr.B = 0x400e88 <vtable for C+40>, age = 7}, mode = -1}

Note the _vptr.A = 0x400e70 and _vptr.B = 0x400e88 : these are the virtual tables.

What is behind these pointers ? Recent GCC releases allows you to dig a bit into an object’s virtual table:

1
2
3
4
5
6
(gdb) info vtbl c
vtable for 'C' @ 0x400e70 (subobject @ 0x775bfbf98600):
[0]: 0x400b00 <C::print()>

vtable for 'B' @ 0x400e88 (subobject @ 0x775bfbf98610):
[0]: 0x400b34 <non-virtual thunk to C::print()>

Otherwise, you can easily guess that each vtables starts with the same kind of function array we created for our C version:

1
2
3
4
5
(gdb) p ((void**)0x400e70)[0]
$5 = (void *) 0x400b00 <C::print()>

(gdb) p ((void**)0x400e88)[0]
$9 = (void *) 0x400b34 <non-virtual thunk to C::print()>

Easter Egg

Now you understand a bit more the concept or vtable, let’s see a dirty-but-fun hack (valid for GCC, possibly for Visual C++ too): how to override a specific object’s virtual function table, and rewire virtual functions at runtime

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
// Our patched version of C::printf (or A::printf)
static void our_function(C *thiz) {
  printf("I am now a new C(%d, %d, %d)!\n", thiz->id, thiz->age, thiz->mode);
}

// Our patched version of B::printf
static void our_function_from_B(B *thiz) {
  // This is the same boring arithmetic than before, except that
  // we use a fake C instance located at a random (but non-NULL) address
  // to compute the offset. Note that the actual instance address will
  // never be dereferenced, so do not fear illegal accesses!.
  C *dummy = reinterpret_cast<C*>(0x42);
  B *dummy_B = static_cast<B*>(dummy);
  char *dummy_ptr = reinterpret_cast<char*>(dummy);
  char *dummy_B_ptr = reinterpret_cast<char*>(dummy_B);
  size_t offset = dummy_B_ptr - dummy_ptr;
  char *ptr = reinterpret_cast<char*>(thiz);
  C *thiz_C = reinterpret_cast<C*>(ptr - offset);
  our_function(thiz_C);
}

int main(int argc, char **argv) {
  // create an instance of C
  C c;

  // Hack the c instance and override its print() function!
  void *our_vtbl[1] = { reinterpret_cast<void*>(our_function) };
  void *our_vtbl2[1] = { reinterpret_cast<void*>(our_function_from_B) };
  // patch for A* and C*
  *reinterpret_cast<void**>(static_cast<A*>(&c)) = our_vtbl;
  // patch for B*
  *reinterpret_cast<void**>(static_cast<B*>(&c)) = our_vtbl2;

  // get the pointers (note: GCC will optimize direct calls to c.printf)
  A *pa = &c;
  B *pb = &c;
  C *pc = &c;

  // and call printf() ... what will be printed ?
  pa->print();
  pb->print();
  pc->print();

  return 0;
}

And yes, your object was hacked:

1
2
3
I am now a new C(42, 7, -1)!
I am now a new C(42, 7, -1)!
I am now a new C(42, 7, -1)!

Neat, isn’t it ? (yes, yes, not really portable though)

References

TL;DR: remember that C++ used to be enhanced C a long time ago, and most basic features can be easily implemented and understood through plain C code!