xavier roche's homework

random thoughts and sometimes pieces of code

Redirecting Stdio Is Hard

The Two Nasty Bugs

The nasty bugs

You know what it is like: you never saw them, but some people did, and these people are reliable, so there must be something wrong. But nobody on the engineering team managed to reproduce the issues. And the issues are random, and only happen rarely. Very rarely, such as once in a month for a given customer, or even once in a year for another one. And you know what: these bugs are horrible, because it seems you’ll never catch them. They hide, and silently trigger when you are asleep.

First bug report: Can not run an external crash reporter on Linux

This was the nicest one, actually, and it took only few days to investigate it. Basically, in case of crash, a crash reporter was supposed to be ran, and send useful information (stack traces et al.). Unfortunately the crash reporter appeared not to work correctly. It did not start. For an unknown reason (no error messages)

Fortunately, this bug was reproducible internally.

Second bug report: I see log files in my database on Windows

Yes, log files in my database. Or more precisely log files in my database binary files. I was a bit surprised, too. It seems that, randomly, some log files that were supposed to be sent to the logs were redirected to another random file, including database ones.

Obviously, the database was not happy after that.

That was the report after months of investigation: the first ones were just the usual there was a random crash and I have no clue why – I scratched everything and now it works.

These two issues were actually related, and the root cause was the way file descriptors are handled.

Oh, what are file descriptors, but he way ?

The Three Little Descriptors

All C programmers know the three standard library input/output files: stdin, stdout, stderr. These are the default standard input, output, and error streams. The first one typically receives the console input (keyboard), the later ones allows to display output.

If you look at the stdin manual page, you will see that

A file with associated buffering is called a stream and is declared to be a pointer to a defined type FILE. (…) At program start-up, three streams shall be predefined and need not be opened explicitly: standard input (for reading conventional input), standard output (for writing conventional output), and standard error (for writing diagnostic output).

Basically, these FILE streams are encapsulating a file descriptor, that is, a lower-level input or output resource (which is generally identified by an integer, ie. an int), and provide additional features, such as buffering, and offer a set of advanced functions, such as fprintf to print formatted output.

By default, the input/output streams are connected to the console (Linux, or DOS), but you may redirect them easily – this feature is especially known in the Linux/Un*x world:

1
ls -l /tmp/foo >/tmp/list 2>/tmp/errorlog

In this very basic shell script example, the ls -l /tmp/foo command standard output is redirected to the file /tmp/list, and the standard error output is redirected to /tmp/errorlog. Note that the >/tmp/list can also be written as 1>/tmp/list: the ‘1’ and ‘2’ refer to the file descriptors 1 and 2, which are the standard output and error file descriptors. The standard input is the file descriptor 0:

1
read A 0</tmp/bar

Why are the standard input/output streams identified by 0, 1 and 2 ? Well, this is just a convention – and, by default, file descriptors are allocated increasingly, and the first three files opened (this is done by the system) are these streams.

We can redirect the standard streams easily using shell commands – but can we do that programmatically, especially for the output streams ?

Redirection to a File

We have two output file descriptors (1 and 2), and we want to capture them in a file – say, a log file. This is quite convenient, especially for server applications: log files can then be rotated, inspected…

The standard way to reassign a file descriptor is through the dup2 standard library function. This function takes an opened file descriptor, and clone it to the second file descriptor provided. You can also extract the file descriptor beneath a FILE stream through the fileno standard library function: here’s a simple code to redirect the standard output to a file:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <assert.h>

/** Redirect stdout and stderr. Returns 0 upon success. **/
static int redirect_std_out_err(const char *log_name) {
  int failure = 1;

  /* Open log FILE stream. */
  FILE *const fp = fopen(log_name, "wb");
  if (fp != NULL) {

    /* Extract log FILE fd, and standard stdout/stderr ones. */
    const int log_fd = fileno(fp);
    const int stdout_fd = fileno(stdout);
    const int stderr_fd = fileno(stderr);
    if (log_fd != -1 && stdout_fd != -1 && stderr_fd != -1) {

      /* Clone log fd to standard stdout/stderr ones. */
      if (dup2(log_fd, stdout_fd) != -1 && dup2(log_fd, stderr_fd) != -1) {
        failure = 0;
      }
    }

    /* Close initial file. We do not need it, as we cloned it. */
    fclose(fp);
  }
  return failure;
}

int main(int argc, char **argv) {
  /* Redirect stdout and stderr to a single log file */
  if (argc != 2 || redirect_std_out_err(argv[1])) {
    return EXIT_FAILURE;
  }

  /* Print output to both streams */
  printf("Hey, stdout is redirected!\n");
  fprintf(stderr, "Hey, stderr is also redirected!\n");

  return EXIT_SUCCESS;
}

And that’s it! Simple, isn’t it ?

Here Comes The Problems

Redirection is what we typically do in my company – I even modified some of the involved code. But sending output to a log file has its drawbacks. Especially regarding log file size: the file is growing, and we need to clear it time to time, or the disk will eventually fill. This is generally done through a “rotate” operation, that is:

  • closing the current log file
  • renaming it with some timestamp
  • opening a fresh one
  • cleanup old logs if necessary

File descriptors, more generally, are special resources in most operating systems, as they are inherited when spawning a process. By default, all file descriptors are transmitted to spawned processes (on POSIX systems, fork clone them, and exec functions keep them opened by default), and this is a pain: pipes and sockets are also inherited. This may lead to the following case:

  • main program opens a socket to a remote location
  • spawning a process, which inherits this socket (file descriptor) too
  • main program sens commands to remote location and closes the socket
  • spawned program still lives, and still has the socket file descriptor copy opened
  • remote location does not “sees” the client close() and is stuck

The thing is that spawned processed do not give a damn about already opened file descriptors (except for stdio). But the standard is broken for historical reasons, very unfortunately.

The solution is painful: you need to flag all possible opened file descriptors with the FD_CLOEXEC flag set. This flag explicitly tells the system that the file descriptor shall not be inherited when spawning a process:

  • open(..., ...) => open(..., ... | O_CLOEXEC)
  • fopen(..., ...) => open(..., ... | O_CLOEXEC) + fdopen(...) OR fopen(..., ... + "e")
  • dup(...) => fcntl(..., F_DUPFD_CLOEXEC, (long) 0)
  • dup2(..., ...) => dup3(..., ..., O_CLOEXEC)
  • opendir(...) => open(..., O_RDONLY | O_DIRECTORY | O_CLOEXEC) + fdopendir(...)
  • pipe(...) => pipe2(..., O_CLOEXEC)
  • acept(..., ..., ...) => accept4(..., ..., ..., O_CLOEXEC)
  • socket(..., ..., ...) => socket(..., ... | SOCK_CLOEXEC, ...)
  • and so on…

Some of these functions are non-standard (such as accept4) unfortunately. Some other will not be available (Windows). Some old kernels (Linux) will sometimes not support these features – and some old glibc will not work either (especially for the “e” flag). Yes, this is a real pain.

Oh, one last thing: do not dare to use fcntl(fd, F_SETFD, ...) to clear the FD_CLOEXEC flag: this is insanely hopeless in a multithreaded environment. Can you guess why ?

So What Was That All About ?

Okay, back yo our nasty bugs!

  1. The first one was quite easy to investigate.

Here are the investigation steps, ordered temporally:

  • “can not run an external crash reporter on Linux”; ie. can not fork()/execv() the external crash reporter in a signal handler, with a status code returned by waitpid() reporting a signal 6 caught during launch (ie. the launched program called abort())
  • we can not run the crash reporter outside the signal handler either
  • we can not run the crash reporter using system() either
  • we can not run “/bin/echo foo” through system() either; the command returns the status “return code 1”!
  • … BUT we CAN run “/bin/echo -n” through system()!

At this stage, it is clear that the issue is related to the output file descriptors.

Further investigation showed that:

  • we redirected output to logs, opening the log file with the FD_CLOEXEC flag set, AND using the Linux-specific dup3() call
  • as a result or dup3(), standard streams had their FD_CLOEXEC flag set
  • spawning an external program led to have the standard streams closed
  • attempting to write on the closed standard streams led to either a broken pipe signal, or an assertion failure depending on the program involved

The solution was simply to use the default dup2() function for standard streams. These are the only streams we want to be inherited, after all!

  1. The second bug took a bit more time

The biggest issue is that the bug could not be reproduced at first internally. Until our Q&A team managed to reproduce it (these guys are really good at finding bugs, I must admit) three days in a row.

The logs reports were strange, and totally unrelated. But one log was kind of interesting:

1
2
3
[2014/11/28-10:59:08.514] [info] ...
[2014/11/28-10:59:08.659] [info] ...
[Fatal Error] :2:999961: XML document structures must start and end within the same entity.

The error was a corrupted produced XML document, with another unrelated log inside it, in the middle of the structure. The final error did not have any timestamp, but the previous one was just before midnight (GMT). Humm, a midnight bug ? What are the chances for this being a coincidence ?

I then realized that midnight was the trigger time for a number of maintenance operations, including… log rotation.

But after thoughtfully inspecting the code, everything looked fine. And the bug never popped up on Linux, it was Windows-specific (or at least, much frequent on this operating system)

But there was another clue: all weird logs were actually logs coming from the Java virtual machine (either through Log4j, or through System.out.println() calls)

When using System.out (or System.err) in Java, we are actually using a PrintStream, which is an object wrapped in various FilterOutputStream objects for buffering, the final instance being a FileOutputStream, which is, according to the documentation an:

output stream for writing data to a File or to a FileDescriptor

And, in our case, the FileDescriptor involved was FileDescriptor.out and FileDescriptor.err, that is, Java handles to the standard output and error streams.

Fortunately, the sources of the Open JDK are public, and it is possible to dig into the real code beneath, that is, the native code:

  • ./jdk/src/share/classes/java/io/FileOutputStream.java
  • ./jdk/src/windows/native/java/io/FileOutputStream_md.c
  • ./jdk/src/solaris/native/java/io/FileOutputStream_md.c

Hummm… there is a Windows-specific implementation for FileOutputStream native calls. strange, isn’t it ?

1
2
3
4
JNIEXPORT void JNICALL
Java_java_io_FileOutputStream_write(JNIEnv *env, jobject this, jint byte, jboolean append) {
    writeSingle(env, this, byte, append, fos_fd);
}

And the writeSingle call is using a file descriptor returned by GET_FD():

1
2
3
4
5
6
void
writeSingle(JNIEnv *env, jobject this, jint byte, jboolean append, jfieldID fid) {
    // Discard the 24 high-order bits of byte. See OutputStream#write(int)
    char c = (char) byte;
    jint n;
    FD fd = GET_FD(this, fid);

And this macro just calls GetObjectField(..., IO_handle_fdID) to retrieve … the “handle” member of the corresponding FileDescriptor class.

What ? A handle ? Not a file descriptor ?

Yep: a HANDLE – and this was the root cause of all out problems.

Let me give you the reason: on Windows, a standard library handle is actually a kernel HANDLE wrapped by the C library, nothing more.

It means that if you replace the standard library input/output streams, you’ll just replace the C library streams.

Here is the complete bug scenario on Windows after collecting all the clues:

  • log rotation function is called at low level, and replaces the standard output and error streams 1 and 2 using dup2()
  • the Visual C++ C library replaces the streams, by closing the underlying existing HANDLE, and replacing by new ones
  • the old HANDLE are now totally meaningless (“dangling handles”)
  • Java has no knowledge of the replacement, and still has the old HANDLE stored in its FileDescriptor internals
  • Java System.out calls are now sending data to nowhere (this is bad)
  • someone opens a new file, and Windows recycle the old HANDLE
  • Java System.out calls are now sending data to a valid — yet unrelated — file (this is very very bad)
  • data corruption ensues (and not profit)

The root cause is that Java is unaware that the low-level HANDLE has changed.

The solution is simple: let’s tell him!:

For each System.out and System.err OutputStream instances, we need to:

  • get the “out” member field of the FilterOutputStream class and mark it public (see Field.setAccessible())
  • unwrap all FilterOutputStream wrapping by calling fetching the “out” member field recursively
  • cast the resulting instance to a FileOutputStream
  • fetch the FileDescriptor through FileOutputStream.getFD()
  • use SharedSecrets.getJavaIOFileDescriptorAccess() to set the HANDLE (a long) value to -1 temporarily
  • rotate the logs
  • fetch the new HANDLE (a long) through a native call to _get_osfhandle()
  • set the new HANDLE using the previous steps

This is a bit cumbersome – but mixing native code and Java is always cumbersome, so you have to be used to it :)

Epilogue

The two issues were rather different:

  • one was a mistake created when attempting to fix the insane default handle inheritance on both POSIX and Windows API, causing external programs not to run anymore
  • one was a design bug between the Visual C++ library and the JVM (ie. there is no dup2() for HANDLEs), causing random corruptions on files or sockets

But in both cases, a simple standard stream mishandling was the root cause. But the consequences were nasty, the resolution painful, and the time spent unreasonable (the second bug took months to be investigated)

TL;DR: redirecting Stdio is Hard, and prone to nasty bugs if you are not careful enough!

Comments