This is the second post of a series that I am making on C++ exceptions.

The following assumes you have read the first post already.

Now we are warmed up with some assembly reading. Let the fun begin. What happens when test doesn’t pass and it throws an exception?

func(bool): # @func(bool)
        ...
        test byte ptr [rbp - 1], 1
        je .LBB0_3 # not jumping this time
        mov edi, 4 # asking __cxa_allocate_exception to allocate one with 4 bytes (to store "1")
        call __cxa_allocate_exception
        ...

What is this __cxa_allocate_exception and what does it do? To understand it, we need to learn a bit about the Itanium C++ ABI.

Itanium C++ ABI

The Itanium C++ ABI is a language specific (obviously) ABI.

The Itanium C++ ABI is an ABI for C++.  As an ABI, it gives precise rules for implementing the language, ensuring that separately-compiled parts of a program can successfully interoperate.

Today we will focus on the exception handling section. It’s part of the C++ ABI because we want to be able to catch an exception thrown from a separately compiled library, so there needs to be a contract between binaries describing how exceptions are handled. The ABI defines a set of APIs that must be available on any Itanium-compatible platform.

Base APIs

The following _Unwind_* base APIs are language agnostic. They do the basic work of exception handling, e.g. actually unwinding the stack. Although they have C interfaces, any language conforming to the System V AMD64 ABI can invoke these functions just fine.

  _Unwind_RaiseException,
  _Unwind_Resume,
  _Unwind_DeleteException,
  _Unwind_GetGR,
  _Unwind_SetGR,
  _Unwind_GetIP,
  _Unwind_SetIP,
  _Unwind_GetRegionStart,
  _Unwind_GetLanguageSpecificData,
  _Unwind_ForcedUnwind
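
To get a feel for how these C interfaces are used, here is a minimal sketch that walks the current call stack and records each frame's instruction pointer with _Unwind_GetIP. It assumes a GCC or Clang toolchain that ships <unwind.h>. _Unwind_Backtrace is not in the list above, but both libgcc and LLVM's libunwind provide it alongside these functions. record_frame and capture_stack are names made up for this example.

```cpp
#include <unwind.h>

#include <cassert>
#include <cstdint>
#include <vector>

// Callback invoked by the unwinder once per frame, innermost first.
// The void* argument is whatever we passed to _Unwind_Backtrace.
static _Unwind_Reason_Code record_frame(struct _Unwind_Context* context,
                                        void* arg) {
    auto* ips = static_cast<std::vector<std::uintptr_t>*>(arg);
    ips->push_back(static_cast<std::uintptr_t>(_Unwind_GetIP(context)));
    return _URC_NO_REASON;  // keep walking
}

// Hypothetical helper: returns the instruction pointers of the frames
// that are live when it is called.
std::vector<std::uintptr_t> capture_stack() {
    std::vector<std::uintptr_t> ips;
    _Unwind_Backtrace(record_frame, &ips);
    return ips;
}
```

Nothing here is C++ specific apart from the std::vector used to collect the results; the same two unwinder calls could be made from C.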

libunwind is the most popular (mostly language agnostic) implementation of these APIs (and more). There are actually two implementations of libunwind. One is the “official”/nongnu continuation of HP’s libunwind. The other is from LLVM, to which Apple made most of the contributions. LLVM’s libunwind focuses on implementing the base APIs listed above. I will use code from LLVM’s libunwind implementation for this post.

The following is the actual implementation of _Unwind_RaiseException from libunwind.

/// Called by __cxa_throw. Only returns if there is a fatal error.
_LIBUNWIND_EXPORT _Unwind_Reason_Code
_Unwind_RaiseException(_Unwind_Exception *exception_object) {
  _LIBUNWIND_TRACE_API("_Unwind_RaiseException(ex_obj=%p)",
                       (void *)exception_object);
  unw_context_t uc;
  unw_cursor_t cursor;
  __unw_getcontext(&uc);

  // Mark that this is a non-forced unwind, so _Unwind_Resume()
  // can do the right thing.
  exception_object->private_1 = 0;
  exception_object->private_2 = 0;

  // phase 1: the search phase
  _Unwind_Reason_Code phase1 = unwind_phase1(&uc, &cursor, exception_object);
  if (phase1 != _URC_NO_REASON)
    return phase1;

  // phase 2: the clean up phase
  return unwind_phase2(&uc, &cursor, exception_object);
}

You might have heard about the personality routine, which is basically a language-specific callback that an unwind library can invoke. For libstdc++ and libc++, the personality routine is called __gxx_personality_v0.

Unwind process

As you can see from the code snippet above, _Unwind_RaiseException has two phases, unwind_phase1 and unwind_phase2, as dictated by the Itanium ABI. The first phase walks the stack and tries to find a frame that can handle the exception. It only searches: no action is performed and the stack is not yet unwound. If no handler is found, the program should terminate. The second phase performs the actual unwind and cleanup. Here’s the code in libunwind showing what happens when the first phase fails to find a handler. (Note that this is the SEH-based variant of the same function, used on Windows, where the system performs the search phase.)

/// Called by \c __cxa_throw(). Only returns if there is a fatal error.
_LIBUNWIND_EXPORT _Unwind_Reason_Code
_Unwind_RaiseException(_Unwind_Exception *exception_object) {
  _LIBUNWIND_TRACE_API("_Unwind_RaiseException(ex_obj=%p)",
                       (void *)exception_object);

  // Mark that this is a non-forced unwind, so _Unwind_Resume()
  // can do the right thing.
  memset(exception_object->private_, 0, sizeof(exception_object->private_));

  // phase 1: the search phase
  // We'll let the system do that for us.
  RaiseException(STATUS_GCC_THROW, 0, 1, (ULONG_PTR *)&exception_object);

  // If we get here, either something went horribly wrong or we reached the
  // top of the stack. Either way, let libc++abi call std::terminate().
  return _URC_END_OF_STACK;
}

So if an exception is not caught, according to the Itanium ABI, the stack is actually not unwound, which means no cleanup is performed. This can be easily verified by a simple program that throws an uncaught exception. Interestingly, the C++ standard explicitly says that the unwind behavior in this case is implementation-defined:

an exception is thrown and not caught (it is implementation-defined whether any stack unwinding is done in this case)

Basically, C++ the language doesn’t require the stack to be unwound when an exception is not caught. You can have a C++ ABI that actually unwinds the stack on an uncaught exception and still be consistent with the C++ standard, but that ABI won’t be compatible with the Itanium ABI. In practice it’s unlikely to matter, but I found this difference interesting.
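
One way to check the "no cleanup on an uncaught exception" behavior is sketched below. It assumes a POSIX system (fork, waitpid) and an Itanium-ABI toolchain such as GCC or Clang on Linux; on other implementations the result may legitimately differ. All names (Guard, throw_through_guard, destructor_ran_before_terminate) are made up for this example. The child process throws an int that nothing catches, and a terminate handler reports through the exit status whether the destructor ran first.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <exception>

namespace {

bool g_destructor_ran = false;

struct Guard {
    ~Guard() { g_destructor_ran = true; }
};

void throw_through_guard() {
    Guard guard;  // this frame has cleanup work to do if it is unwound
    throw 1;      // nothing further up the stack catches int
}

// Terminate handler: encode "did ~Guard() run?" in the exit status.
[[noreturn]] void report_and_exit() {
    _exit(g_destructor_ran ? 1 : 0);
}

}  // namespace

// Returns 0 if std::terminate was reached without running ~Guard(),
// 1 if the destructor ran first, and -1 if the experiment itself failed.
int destructor_ran_before_terminate() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // Child: the uncaught throw ends in std::terminate, which
        // calls our handler instead of aborting.
        std::set_terminate(report_and_exit);
        throw_through_guard();
        _exit(2);  // not reached
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

Running the throw in a child process keeps the experiment from taking the calling program down with it. Note that if any caller of destructor_ran_before_terminate has a catch (...) on the stack, the exception is no longer uncaught and the experiment is meaningless.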

I wonder why the Itanium ABI designed a two-phase unwind process. The documentation says it allows other languages to support resumptive exception handling. Here Christian provided an example of what resumptive exception handling could look like. It’s not a feature that C++ supports. It sounds like a cool feature. Maybe one day it will be added to C++, as if the language is not complicated enough.

It’s also interesting that libunwind itself doesn’t actually terminate the program. It just returns _URC_END_OF_STACK and lets libc++abi call std::terminate.
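
The two phases can be summarized with a toy model. This is not libunwind code: Frame, Result and raise_exception are hypothetical names, and a real unwinder asks each frame's personality routine instead of reading a boolean. The sketch only shows the control flow: search first, run cleanups only if a handler was found, and otherwise report end-of-stack so the caller can decide to terminate.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for a stack frame.
struct Frame {
    std::string name;
    bool has_handler;               // would this frame catch the exception?
    std::function<void()> cleanup;  // destructors this frame would run
};

enum class Result { HandlerInstalled, EndOfStack };

// frames[0] is the throwing frame; the last element is the outermost.
Result raise_exception(const std::vector<Frame>& frames,
                       std::string& landed_in) {
    // Phase 1: search. Nothing is modified and nothing is cleaned up.
    std::size_t handler = frames.size();
    for (std::size_t i = 0; i < frames.size(); ++i) {
        if (frames[i].has_handler) {
            handler = i;
            break;
        }
    }
    if (handler == frames.size()) {
        return Result::EndOfStack;  // the caller decides what happens next
    }

    // Phase 2: cleanup. Unwind every frame below the handler frame.
    for (std::size_t i = 0; i < handler; ++i) {
        if (frames[i].cleanup) frames[i].cleanup();
    }
    landed_in = frames[handler].name;
    return Result::HandlerInstalled;
}
```

With frames func, caller and main, where only main has a handler, the cleanups of func and caller run in that order. If no frame has a handler, none of the cleanups run, which matches the uncaught-exception behavior described above.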

__cxa_* C++ APIs

libc++abi is LLVM’s implementation of the C++ ABI part of the Itanium ABI (libsupc++ within GCC’s libstdc++ serves a similar purpose). The C++ ABI covers a range of things, e.g. the layout of exception objects, control transfer, etc., but we will just focus on the APIs. C++ ABI functions all start with the __cxa_ prefix. If you remember the assembly dump from the first post, we have seen a few of them already: __cxa_allocate_exception, __cxa_throw, __cxa_begin_catch and __cxa_end_catch. The C++ ABI is built on top of the base unwind APIs, e.g. __cxa_throw calls _Unwind_RaiseException.

It’s interesting that libc++abi has APIs such as __cxa_new_handler that are not present in libstdc++. I found an email from Apple explaining why it was added. First of all, libc++abi was introduced so that libraries built with libc++ and libstdc++ can interoperate. But there’s a problem when it comes to std::set_new_handler (and a few others), which assumes that a single global handler exists. Suppose a library compiled with libstdc++ calls std::set_new_handler with function A, another library compiled with libc++ calls std::set_new_handler with function B, and the two libraries are linked together. There’s no contract about how to resolve the conflict. One solution is to factor out the actual implementation of std::set_new_handler and link that implementation in only once, which is essentially what Apple proposed and implemented in libc++abi.
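
The "single global handler" assumption is easy to see within one standard library. In the sketch below, handler_a and handler_b are placeholder functions standing in for the handlers that two different libraries would install. The second registration silently replaces the first, so the scheme only works if everybody shares the same slot.

```cpp
#include <cassert>
#include <new>

// Placeholder handlers. They are never invoked here; a real new-handler
// must free memory, throw std::bad_alloc, or terminate.
void handler_a() {}
void handler_b() {}

// Returns true if registering B displaced A from the one global slot.
bool second_registration_wins() {
    std::new_handler original = std::set_new_handler(handler_a);
    std::new_handler displaced = std::set_new_handler(handler_b);
    bool result =
        displaced == handler_a && std::get_new_handler() == handler_b;
    std::set_new_handler(original);  // leave the process as we found it
    return result;
}
```

If each standard library kept its own slot, each would only see its own registration, which is the conflict that a shared implementation avoids.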

Throwing an exception

Now we can get back to the original code where an exception is thrown.

func(bool): # @func(bool)
        ...
        test byte ptr [rbp - 1], 1
        je .LBB0_3
        mov edi, 4
        call __cxa_allocate_exception
        mov rdi, rax
        mov dword ptr [rdi], 1
        mov esi, offset typeinfo for int
        xor eax, eax
        mov edx, eax
        call __cxa_throw

It calls __cxa_allocate_exception to get a 4-byte exception object, then stores the value 1 in it. The typeinfo for int is not written into the object itself; it is loaded into esi, the second argument of __cxa_throw. We need the typeinfo when checking whether a handler can catch the exception. The third argument (edx) is zeroed: it is the destructor for the exception object, and an int has none. This is why C++ exceptions need to be copy/move constructible. When you do throw ex;, ex is always moved/copied into the separate exception object allocated by __cxa_allocate_exception first. For most exceptions, chances are it’s doing a copy.
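
The copy into the separately allocated exception object can be observed with a small counting type. Tracked, throw_named and copies_made_by_throw are names invented for this sketch. The exception is thrown through a const reference so that the compiler cannot move from it or elide the copy, and it is caught by reference so that the catch clause adds no copy of its own.

```cpp
#include <cassert>

struct Tracked {
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// ex refers to an object owned by the caller, so the exception object
// has to be copy-constructed from it.
void throw_named(const Tracked& ex) { throw ex; }

// Returns how many copies a single throw/catch round trip made.
int copies_made_by_throw() {
    Tracked original;
    Tracked::copies = 0;
    Tracked::moves = 0;
    try {
        throw_named(original);
    } catch (const Tracked&) {
        // caught by reference: no further copy here
    }
    return Tracked::copies;
}
```

Catching by value instead of by reference would add one more copy, which is one reason catching by const reference is the usual advice.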

Then it calls __cxa_throw with rdi (the first function argument) set to the initialized exception object. This kick-starts the two-phase unwind process we described earlier. The unwinder has to walk the stack at runtime, because the call chain is only known dynamically, and that is slow. Because the frame pointer is omitted by default, libunwind depends on DWARF information (if you are on Linux) to unwind the stack, which means more page faults and an even slower unwind process.
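
The sequence in the assembly (allocate, store the value, call __cxa_throw) can also be written by hand through <cxxabi.h>, which both libstdc++ and libc++abi ship. This is only a sketch to make the lowering concrete, assuming a GCC or Clang toolchain; throw_int_by_hand and catch_int_thrown_by_hand are made-up names, and real code should simply write throw value;.

```cpp
#include <cxxabi.h>

#include <cassert>
#include <typeinfo>

// Roughly what the compiler emits for `throw value;` with an int.
[[noreturn]] void throw_int_by_hand(int value) {
    // 1. Ask the runtime for storage for the exception object.
    void* storage = abi::__cxa_allocate_exception(sizeof(int));
    // 2. Initialize the exception object in place.
    *static_cast<int*>(storage) = value;
    // 3. Hand over the object, its type_info, and a null destructor
    //    (int has none). This call does not return.
    abi::__cxa_throw(storage,
                     const_cast<std::type_info*>(&typeid(int)),
                     nullptr);
}

// An ordinary catch clause cannot tell the difference.
int catch_int_thrown_by_hand(int value) {
    try {
        throw_int_by_hand(value);
    } catch (int caught) {
        return caught;
    }
    return -1;  // not reached
}
```

The type_info pointer is what the personality routine compares against the types listed in each catch clause during the search phase.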

So when people say “C++ exceptions are slow”, what they really mean is that throwing an exception in C++ is slow.