Google Summer of Code 2019 for GNU libmicrohttpd, final report

2019-08-22

First published on the libmicrohttpd mailing list; minor edits applied (mostly to form).

Here is my summary of the work I did during Google Summer of Code 2019 for libmicrohttpd (summerofcode project page).

Communication with my mentor happened almost exclusively via private email and Mumble voice calls, so here is the gist of the project: optimize syscalls, with a focus on setsockopt() usage, on various platforms (Debian Linux, FreeBSD 12-RELEASE-p3, NetBSD-CURRENT, cygwin on Windows 10). OSX/macOS was on the list but was disregarded: I was not familiar with setting it up in my test environment, and doing so would have cost too much time.

The summary and analysis of the state of syscalls can be found at d.n0.is/p/l/g/syscalls.html or checked out via mercurial:

or git if you prefer this:

The tests I wrote to analyse the syscalls focused on four scenarios:

  1. continuous response generation (ensure no constant setsockopt() calls during transmission)
  2. tiny response generation (fits in one packet, including header, ensure one packet and sending without corking)
  3. modest response generation (header first, then body, ensure last part of body is sent without corking)
  4. response generation using sendfile() (making sure that after last write operation the packet is sent without corking, no unnecessary setsockopts)

1) initial syscall assessment

Initially, before changing the code, we can observe that we make too many setsockopt() syscalls on Linux (Debian) and other platforms. The number of setsockopt() calls on a platform which supports MSG_MORE (Linux) is unnecessarily high, and on FreeBSD TCP_NOPUSH is not used. The way the syscalls are made is unnecessarily expensive.

2) approach taken, description of the changes made

I looked into the system-specific tweaks we could start with. This included reading the documentation and source changelogs of the targeted operating systems and their TCP/IP stack implementations. The conclusion was that MSG_MORE should be preferred on Linux if it exists, and should get priority over any other existing TCP socket options. TCP_NODELAY is considered for toggling Nagle's algorithm on or off. On FreeBSD I had to read up on additions to the stack from the last 10+ years, and TCP_NOPUSH was found to be the closest to the Linux-specific TCP_CORK. Both TCP_NOPUSH and TCP_CORK, if set, avoid sending out partial frames, an optimization to consider. Through incremental testing, rewriting, and reasoning about the best way to address each system, I arrived at the end result.
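The option choice described above can be sketched roughly as follows. This is a minimal illustration and not the libmicrohttpd code: the function names sketch_set_nodelay() and sketch_set_cork() and the macro SKETCH_CORK_OPT are hypothetical; only the socket options themselves (TCP_CORK, TCP_NOPUSH, TCP_NODELAY) come from the platforms.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Pick whichever "hold back partial frames" option the platform offers.
   SKETCH_CORK_OPT is a hypothetical name used only in this sketch. */
#if defined(TCP_CORK)
#define SKETCH_CORK_OPT TCP_CORK      /* Linux */
#elif defined(TCP_NOPUSH)
#define SKETCH_CORK_OPT TCP_NOPUSH    /* FreeBSD */
#endif

/* on != 0 sets TCP_NODELAY, which disables Nagle's algorithm. */
static int
sketch_set_nodelay (int fd, int on)
{
  return setsockopt (fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof (on));
}

/* Returns 0 on success, -1 on error or if no cork-like option exists. */
static int
sketch_set_cork (int fd, int on)
{
#ifdef SKETCH_CORK_OPT
  return setsockopt (fd, IPPROTO_TCP, SKETCH_CORK_OPT, &on, sizeof (on));
#else
  (void) fd;
  (void) on;
  return -1;
#endif
}
```

Keeping the platform choice behind one macro means the calling code does not need its own #ifdef blocks.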

Two helper functions were written (pre_cork_setsockopt(), post_cork_setsockopt()) to handle setsockopt() calls. pre_cork_setsockopt() is called before any send()/sendmsg() calls. post_cork_setsockopt() is called after any send()/sendmsg() calls. If the MSG_MORE flag is supported on this platform, we do nothing in those functions. If the sk_cork_on boolean is already in the state we want to set, we do nothing. A third helper function, MHD_socket_set_nodelay_(), was added to toggle Nagle's algorithm. In pre_cork_setsockopt(), if none of TCP_CORK, TCP_NOPUSH, or MSG_MORE exists on a platform, we switch Nagle's algorithm on or off through this helper; in all other cases we simply keep Nagle's algorithm off.
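The "do nothing if the state already matches" rule is what saves the syscalls. A minimal sketch of that idea, under stated assumptions: struct sketch_conn, sketch_cork_change_needed() and sketch_update_cork() are hypothetical names (only sk_cork_on appears in the real code as described above), and the TCP_NODELAY fallback is my simplified reading of the behaviour, not a copy of the implementation.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical per-connection state for this sketch. */
struct sketch_conn
{
  int fd;
  bool sk_cork_on;   /* last cork state we asked the kernel for */
};

/* Pure decision: is a setsockopt() call needed at all? */
static bool
sketch_cork_change_needed (bool have_msg_more, bool sk_cork_on,
                           bool want_cork)
{
  if (have_msg_more)
    return false;    /* the per-call send() flag covers this case */
  return sk_cork_on != want_cork;
}

/* Returns 0 if nothing had to be done or the option was set, -1 on error. */
static int
sketch_update_cork (struct sketch_conn *c, bool want_cork)
{
#ifdef MSG_MORE
  const bool have_msg_more = true;
#else
  const bool have_msg_more = false;
#endif
  int on = want_cork ? 1 : 0;
  int ret;

  if (! sketch_cork_change_needed (have_msg_more, c->sk_cork_on, want_cork))
    return 0;
#if defined(TCP_CORK)
  ret = setsockopt (c->fd, IPPROTO_TCP, TCP_CORK, &on, sizeof (on));
#elif defined(TCP_NOPUSH)
  ret = setsockopt (c->fd, IPPROTO_TCP, TCP_NOPUSH, &on, sizeof (on));
#else
  /* Assumed fallback: no cork-like option, so toggle Nagle instead
     (corking wanted means TCP_NODELAY off). */
  on = want_cork ? 0 : 1;
  ret = setsockopt (c->fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof (on));
#endif
  if (0 != ret)
    return -1;
  c->sk_cork_on = want_cork;   /* remember the state to skip repeats */
  return 0;
}
```

Separating the decision from the syscall keeps the decision testable without a real socket.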

Two functions were written to handle send() calls. MHD_send_on_connection_() sends a given buffer on the connection and remembers the current state of the socket options. setsockopt() is only called when absolutely necessary. If the connection is using TLS, GnuTLS handles the corking. In all other cases (i.e. plaintext transmission) we call pre_cork_setsockopt(), followed by the send(), and then post_cork_setsockopt().
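On platforms with MSG_MORE, the "more data follows" hint can travel with each send() call instead of needing a separate setsockopt(). A minimal sketch, assuming a hypothetical wrapper sketch_send() with a caller-supplied more_follows flag (neither name is taken from libmicrohttpd):

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send one buffer; if more data will follow right away, tell the
   kernel so on platforms that understand MSG_MORE. */
static ssize_t
sketch_send (int fd, const void *buf, size_t len, bool more_follows)
{
  int flags = 0;

#ifdef MSG_MORE
  if (more_follows)
    flags |= MSG_MORE;
#else
  (void) more_follows;   /* no per-call hint available here */
#endif
  return send (fd, buf, len, flags);
}
```

A caller would pass true for the header and false for the last chunk of the body, so that the final send() leaves nothing held back.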

MHD_send_on_connection2_() sends the header followed by the buffer on the given connection. When sendmsg() is available, it is used to send both (header and buffer) at once, and the function returns the number of bytes sent from both buffers. Otherwise, if writev() is available, writev() is used the same way and returns the number of bytes sent from both buffers, or -1 on error. If neither sendmsg() nor writev() is available, only the header is sent via MHD_send_on_connection_(). If we have writev() or sendmsg(), we call pre_cork_setsockopt() with 'false' for the want_cork argument. If we succeeded in sending the full buffer, we call post_cork_setsockopt(), again with 'false' for the want_cork argument, as we want to make sure that the OS flushes at the end.
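Sending header and body in one syscall with sendmsg() can look roughly like this. It is a sketch only: sketch_send_header_and_body() is a hypothetical name, and the real function additionally deals with corking, partial sends and the writev() fallback, which are left out here.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hand header and body to the kernel in a single sendmsg() call.
   Returns the total number of bytes accepted from both buffers,
   or -1 on error (errno is set by sendmsg()). */
static ssize_t
sketch_send_header_and_body (int fd,
                             const void *header, size_t header_len,
                             const void *body, size_t body_len)
{
  struct iovec vec[2];
  struct msghdr msg;

  /* iov_base is not const-qualified; sendmsg() only reads from it. */
  vec[0].iov_base = (void *) header;
  vec[0].iov_len = header_len;
  vec[1].iov_base = (void *) body;
  vec[1].iov_len = body_len;

  memset (&msg, 0, sizeof (msg));
  msg.msg_iov = vec;
  msg.msg_iovlen = 2;

  return sendmsg (fd, &msg, 0);
}
```

The return value may be smaller than header_len + body_len, so a caller has to compare it against the total before deciding that everything was sent.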

The previously existing function for sendfile() wrapping in connection.c was moved and renamed to MHD_send_sendfile_().

After testing, the new functions replaced all previous setsockopt() calls in connection.c. In connection.c, the following functions were removed:

and calls of both of them in connection.c were removed, in addition to removing calls of socket_start_no_buffering_flush() and socket_start_extra_buffering().

3) resulting syscall behavior

Overall we see an improvement both in how often setsockopt() is called and in when it is called. On Linux (Debian), setsockopt() is no longer called in the measurements we did, as expected. The only setsockopt() call logged is provided as reference. The MSG_MORE flag related code is working.

Compared to the first tests on FreeBSD, in most test cases we now get no setsockopt() calls after the header.

On NetBSD the before/after comparison shows that for the sendfile() test the order of the setsockopt() calls changed and we have one call fewer. For the tiny response generation test we have two fewer calls to setsockopt() than before; the same applies to modest response generation.

On cygwin the visible improvements include only one call to setsockopt() instead of two for tiny response generation.

4) future work to be done

In post_cork_setsockopt() and pre_cork_setsockopt(), more of the possible errno values must be handled.

In connection.c there is one last line which can be removed once the code in socket_start_no_buffering_flush() has been dealt with (FreeBSD specific; the discussion about it did not conclude within my project time).

5) additional notes

Even though some of the flags used are newish, Stevens' "Unix Network Programming" and the TCP/IP Illustrated volumes were a great help for this work, in addition to the documented source code of the operating systems (where available). Thanks to my fellow NetBSD developers, whom I could ask about system and network specific features in NetBSD when the code wasn't enough.