Google Summer of Code 2019 for GNU libmicrohttpd, final report

2019-08-22

First published on the libmicrohttpd mailing list; minor edits applied (mostly to the form).

Here's my summary of the work I did during Google Summer of Code 2019 for libmicrohttpd (summerofcode project page).

As the communication with my mentor happened almost exclusively via private email and Mumble voice calls, I will summarize the project here. The gist of it is to optimize syscalls, with a focus on reducing setsockopt() calls, on various platforms (Debian Linux, FreeBSD 12-RELEASE-p3, NetBSD-CURRENT, cygwin on Windows 10). OSX/macOS was listed but disregarded: I was simply not familiar with setting it up in my test environment, and doing so would have cost too much time.

The summary and analysis of the state of syscalls can be found at d.n0.is/p/l/g/syscalls.html or checked out via mercurial:

  • hg clone https://c.n0.is/libmicrohttpd/gsoc2019

or via git, if you prefer:

  • git clone https://gnunet.org/git/libmicrohttpd-gsoc2019.git

The tests I wrote to analyse the syscalls focused on 4 scenarios:

  1. continuous response generation (ensure no constant setsockopt() calls during transmission)
  2. tiny response generation (fits in one packet, including header, ensure one packet and sending without corking)
  3. modest response generation (header first, then body, ensure last part of body is sent without corking)
  4. response generation using sendfile() (making sure that after last write operation the packet is sent without corking, no unnecessary setsockopts)

1) initial syscall assessment

Initially, before changing the code, it can be observed that we make too many setsockopt() syscalls on Linux (Debian) and other platforms. The number of setsockopt() calls on a platform which supports MSG_MORE (Linux) is unnecessarily high, and on FreeBSD TCP_NOPUSH is not used. The way the syscalls are made is unnecessarily expensive.

  • Linux (Debian)
    • Continuous Response Generation test shows a pattern of setsockopt, sendto, setsockopt, setsockopt, setsockopt, setsockopt, setsockopt, setsockopt. After this the body of the message is sent, and we see 2 more setsockopt() calls at the end.
      (9 setsockopt calls)
    • Tiny Response Generation test starts with 1 setsockopt() call after receiving the GET request (TCP_CORK). After sending the header, we see 6 setsockopt() calls. Again, after sending the body, we see 2 final setsockopt() calls.
      (9 setsockopt calls)
    • Modest Response Generation test starts with 1 setsockopt() call after receiving the GET request. We send the header. 6 calls to setsockopt() follow. The body is sent, and now we get 2 more setsockopt() calls at the end.
      (9 setsockopt calls)
    • Response Generation using sendfile() test starts with 1 call to setsockopt() after the GET request is received, immediately before the header is sent. Then we see 6 calls to setsockopt(). sendfile() is called, setsockopt() is called with TCP_NODELAY. The file descriptor is closed. setsockopt() is called with TCP_NODELAY.
      (9 setsockopt calls)
  • FreeBSD
    • Continuous Response Generation test shows that setsockopt() is called after receiving the GET request, 1 call immediately before sending the header, 1 call immediately after sending the header (TCP_NODELAY). The body is sent, and no further setsockopt() call is made.
      (3 setsockopt calls)
    • Tiny Response Generation test has a comparable pattern: after receiving the GET request, 1 call is made to setsockopt() immediately before calling sendto() on the header and 1 call is made to setsockopt() immediately after sending the header. No further setsockopt() calls are made.
      (2 setsockopt calls)
    • Modest Response Generation test shows a comparable pattern: after receiving the GET request, 1 call is made to setsockopt() immediately before calling sendto() on the header and 1 call is made to setsockopt() immediately after sending the header. No further setsockopt() calls are made.
      (2 setsockopt calls)
    • Response Generation using sendfile() result is comparable to the 3 tests run before it.
      (2 setsockopt calls)
  • NetBSD
    • Continuous Response Generation test shows 1 call to setsockopt() before sending the header, and 1 call to setsockopt() after sending the header.
      (2 setsockopt calls)
    • Tiny Response Generation test shows that after receiving the GET request, we see 1 call to setsockopt() before sending the header. Immediately after the header we see 2 calls to setsockopt(). After the body is sent we see 1 more call to setsockopt().
      (4 setsockopt calls)
    • Modest Response Generation test shows that after receiving the GET request, we see 1 setsockopt() call which is followed by sending the header. This is followed by 2 calls to setsockopt(). After this, the body is sent, followed by another call to setsockopt().
      (4 setsockopt calls)
    • Response Generation using sendfile() shows that after receiving the GET request, the file with the content "a" is created. We see 1 call to setsockopt() just before the header is sent. Immediately after this we see 1 more setsockopt() call. After a call to pread() we see another setsockopt() call followed by a sendto() of "a".
      (3 setsockopt calls)
  • cygwin x64 (under Windows 10)
    • Continuous Response Generation test shows 1 call of main to setsockopt() in the beginning. After this, MHD-single calls setsockopt 2 times (setsockopt, followed by cygwin_send, followed by another setsockopt call). The pattern in cygwin is comparable to the ktruss log in NetBSD, where we see a setsockopt, sendto, setsockopt at the beginning of the log.
      (3 setsockopt calls)
    • Tiny Response Generation test shows 1 call to setsockopt. This is followed by a send of the size 99 (the header). Then follows another setsockopt, followed by another call to send (the body).
      (2 setsockopt calls)
    • Modest Response Generation test shows 2 calls to setsockopt() related to the socket we use, each one of them before the respective send() is called.
      (2 setsockopt calls)
    • Response Generation using sendfile() shows 1 call to setsockopt(), followed by a send() with size 99 (the header). This is followed by 1 call to setsockopt(), and one more call to send() with size 1 (the body).
      (2 setsockopt calls)

2) approach taken, description of the changes made

I looked into the system-specific tweaks we could start with. This included reading the documentation and source changelogs of the targeted operating systems and their TCP/IP stack implementations. It was concluded that MSG_MORE should be preferred on Linux if it exists, and that it gets priority over any other existing TCP socket flags. TCP_NODELAY is used for toggling Nagle's algorithm on or off. On FreeBSD I had to read up on the additions to the stack over the last 10+ years, and TCP_NOPUSH was found to be the closest to the Linux-specific TCP_CORK. Both TCP_NOPUSH and TCP_CORK, if set, prevent partial frames from being sent out, an optimization to consider. Through incremental testing, rewriting and reasoning about the best way to address each system, I arrived at the end result.
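
To illustrate the priority order described above, here is a minimal sketch of how the choice could be made at compile time. It is not the code from the repository: the enum and both function names are made up for this example, and only the macros MSG_MORE, TCP_CORK and TCP_NOPUSH come from the system headers.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical names for illustration; not part of libmicrohttpd. */
enum demo_cork_strategy
{
  DEMO_USE_MSG_MORE,     /* per-call send() flag, no setsockopt() needed */
  DEMO_USE_TCP_CORK,     /* Linux socket option */
  DEMO_USE_TCP_NOPUSH,   /* BSD socket option, closest match to TCP_CORK */
  DEMO_USE_NODELAY_ONLY  /* fall back to toggling Nagle's algorithm */
};

/* MSG_MORE wins if present, then TCP_CORK, then TCP_NOPUSH. */
static enum demo_cork_strategy
demo_pick_cork_strategy (void)
{
#if defined(MSG_MORE)
  return DEMO_USE_MSG_MORE;
#elif defined(TCP_CORK)
  return DEMO_USE_TCP_CORK;
#elif defined(TCP_NOPUSH)
  return DEMO_USE_TCP_NOPUSH;
#else
  return DEMO_USE_NODELAY_ONLY;
#endif
}

/* Human-readable name, handy for logging which path was compiled in. */
static const char *
demo_cork_strategy_name (enum demo_cork_strategy s)
{
  switch (s)
  {
  case DEMO_USE_MSG_MORE:
    return "MSG_MORE";
  case DEMO_USE_TCP_CORK:
    return "TCP_CORK";
  case DEMO_USE_TCP_NOPUSH:
    return "TCP_NOPUSH";
  case DEMO_USE_NODELAY_ONLY:
    return "TCP_NODELAY only";
  }
  return "unknown";
}
```

A real build would more likely rely on configure-time checks than on bare #if defined() tests; the sketch only shows the order of preference.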

Two helper functions were written (pre_cork_setsockopt(), post_cork_setsockopt()) to handle setsockopt() calls. pre_cork_setsockopt() is called before any send()/sendmsg() calls. post_cork_setsockopt() is called after any send()/sendmsg() calls. If the MSG_MORE option is supported on this platform, we do nothing in those functions. If the sk_cork_on boolean is already in the state we want to set, we do nothing. A third helper function, MHD_socket_set_nodelay_(), was added to toggle Nagle's algorithm. In pre_cork_setsockopt(), if none of TCP_CORK, TCP_NOPUSH or MSG_MORE exists on a platform, we switch Nagle's algorithm on or off as needed; in all other cases we simply keep Nagle's algorithm off.
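
The state caching described above can be sketched as follows. This is a simplified illustration, not the actual helpers: struct demo_conn, demo_want_cork() and the syscalls counter are hypothetical, the MSG_MORE short-circuit and the Nagle handling are left out, and only the sk_cork_on idea is taken from the real code.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical stand-in for the connection state; the real struct differs. */
struct demo_conn
{
  int fd;                /* TCP socket of the connection */
  bool sk_cork_on;       /* cork state we last set successfully */
  unsigned int syscalls; /* setsockopt() calls issued, for illustration */
};

/* Pick the platform's cork-like option, if it has one. */
#if defined(TCP_CORK)
#define DEMO_CORK_OPT TCP_CORK
#elif defined(TCP_NOPUSH)
#define DEMO_CORK_OPT TCP_NOPUSH
#endif

/* Switch corking on or off, but only if the cached state differs.
   Returns 0 on success (or when nothing had to be done), -1 on error. */
static int
demo_want_cork (struct demo_conn *c, bool want_cork)
{
  if (c->sk_cork_on == want_cork)
    return 0;                     /* already as wanted: no syscall */
#ifdef DEMO_CORK_OPT
  {
    int v = want_cork ? 1 : 0;

    c->syscalls++;
    if (0 != setsockopt (c->fd, IPPROTO_TCP, DEMO_CORK_OPT, &v, sizeof (v)))
      return -1;                  /* keep the cached state unchanged */
  }
#endif
  c->sk_cork_on = want_cork;
  return 0;
}
```

Calling demo_want_cork() twice in a row with the same argument issues at most one setsockopt(), which is the effect the real helpers aim for.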

Two functions were written to handle send() calls. MHD_send_on_connection_() sends a given buffer on the connection and remembers the current state of the socket options. setsockopt() is only called when absolutely necessary. If the connection is using TLS, GnuTLS handles the corking. In all other cases (i.e. plaintext transmission) we call pre_cork_setsockopt(), followed by the send(), and post_cork_setsockopt().
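
For the MSG_MORE case, the per-call hint could look roughly like this. demo_send_hinted() is a hypothetical helper, not MHD_send_on_connection_(); it only shows how a header can be sent with MSG_NOSIGNAL|MSG_MORE and the final part with plain MSG_NOSIGNAL, which is the pattern visible in the Linux traces in section 3. I assume a platform without MSG_MORE would cork via setsockopt() instead, which the sketch omits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0   /* not defined on every platform */
#endif

/* Hypothetical helper for illustration; not the libmicrohttpd function.
   Sends len bytes from buf and tells the kernel whether more data
   will follow immediately. Returns what send() returns. */
static ssize_t
demo_send_hinted (int fd, const void *buf, size_t len, bool more_follows)
{
  int flags = MSG_NOSIGNAL;

#ifdef MSG_MORE
  if (more_follows)
    flags |= MSG_MORE;   /* kernel may hold back a partial segment */
#else
  (void) more_follows;   /* corking via setsockopt() would go here */
#endif
  return send (fd, buf, len, flags);
}
```

A caller would pass true for the header and false for the last piece of the body, so that nothing stays queued once the response is complete.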

MHD_send_on_connection2_() sends the header followed by the buffer on the given connection. When sendmsg() is available, it is used to send both (header and buffer) at once, and the function returns the number of bytes sent from both buffers. Otherwise, if writev() is available, writev() is used to send both at once, and the function returns the number of bytes sent from both buffers or -1 on error. If neither sendmsg() nor writev() is available, only the header is sent via MHD_send_on_connection_(). If we have writev() or sendmsg(), we call pre_cork_setsockopt() with 'false' for the want_cork argument. If we succeeded in sending the full buffer, we call post_cork_setsockopt(), again with 'false' for the want_cork argument, as we want to make sure that the OS flushes at the end.
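
Sending header and body with a single sendmsg() call could be sketched like this. demo_send_two() is a hypothetical name, and the sketch leaves out the writev() fallback, the cork handling and the treatment of partial sends.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0   /* not defined on every platform */
#endif

/* Hypothetical helper for illustration; not the libmicrohttpd function.
   Hands header and body to the kernel in one syscall. Returns the total
   number of bytes accepted from both buffers, or -1 on error. */
static ssize_t
demo_send_two (int fd, const void *hdr, size_t hdr_len,
               const void *body, size_t body_len)
{
  struct iovec iov[2];
  struct msghdr msg;

  iov[0].iov_base = (void *) hdr;   /* sendmsg() does not modify the data */
  iov[0].iov_len = hdr_len;
  iov[1].iov_base = (void *) body;
  iov[1].iov_len = body_len;
  memset (&msg, 0, sizeof (msg));
  msg.msg_iov = iov;
  msg.msg_iovlen = 2;
  return sendmsg (fd, &msg, MSG_NOSIGNAL);
}
```

The return value can be smaller than hdr_len + body_len, so a caller has to work out how much of each buffer is still unsent.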

The previously existing function for sendfile() wrapping in connection.c was moved and renamed to MHD_send_sendfile_().

The functions were then tested, and they replaced all previous setsockopt() calls in connection.c. In connection.c, the following functions were removed:

  • socket_start_normal_buffering
  • socket_start_no_buffering

and calls of both of them in connection.c were removed, in addition to removing calls of socket_start_no_buffering_flush() and socket_start_extra_buffering().

3) resulting syscall behavior

Overall we see an improvement both in the number of setsockopt() calls and in when they are made. On Linux (Debian), setsockopt() is no longer called during the measured transmission, as expected; the only setsockopt() call logged is provided below as reference. The MSG_MORE related code is working.

Compared to the first tests on FreeBSD, in most test cases we now get no setsockopt() calls after the header.

On NetBSD, the before/after comparison shows that for the sendfile() test the order of the setsockopt() calls changed and we have 1 call fewer. For the tiny response generation test we have 2 fewer calls to setsockopt() than before; the same applies to modest response generation.

On cygwin, the visible improvements include only 1 call to setsockopt() instead of 2 for tiny response generation.

  • Linux (Debian)
    • continuous response generation:
      setsockopt(5, SOL_TCP, TCP_NODELAY, [1], 4) = 0
                  
      before the header is sent. sendto() with MSG_NOSIGNAL|MSG_MORE for the header.
      sendto() with MSG_NOSIGNAL for the body.
      No other setsockopt() calls.
    • tiny response generation:
      setsockopt(5, SOL_TCP, TCP_NODELAY, [1], 4) = 0
                  
      before the header is sent, no other setsockopt() calls.
    • modest response generation:
      setsockopt(5, SOL_TCP, TCP_NODELAY, [1], 4) = 0
                  
      before the header is sent, no further setsockopt() calls.
    • response generation using sendfile():
      setsockopt(5, SOL_TCP, TCP_NODELAY, [1], 4) = 0
                  
      before the header is sent, no further setsockopt() calls.
  • FreeBSD
    • continuous response generation:
      setsockopt(4,IPPROTO_TCP,TCP_NODELAY,0x7fffdfffdecc,4) = 0 (0x0)
      setsockopt(4,SOL_SOCKET,SO_NOSIGPIPE,0x800249f94,4) = 0 (0x0)
                  
      Before the header is sent:
      setsockopt(4,IPPROTO_TCP,TCP_FASTOPEN_MIN_COOKIE_LEN,0x7fffdfffde5c,4) = 0 (0x0)
                  
      header:
      sendto(4,"HTTP/1.1 200 OK\r\nConnection: K"...,108,MSG_NOSIGNAL,NULL,0) = 108 (0x6c)
                  
      followed by a pattern of:
      sendto(4,"1\r\nb\r\n",6,MSG_NOSIGNAL,NULL,0) = 6 (0x6)
      poll({ 3/POLLIN 4/POLLPRI|POLLOUT|POLLRDBAND },2,-1) = 1 (0x1)
                  
      Ends with another setsockopt call:
      setsockopt(4,IPPROTO_TCP,TCP_FASTOPEN_MIN_COOKIE_LEN,0x7fffdfffde5c,4) = 0 (0x0)
                  
    • tiny response generation:
      setsockopt(4,IPPROTO_TCP,TCP_NODELAY,0x7fffdfffdecc,4) = 0 (0x0)
      setsockopt(4,SOL_SOCKET,SO_NOSIGPIPE,0x800249f94,4) = 0 (0x0)
                  
      before the header is sent. No further setsockopt() calls.
    • modest response generation:
      setsockopt(4,IPPROTO_TCP,TCP_NODELAY,0x7fffdfffdecc,4) = 0 (0x0)
      setsockopt(4,SOL_SOCKET,SO_NOSIGPIPE,0x800249f94,4) = 0 (0x0)
                  
      before the header is sent. No further setsockopt() calls.
    • response generation using sendfile():
      setsockopt(4,IPPROTO_TCP,TCP_NODELAY,0x7fffdfffdecc,4) = 0 (0x0)
      setsockopt(4,SOL_SOCKET,SO_NOSIGPIPE,0x800249f94,4) = 0 (0x0)
                  
      before the header is sent.
      setsockopt(4,IPPROTO_TCP,TCP_FASTOPEN_MIN_COOKIE_LEN,0x7fffdfffde5c,4) = 0 (0x0)
                  
      before the HTTP/1.1 200 OK is sent.
  • NetBSD
    • continuous response generation:
      before receiving the GET request:
      setsockopt(0xa, 0xffff, 0x800, 0x7e25e041a1bc, 0x4) = 0
                  
      After receiving the GET request:
      setsockopt(0xa, 0x6, 0x1, 0x7e25dcbffec8, 0x4) = 0
      sendto(0xa, 0x7e25e067f800, 0x6c, 0x400, 0, 0) = 108
      "HTTP/1.1 200 OK\r\nConnection: Keep-Alive\r\nTransfer-Encoding: chunk"
      sendto(0xa, 0x7e25e067f807, 0x6, 0x400, 0, 0) = 6
      "1\r\nb\r\n"
      setsockopt(0xa, 0x6, 0x1, 0x7e25dcbffecc, 0x4) = 0
      poll(0x7e25e067a000, 0x2, 0xffffffff) = 1
      sendto(0xa, 0x7e25e067f807, 0x6, 0x400, 0, 0) = 6
      "1\r\nb\r\n"
      poll(0x7e25e067a000, 0x2, 0xffffffff) = 1
      sendto(0xa, 0x7e25e067f807, 0x6, 0x400, 0, 0) = 6
      "1\r\nb\r\n"
                  
      and so forth. At the end one last setsockopt() call:
      setsockopt(0xa, 0x6, 0x1, 0x7e25dcbffec8, 0x4) = 0
                  
    • tiny response generation:
      before GET request:
      setsockopt(0xa, 0xffff, 0x800, 0x7986b2e1a1bc, 0x4) = 0
                  
      No other setsockopt() call.
    • modest response generation:
      before receiving the GET request:
      setsockopt(0xa, 0xffff, 0x800, 0x7efa1c01a1bc, 0x4) = 0
                  
      No other setsockopt() calls.
    • response generation using sendfile():
      before receiving the GET request:
      setsockopt(0xa, 0xffff, 0x800, 0x723db1e1a1bc, 0x4) = 0
                  
      After receiving the GET request:
      setsockopt(0xa, 0x6, 0x1, 0x723dae5ffec8, 0x4) = 0
      sendto(0xa, 0x723db21052c0, 0x63, 0x400, 0, 0) = 99
      "HTTP/1.1 200 OK\r\nConnection: Keep-Alive\r\nContent-Length: 1\r\nDat"
      pread(0xb, 0x723db20fb0a0, 0x1, 0, 0) = 1
      "a"
      sendto(0xa, 0x723db20fb0a0, 0x1, 0x400, 0, 0) = 1
      "a"
      setsockopt(0xa, 0x6, 0x1, 0x723dae5ffecc, 0x4) = 0
      close(0xb) = 0
                  
  • cygwin x64 (under Windows 10)
    • continuous response generation:
      [MHD-single] inc 1643 __set_winsock_errno: setsockopt:1689 - winsock error 10042 -> errno 109
      [MHD-single] inc 1643 cygwin_setsockopt: -1 = setsockopt(6, 6, 0x4, 0xFFDFCB5C, 4), errno 109
      [MHD-single] inc 1643 cygwin_send: 108 = send(6, 0x6000790A0, 108, 0x20)
      [MHD-single] inc 1643 cygwin_send: 6 = send(6, 0x6000790A7, 6, 0x20)
      [MHD-single] inc 1643 __set_errno: void __set_winsock_errno(const char*, int):203 setting errno 109
      [MHD-single] inc 1643 __set_winsock_errno: setsockopt:1689 - winsock error 10042 -> errno 109
      [MHD-single] inc 1643 cygwin_setsockopt: -1 = setsockopt(6, 6, 0x4, 0xFFDFCB58, 4), errno 109
                  
      and at the end:
      [MHD-single] inc 1643 __set_winsock_errno:setsockopt:1689 - winsock error 10042 -> errno 109
      [MHD-single] inc 1643 cygwin_setsockopt: -1 = setsockopt(6, 6, 0x4, 0xFFDFCB5C, 4), errno 109
                  
    • tiny response generation:
      [MHD-single] trg 1653 fhandler_socket_inet::setsockopt:setsockopt optval=1
      [MHD-single] trg 1653 cygwin_setsockopt: 0 = setsockopt(6, 6, 0x1, 0xFFDFCBFC, 4)
                  
    • modest response generation:
      [main] mrg 1649 fhandler_socket_inet::setsockopt:setsockopt optval=1
      [main] mrg 1649 cygwin_setsockopt: 0 = setsockopt(5, 65535, 0x4, 0xFFFFCA10, 4)
      [MHD-single] mrg 1649 fhandler_socket_inet::setsockopt:setsockopt optval=1
      [MHD-single] mrg 1649 cygwin_setsockopt: 0 = setsockopt(6, 6, 0x1, 0xFFDFCBFC, 4)
                  
    • response generation using sendfile():
      [MHD-single] response_generation_sendfile 1657 __set_winsock_errno: setsockopt:1689 - winsock error 10042 -> errno 109
      [MHD-single] response_generation_sendfile 1657 cygwin_setsockopt: -1 = setsockopt(6, 6, 0x4, 0xFFDFCB5C, 4), errno 109
      [MHD-single] response_generation_sendfile 1657 cygwin_send: 99 = send(6, 0x6000790C0, 99, 0x20)
      [MHD-single] response_generation_sendfile 1657 fhandler_disk_file::prw_open: 0x0 = NtOpenFile (0x264, 0x80100000, ??\C:\cygwin64\home\ng0\src\gsoc2019\a.txt, io, 0x7, 0x4020)
      [MHD-single] response_generation_sendfile 1657 fhandler_disk_file::pread: 1 = pread(0x60007D148, 1, 0, 0x0)
      [MHD-single] response_generation_sendfile 1657 pread: 1 = pread(7, 0x60007D148, 1, 0)
      [MHD-single] response_generation_sendfile 1657 cygwin_send: 1 = send(6, 0x60007D148, 1, 0x20)
      [MHD-single] response_generation_sendfile 1657 __set_errno: void __set_winsock_errno(const char*, int):203 setting errno 109
      [MHD-single] response_generation_sendfile 1657 __set_winsock_errno: setsockopt:1689 - winsock error 10042 -> errno 109
      [MHD-single] response_generation_sendfile 1657 cygwin_setsockopt: -1 = setsockopt(6, 6, 0x4, 0xFFDFCB58, 4), errno 109
      [MHD-single] response_generation_sendfile 1657 close: close(7)
                  

4) future work to be done

In post_cork_setsockopt() and pre_cork_setsockopt(), more of the possible errno cases must be caught and handled.
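
As an illustration of what such handling might distinguish, here is a small sketch. The three categories and all names are my assumption for this example, not something that was decided in the project. The ENOPROTOOPT case corresponds, as far as I can tell, to the winsock error 10042 (errno 109) visible in the cygwin traces in section 3.

```c
#include <errno.h>

/* Hypothetical categories for illustration; not part of libmicrohttpd. */
enum demo_sockopt_result
{
  DEMO_SOCKOPT_UNSUPPORTED, /* option unknown here: carry on without it */
  DEMO_SOCKOPT_BAD_SOCKET,  /* descriptor is closed or not a socket */
  DEMO_SOCKOPT_OTHER        /* anything else: report to the caller */
};

/* Map the errno left behind by a failed setsockopt() to a category. */
static enum demo_sockopt_result
demo_classify_sockopt_errno (int err)
{
  switch (err)
  {
  case ENOPROTOOPT:
    return DEMO_SOCKOPT_UNSUPPORTED;
  case EBADF:
  case ENOTSOCK:
    return DEMO_SOCKOPT_BAD_SOCKET;
  default:
    return DEMO_SOCKOPT_OTHER;
  }
}
```

With a split like this, an unsupported option could be remembered once per daemon so that the failing setsockopt() is not repeated for every response.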

In connection.c there is one last line which can be removed once the code in socket_start_no_buffering_flush() has been dealt with (FreeBSD specific; the discussion about it did not conclude within my project time).

5) additional notes

Even though some of the flags used are newish, Stevens' "Unix Network Programming" and the TCP/IP Illustrated volumes were a great help for this work, in addition to the documented source code of the operating systems (where available). Thanks to my fellow NetBSD developers, whom I could ask about system- and network-specific features in NetBSD when the code wasn't enough.