2016-06-18

Is SIGWINCH in shells broken?

Julia Evans has written a piece about the scary properties of UNIX signals. I always figured signals are safe enough as long as the handler only performs an atomic operation, like assigning a boolean or integer, and the actual handling happens in regular code.

Take my signal handler for the powerd++ daemon:

/**
 * Sets g.signal, terminating the main loop.
 */
void signal_recv(int const signal) {
        g.signal = signal;
}

It doesn't get much easier than that.

Signals in Shells

Bourne-style shells like the Almquist Shell or BASH offer signal handling through the trap builtin. The following is the relevant manual section of the Almquist Shell (i.e. FreeBSD's /bin/sh):

     trap [action] signal ...

     trap -l
             Cause the shell to parse and execute action when any specified
             signal is received.  The signals are specified by name or number.
             In addition, the pseudo-signal EXIT may be used to specify an
             action that is performed when the shell terminates.  The action
             may be an empty string or a dash (‘-’); the former causes the
             specified signal to be ignored and the latter causes the default
             action to be taken.  Omitting the action is another way to
             request the default action, for compatibility reasons this usage
             is not recommended though.  In a subshell or utility environment,
             the shell resets trapped (but not ignored) signals to the default
             action.  The trap command has no effect on signals that were
             ignored on entry to the shell.

             Option -l causes the trap command to display a list of valid sig‐
             nal names.

If you bothered to read that, you may have noticed that no limits are placed on what an action may contain. This is because the shell handles all the dangerous bits for you (unless you activate the trapsasync option, which you shouldn't unless you like to see your shell scripts segfault).

This is necessary because the shell is an interpreter, and thus no commands are really safe to perform in a trap. The shell sanitises signals for you by safely interrupting and resuming builtin commands, and by simply waiting for non-builtin commands to complete before performing your action.
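To see the deferral in action, here is a minimal sketch. It is an illustration, not part of my scripts: the variable names are made up, USR1 stands in for any trapped signal, and env is only there to make sure sleep runs as an external command, even in shells that ship a sleep builtin.

```shell
#!/bin/sh
# Hypothetical demo: a trap action is held back while an external
# command runs in the foreground.
trap 'trapped_at=$(date +%s)' USR1

start=$(date +%s)
( env sleep 1; kill -USR1 $$ ) &   # deliver USR1 to this shell after ~1s
env sleep 3                        # external command: the trap has to wait
wait                               # reap the background sender

echo "trap ran $((trapped_at - start))s after start"
```

With default settings the reported delay should be around three seconds, not one, because the action only runs once the foreground sleep has exited.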

Now you might have a long-running command that you want to be able to interrupt:

trap 'echo Interrupted by signal;exit 1' INT HUP TERM
if my_longwinded_command; then
    do_something
fi

If my_longwinded_command is not a builtin or function, the trap does not fire until the command has completed. The way to handle this without trapsasync is the following:

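my_longwinded_command &
# $! is expanded right here, so the trap kills this particular job
trap "kill $!;echo Interrupted by signal;exit 1" INT HUP TERM
# wait without operands always returns 0, name the job to get its status
if wait $!; then
    trap "echo Interrupted by signal;exit 1" INT HUP TERM
    do_something
fi
trap "echo Interrupted by signal;exit 1" INT HUP TERM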

The trick here is that we emulate the sequential behaviour with the wait builtin, which the shell can safely interrupt to perform the action.
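The difference can be observed with a small experiment. This is a sketch under assumptions: USR1 stands in for any trapped signal, the variable names are invented for the demo, and env forces sleep to run as an external command.

```shell
#!/bin/sh
# Hypothetical demo: wait is a builtin, so a trapped signal interrupts it.
trap 'trapped_at=$(date +%s)' USR1

start=$(date +%s)
env sleep 3 &                       # the "long" command, in the background
child=$!
( env sleep 1; kill -USR1 $$ ) &    # deliver USR1 to this shell after ~1s
sender=$!

status=0
wait "$child" || status=$?          # returns early once USR1 arrives
kill "$child" 2>/dev/null           # clean up the still-running command
wait "$sender"

echo "trap ran after $((trapped_at - start))s, wait returned $status"
```

An interrupted wait returns an exit status greater than 128, and the action runs right away instead of after the full three seconds.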

Of course in the real world you want to handle the whole affair more gracefully, to avoid all this copy and paste.

There is also a small race between the command terminating and the trap being changed, in which the kill command can be called without there being a process to kill. There is no graceful way to handle this other than suppressing the output of the kill command.
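One way to wrap this up could look like the following sketch. The names on_signal and run_interruptible are made up for illustration, and the sleep call is just a placeholder for a real long-running command.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from a real script.

# Common action for all handled signals
on_signal() {
    echo "Interrupted by signal"
    exit 1
}

# Run a command in the background and wait for it, so that a signal
# can interrupt the wait and kill the command
run_interruptible() {
    "$@" &
    child=$!
    # The child may already be gone when the trap fires, so the
    # complaint from kill is discarded
    trap 'kill "$child" 2>/dev/null; on_signal' INT HUP TERM
    status=0
    wait "$child" || status=$?
    # Back to the plain action once there is nothing left to kill
    trap on_signal INT HUP TERM
    return "$status"
}

trap on_signal INT HUP TERM
if run_interruptible sleep 1; then
    echo "command finished"
fi
```

Waiting for the specific job instead of using a bare wait also preserves the command's exit status, so the if behaves like it would with the command in the foreground.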

So this is basically it; we're all set up to handle signals in scripts.

Except for …

SIGWINCH, which silently crashes shells that try to handle it. E.g.:

# Record shell size changes
trap "trap '' WINCH;winch_trapped=1" WINCH
…
# Handle window changes
if [ -n "$winch_trapped" ]; then
    # Reinstall the trap
    winch_trapped=
    trap "trap '' WINCH;winch_trapped=1" WINCH
    # Redraw the current output
    redraw
fi

This should be fairly safe, but it crashes frequently (though not always). It's a proven pattern that I use for all the other signals I normally handle.
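For anyone who wants to poke at this, the pattern can be boiled down to a self-contained loop. This is only a sketch: redraw is a dummy, arm_winch is a made-up helper name, and sending WINCH with kill is a stand-in for actually resizing the terminal, so it may well not trigger the crash.

```shell
#!/bin/sh
# Hypothetical stand-in for the real drawing code
redraws=0
redraw() {
    redraws=$((redraws + 1))
}

# Record a size change and ignore further WINCH until it was handled
arm_winch() {
    winch_trapped=
    trap "trap '' WINCH;winch_trapped=1" WINCH
}

arm_winch
for i in 1 2 3; do
    # Stand-in for a terminal resize
    kill -WINCH $$
    # Handle window changes
    if [ -n "$winch_trapped" ]; then
        arm_winch
        redraw
    fi
done
echo "redraws: $redraws"
```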

So what's wrong?