Angry Swarm: July 2016

Last month I asked, is SIGWINCH in shells broken?

I explained how the shell allows you to create signal handlers, but takes care of all the dangerous bits of signal handling for you, so unlike in C or C++ you can run any code from within a signal handler. One exception was SIGWINCH, a signal that caused my shell script to terminate about 50% of the time.

I concluded my article with an example of how I handle the signal. Several people responded that they could not reproduce my problem:

# Record shell size changes
trap "trap '' WINCH;winch_trapped=1" WINCH
…
# Handle window changes
if [ -n "$winch_trapped" ]; then
    # Reinstall the trap
    winch_trapped=
    trap "trap '' WINCH;winch_trapped=1" WINCH
    # Redraw the current output
    redraw
fi

As it would turn out, I should have RTFM more carefully.

A better example

This snippet did not contain the actual problem. It implied, but did not state, that the signal handling code runs inside some kind of loop.

Said loop looks somewhat like this:

# Record shell size changes
trap "trap '' WINCH;winch_trapped=1" WINCH
while read -r line; do
    # Handle window changes
    if [ -n "$winch_trapped" ]; then
        # Reinstall the trap
        winch_trapped=
        trap "trap '' WINCH;winch_trapped=1" WINCH
        # Redraw the current output
        redraw
    fi
    case "$line" in
    …
    esac
done

Tracking it Down

The loop, and in particular the read at its top, was the important part.

At this point I would usually describe the debugging process, but I no longer remember everything I did.

Fixing it

In the end, the cause was a difference in behaviour between ash and bash. The explanation can be found in the ash manual page, in the description of the read builtin:

             The exit status is 0 on success, 1 on end of file, between 2 and
             128 if an error occurs and greater than 128 if a trapped signal
             interrupts read.

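So when a trapped signal arrives, ash executes the trap and has read return a value greater than 128, which is analogous to read(2) failing with errno set to EINTR. In the loop above, "while read -r line" treats that non-zero status the same way as end of file, so the loop ends and the script terminates.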
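
The exit status ranges quoted from the ash manual can be captured in a small helper. This is only a sketch: the function name classify_read_status is made up for illustration, and the ranges are taken from the manual text quoted above, not verified against every ash derivative.

```sh
# classify_read_status STATUS
# Hypothetical helper: prints a label for an exit status of ash's read
# builtin, following the ranges quoted from the ash manual page.
classify_read_status() {
    if [ "$1" -eq 0 ]; then
        echo success
    elif [ "$1" -eq 1 ]; then
        echo eof
    elif [ "$1" -le 128 ]; then
        echo error
    else
        # Greater than 128: a trapped signal interrupted read
        echo interrupted
    fi
}
```

Calling it as "classify_read_status $?" directly after a read makes it easy to log which of the four cases ended a loop.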

This behaviour is not shared by bash, which seems to resume an interrupted read transparently. It also has different rules for return values:

                           … The  return  code  is zero, unless end-of-file is
              encountered, read times out (in which case the  return  code  is
              greater  than 128), a variable assignment error (such as assign-
              ing to a readonly variable) occurs, or an invalid file  descrip-
              tor is supplied as the argument to -u.

So unless the -t flag is used, bash's read does not return values greater than 128, which makes it easy to write code that works in both shells:

# Record shell size changes
trap "trap '' WINCH;winch_trapped=1" WINCH
while true; do
    read -r line
    retval=$?
    if [ $retval -gt 128 ]; then
        # Resume interrupted read
        continue
    elif [ $retval -ne 0 ]; then
        # Read failed
        break
    fi
    # Handle window changes
    if [ -n "$winch_trapped" ]; then
        # Reinstall the trap
        winch_trapped=
        trap "trap '' WINCH;winch_trapped=1" WINCH
        # Redraw the current output
        redraw
    fi
    case "$line" in
    …
    esac
done
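
One side effect of the loop above is that after an interrupted read the continue jumps straight back to read, so the redraw only happens once the next line arrives. A possible variation, sketched here under assumptions, moves the trap installation and the flag check into functions so that a resize can be handled before read resumes. The names install_winch_trap, handle_winch and read_line are invented for this sketch, and redraw is a stub standing in for the real drawing code.

```sh
# Stub standing in for the real redraw function (assumption).
redraw() {
    redraw_count=$((${redraw_count:-0} + 1))
}

# Install the one-shot WINCH trap used in the examples above.
install_winch_trap() {
    trap "trap '' WINCH;winch_trapped=1" WINCH
}

# Redraw and reinstall the trap if a window change was recorded.
handle_winch() {
    [ -n "$winch_trapped" ] || return 0
    winch_trapped=
    install_winch_trap
    redraw
}

# Read one line into $line. An interrupted read (status > 128) is
# retried after any pending window change has been handled.
# Returns 0 on success, or the status of read on EOF or error.
read_line() {
    while :; do
        read -r line
        read_status=$?
        handle_winch
        [ "$read_status" -gt 128 ] || return "$read_status"
    done
}
```

With these in place, the script would call install_winch_trap once and the main loop could go back to the "while read_line; do" form. Under bash, which resumes read transparently, the redraw would still wait for the next line; the sketch only changes the behaviour for shells that interrupt read.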

There is a lesson in there somewhere, though most of it is in the missing part about how to debug this kind of problem.


2016-07-29

2016-07-06

How to handle SIGWINCH in an almquist shell
