Angry Swarm: 2016

2016-07-29

2016-07-06

How to handle SIGWINCH in an almquist shell

Last month I asked, is SIGWINCH in shells broken?

I explained how the shell allows you to create signal handlers, but takes care of all the dangerous bits of signal handling for you, so unlike in C or C++ you can run any code from within a signal handler. One exception was SIGWINCH, a signal that caused my shell script to terminate about 50% of the time.

I concluded my article with an example of how I handle the signal, to which some people responded that they could not reproduce my problem:

# Record shell size changes
trap "trap '' WINCH;winch_trapped=1" WINCH
…
# Handle window changes
if [ -n "$winch_trapped" ]; then
    # Reinstall the trap
    winch_trapped=
    trap "trap '' WINCH;winch_trapped=1" WINCH
    # Redraw the current output
    redraw
fi

As it turned out, I should have RTFM more carefully.

A better example

The code snippet itself did not contain the actual problem. It was implied (but not stated) that the signal handling code runs inside some kind of loop.

Said loop looks somewhat like this:

# Record shell size changes
trap "trap '' WINCH;winch_trapped=1" WINCH
while read -r line; do
    # Handle window changes
    if [ -n "$winch_trapped" ]; then
        # Reinstall the trap
        winch_trapped=
        trap "trap '' WINCH;winch_trapped=1" WINCH
        # Redraw the current output
        redraw
    fi
    case "$line" in
    …
    esac
done

Tracking it Down

The loop, and the read that drives it, turned out to be important.

At this point, I would usually describe the debugging process, but by now I do not remember everything that I've done.

Fixing it

In the end it turned out that there is a difference in the behaviour between ash and bash. The explanation can be found in the manual page of ash:

             The exit status is 0 on success, 1 on end of file, between 2 and
             128 if an error occurs and greater than 128 if a trapped signal
             interrupts read.

So when a trapped signal arrives, ash executes the trap and has read return a value greater than 128, which is the shell equivalent of read(2) failing with errno set to EINTR.

This behaviour is not shared by bash, which seems to resume an interrupted read transparently. It also has different rules for return values:

                           … The  return  code  is zero, unless end-of-file is
              encountered, read times out (in which case the  return  code  is
              greater  than 128), a variable assignment error (such as assign-
              ing to a readonly variable) occurs, or an invalid file  descrip-
              tor is supplied as the argument to -u.

So unless the -t flag is used, bash's read does not return values greater than 128, which makes it easy to write code that works in both shells:

# Record shell size changes
trap "trap '' WINCH;winch_trapped=1" WINCH
while true; do
    read -r line
    retval=$?
    if [ $retval -gt 128 ]; then
        # Resume interrupted read
        continue
    elif [ $retval -ne 0 ]; then
        # Read failed
        break
    fi
    # Handle window changes
    if [ -n "$winch_trapped" ]; then
        # Reinstall the trap
        winch_trapped=
        trap "trap '' WINCH;winch_trapped=1" WINCH
        # Redraw the current output
        redraw
    fi
    case "$line" in
    …
    esac
done
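
If you prefer to keep the original while read loop shape, the retry logic can be moved into a small helper. The following is only a sketch: read_retry is a hypothetical name, and it assumes that any exit status greater than 128 means a trapped signal interrupted read, which only holds as long as read is called without -t.

```sh
# Hypothetical helper: read one line into $line, retrying whenever a
# trapped signal interrupts read (exit status greater than 128 in ash).
# Assumes read is never called with a timeout.
read_retry() {
    while :; do
        read -r line
        retval=$?
        # Success, end of file and errors are passed on to the caller,
        # only an interrupted read is retried
        [ "$retval" -gt 128 ] || return "$retval"
    done
}
```

With this in place the loop head can stay a plain while read_retry; do … done, and the window change handling inside the loop body remains unchanged.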

There is a lesson to learn in there somewhere. It is mostly in the missing part about how to debug this kind of problem, though.

2016-06-18

Is `SIGWINCH` in shells broken?

Julia Evans has written a piece about the scary properties of UNIX signals. I always figured it's safe enough if you only perform an atomic operation in the handler, like assigning a boolean or integer, and handle the signal in regular code.

Take my signal handler for the powerd++ daemon:

/**
 * Sets g.signal, terminating the main loop.
 */
void signal_recv(int const signal) {
        g.signal = signal;
}

It doesn't get much easier than that.

Signals in Shells

Bourne-style shells like the Almquist Shell or BASH offer signal handling through the trap builtin. The following is the relevant manual section of the Almquist Shell (i.e. FreeBSD's /bin/sh):

     trap [action] signal ...

     trap -l
             Cause the shell to parse and execute action when any specified
             signal is received.  The signals are specified by name or number.
             In addition, the pseudo-signal EXIT may be used to specify an
             action that is performed when the shell terminates.  The action
             may be an empty string or a dash (‘-’); the former causes the
             specified signal to be ignored and the latter causes the default
             action to be taken.  Omitting the action is another way to
             request the default action, for compatibility reasons this usage
             is not recommended though.  In a subshell or utility environment,
             the shell resets trapped (but not ignored) signals to the default
             action.  The trap command has no effect on signals that were
             ignored on entry to the shell.

             Option -l causes the trap command to display a list of valid sig‐
             nal names.

If you bothered to read that, you may have noticed that no limits are placed on what constitutes an action. This is because the shell handles all the dangerous bits for you (unless you activate the trapsasync option, which you shouldn't unless you like to see your shell scripts segfault).

This is necessary, because the shell is an interpreter and thus there are no commands that are really safe to perform in a trap. The shell sanitises signals for you by safely interrupting and resuming builtin commands and simply waiting for non-builtin commands to complete before performing your action.

Now you might have a long running command that you may want to be able to interrupt somehow:

trap 'echo Interrupted by signal;exit 1' INT HUP TERM
if my_longwinded_command; then
    do_something
fi

If my_longwinded_command is not a builtin or function, the trap is not sprung until the command has completed. The way to handle this without trapsasync is the following:

my_longwinded_command &
trap "kill $!;echo Interrupted by signal;exit 1" INT HUP TERM
if wait; then
    trap "echo Interrupted by signal;exit 1" INT HUP TERM
    do_something
fi
trap "echo Interrupted by signal;exit 1" INT HUP TERM

The trick here is that we emulate the sequential behaviour with the wait builtin, which the shell can safely interrupt to perform the action.

Of course in the real world you want to handle the whole affair more gracefully, to avoid all this copy and paste.
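
One way to avoid the copy and paste is to wrap the pattern in a function. This is a minimal sketch, not code from a real script: run_interruptible, run_pid and run_status are hypothetical names, and the trapped signals and the message are taken from the example above.

```sh
# Hypothetical helper: run a command in the background and wait for it,
# so the shell can perform the trap action while the command runs.
run_interruptible() {
    "$@" &
    run_pid=$!
    # While the command runs, the trap also has to kill it; the output
    # of kill is suppressed in case the command is already gone
    trap 'kill "$run_pid" 2>/dev/null; echo "Interrupted by signal"; exit 1' INT HUP TERM
    wait "$run_pid"
    run_status=$?
    # The command has terminated, fall back to the plain trap
    trap 'echo "Interrupted by signal"; exit 1' INT HUP TERM
    return "$run_status"
}
```

The if statement then becomes if run_interruptible my_longwinded_command; then do_something; fi, with the exit status of the command passed through.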

There is also a small race between the command terminating and the trap being changed, where the kill command can be called without there being a process to kill. There is no graceful way to handle this other than suppressing the output of the kill command.

So this is basically it, we're all set up to handle signals in scripts.

Except for …

SIGWINCH, which silently crashes shells that try to handle it. E.g.:

# Record shell size changes
trap "trap '' WINCH;winch_trapped=1" WINCH
…
# Handle window changes
if [ -n "$winch_trapped" ]; then
    # Reinstall the trap
    winch_trapped=
    trap "trap '' WINCH;winch_trapped=1" WINCH
    # Redraw the current output
    redraw
fi

This should be fairly safe, but it crashes frequently (not always, though). It's a proven pattern for all other signals that I normally handle.

So what's wrong?

2016-04-07

powerd++: Better CPU Clock Control for FreeBSD

Setting of P-States (power states a.k.a. steppings) on FreeBSD is managed by powerd(8). It has been with us since 2005, a time when the Pentium-M single-core architecture was the cutting edge choice for notebooks and dual-core just made its way to the desktop.

That is not to say that multi-core architectures were not considered when powerd was designed, but as the number of cores grows and hyper-threading has made its way onto notebook CPUs, powerd falls short.

Incentive

Don't you know it? You sit at your desk, reading technical documentation, occasionally scrolling or clicking on the next page link. The only (interactive) programs running are your web browser, an e-mail client and a couple of terminals waiting for input. There is a constant fan noise, which occasionally picks up for no apparent reason, making it a million times more annoying.

You can't work like this!

You start looking at the load, which is low but not minuscule. In the age of IMAP and node.js, web browsers and e-mail clients are always a little busy. Still, this is not enough to explain the fan noise.

You're running powerd to reduce your energy footprint (for various reasons), or are you? Yes you are. So you start monitoring dev.cpu.0.freq and it turns out your CPU clock is stuck at maximum like the speedometer of an adrenaline junkie with a death wish.

Something is wrong, your 15% to 30% load is way below the 50% default clock down threshold of powerd. You start digging, thinking you can tune powerd to do the right thing. Turns out you can't.

An Introduction to `powerd`

The following illustration shows powerd's operation on a dual-CPU system with two cores and hyper-threading each. That is not a realistic system today, but it saves space in the illustration and contains all the cases that need to be covered.

Note that …

… the sysctl(3) interface flattens the architecture of the CPUs into a list of pipelines, each presented as individual CPUs.
… powerd has the first CPU hard coded as the one controlling the clock frequency for all cores.
… powerd uses the sum of all loads to control the clock frequency.

Using the sum of all loads to rate the overall load of the system allows single-threaded loads to trigger higher P-States, but comes at the cost of triggering high P-States with low, distributed loads. The problem grows with the number of available cores. In the illustrated system a mean load of 12.5% per core results in a 100% load rating. The same applies to a single quad-core CPU with hyper-threading.
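
The arithmetic is easy to check. The following sketch is not taken from powerd, sum_load is a hypothetical function that merely adds up per-CPU loads. Loads are given in tenths of a percent, so that the shell's integer arithmetic can express 12.5%.

```sh
# Hypothetical illustration of a summed load rating.
# All loads are in tenths of a percent, i.e. 125 means 12.5%.
sum_load() {
    total=0
    for load in "$@"; do
        total=$((total + load))
    done
    echo "$total"
}

# Eight logical CPUs, each with a load of 12.5%
sum_load 125 125 125 125 125 125 125 125   # prints 1000, i.e. 100%
```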

Another problem resulting from this approach is that the optimal boundaries for the hysteresis change with the number of cores. Also, to protect single core loads, powerd only permits boundaries from 0% to 100%. This results in powerd changing into the highest P-State at the drop of a hat and only clocking down if the load is close to 0.

The Design of `powerd++`

The powerd++ design has three significant differences: the way it manages the CPUs/cores/threads presented through the sysctl interface, the way that load is calculated and the way the target frequency is determined.

During its initialisation phase powerd++ assigns a frequency controlling core to each core, grouping them by the core that offers the handle to change the clock frequency. Unlike shown in the following illustration, all cores will always be controlled by dev.cpu.0, because the cpufreq(4) driver only supports global P-State changes. But powerd++ is built unaware of this limitation and will perform fine grained control the moment the driver offers it.

To rate the load within a core group, each core determines its own load and then passes it to the controlling core. The controlling core uses the maximum of the loads in the group as the group load. This approach allows single threaded applications to cause high load ratings (i.e. up to 100%), but having small loads on all cores in a group still results in a small load rating. Another advantage of this design is that load ratings always stay within the 0% to 100% range. Thus the same settings (including the defaults) work equally well for any number of cores.
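
The effect of using the maximum can be sketched in a few lines. This is not the powerd++ implementation, max_load is a hypothetical function that rates a group by its busiest core, with loads again given in tenths of a percent.

```sh
# Hypothetical illustration of rating a core group by its maximum load.
# All loads are in tenths of a percent, i.e. 125 means 12.5%.
max_load() {
    max=0
    for load in "$@"; do
        if [ "$load" -gt "$max" ]; then
            max=$load
        fi
    done
    echo "$max"
}

# A small load on all eight cores stays a small load rating
max_load 125 125 125 125 125 125 125 125   # prints 125, i.e. 12.5%
# A single busy core still results in a 100% load rating
max_load 1000 0 0 0 0 0 0 0                # prints 1000, i.e. 100%
```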

Instead of using a hysteresis to decide whether the clock frequency should be increased, lowered or stay the same, powerd++ uses a target load to determine the frequency at which the current load would have rated as the target load. This approach results in quick frequency changes in either direction. E.g. given a target of 50% and a current load of 100%, the new clock frequency would be twice the current frequency. To reduce sensitivity to signal noise, more than two samples (5 by default) can be collected. This works as a low pass filter but is less damaging to the responsiveness of the system than increasing the polling interval.
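
The target load rule boils down to a simple proportion. The sketch below only illustrates that proportion, target_freq and its parameters are hypothetical names, and it assumes the result is not clamped to the frequencies the hardware actually supports, which a real daemon would have to do.

```sh
# Hypothetical illustration of the target load rule:
# find the frequency at which the current load would rate as the target load.
#   $1  current clock frequency in MHz
#   $2  current load in percent
#   $3  target load in percent
target_freq() {
    echo $(($1 * $2 / $3))
}

# 100% load with a 50% target doubles the frequency
target_freq 1200 100 50   # prints 2400
# 25% load with a 50% target halves it
target_freq 1200 25 50    # prints 600
```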

Resources

The code is on github. A FreeBSD port is available as sysutils/powerdxx.

Afterthoughts

My experience in automotive and race car engineering came in handy. If your noise filter is not in O(1) (per frame), you're doing it wrong. If you have one control for many inputs, a maximum or minimum is usually the right choice, the sum rarely is. E.g. if you have 3 sensors that report 62°C, 74°C and 96°C, you want to adjust your coolant throughput to 96°C, not 232°C.

I hope that powerd++ will be widely used (within the community) and inspire the maintainers of cpufreq(4) to add support for per-CPU frequency controls.

TODOs

Currently the power source detection depends on ACPI; I need to implement something similar for older and non-x86/amd64 systems. For now those just fall back to the unknown state.