2015-02-01

/bin/sh: Writing Your Own watch Command

The watch(8) command in FreeBSD has a completely different function from the popular GNU command of the same name. Since I find the GNU watch convenient, I wrote a short shell script to provide that functionality on my systems. The script is a nice way to show off some basic as well as some advanced shell-scripting features.

To resolve the ambiguity with watch(8) I called it observe on my system. My observe command takes the time to wait between updates as the first argument. Successive arguments are interpreted as commands to run. The following listing is the complete code:

#!/bin/sh
set -f
sleep=$1
clear=
shift

runcmd() {
        tput cm 0 0
        (eval "$@")
        tput AL `tput li`
}

trap 'runcmd "$@"; tput ve; exit' EXIT INT TERM
trap 'clear=1' HUP INFO WINCH

tput vi
clear
runcmd "$@"
while sleep $sleep; do
        eval ${clear:+clear;clear=}
        runcmd "$@"
done

Careful observers may notice that there is no parameter checking and the code is not commented. These shortcomings are part of what makes it a convenient example in a tutorial.

Turning Off Glob-Pattern Expansion

The second line already shows a good convention:

#!/bin/sh
set -f

The set builtin can be used to set shell options as if they were provided on the command line. It is also able to turn them off again, e.g. set +x would turn off tracing. The -f option turns off glob pattern expansion for command arguments. This is a good habit to pick up, because unintended glob pattern expansion is very dangerous in scripts. Of course the -f option could be set as part of the shebang, e.g. #!/bin/sh -f, but that would allow the script user to override it. By calling bash ./observe 2 ccache -s the shell could be invoked without setting the option, which is dangerous for options with safety implications.
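The effect of set -f is easy to see with an unquoted variable that holds a pattern. The following is only a minimal sketch; the scratch directory and the file names a.txt and b.txt are made up for the demonstration:

```shell
#!/bin/sh
# Hypothetical scratch directory with two files that match the pattern
dir=$(mktemp -d) || exit
touch "$dir/a.txt" "$dir/b.txt"
pattern="$dir/*.txt"

# Globbing enabled: the unquoted expansion is replaced by the file names
set +f
set -- $pattern
with_glob=$#

# Globbing disabled: the pattern is passed on literally as a single word
set -f
set -- $pattern
without_glob=$#

# Back to the default for the rest of the demonstration
set +f
rm -r "$dir"
echo "words with globbing: $with_glob, with set -f: $without_glob"
```

A command given to a script as an argument may contain such patterns, and with set -f they reach eval untouched instead of being expanded at an unexpected place.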

Global Variable Initialisation

The next block initialises some global variables:

sleep=$1
clear=
shift

Initialising global variables at the beginning of a script is not just good style (because there is one place to find them all), it also protects the script from whatever the caller put into the environment using export or the interactive shell's equivalent.
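To illustrate the environment problem, here is a small sketch. The two script bodies are hypothetical and are run through sh -c only to keep the demonstration in one file:

```shell
#!/bin/sh
# Hypothetical script bodies: one trusts the variable, one initialises it
unguarded='echo "${clear:+stale}"'
guarded='clear=; echo "${clear:+stale}"'

# The caller puts clear=1 into the environment of both scripts
leaked=$(clear=1 sh -c "$unguarded")
clean=$(clear=1 sh -c "$guarded")

echo "unguarded: '$leaked' guarded: '$clean'"
```

The unguarded body sees the caller's value, the guarded one starts from a known state.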

The shift builtin can be a very useful feature. It throws away the first argument, so what was $2 becomes $1, $3 turns into $2, etc. An optional numeric argument specifies how many arguments to remove.
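A short sketch of both forms of shift. The function name demo and its arguments are invented for illustration:

```shell
#!/bin/sh
# Hypothetical function that consumes its arguments step by step
demo() {
    first=$1
    shift       # drop $1, the former $2 is now $1
    second=$1
    shift 2     # drop two more arguments at once
    rest="$*"   # whatever is left over
}

demo 2 ccache -s extra args
echo "first=$first second=$second rest=$rest"
```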

The runcmd Function

The runcmd function is responsible for invoking the command in a fashion that overwrites its last output:

runcmd() {
        tput cm 0 0
        (eval "$@")
        tput AL `tput li`
}

The tput(1) command is handy to directly talk to the terminal. What it can do depends on the terminal it is run in, so it is good practice to test it in as many terminals as possible. A list of available commands is provided by the terminfo(5) manual page. The following commands were used here:

  • cm: cursor_address #row #col
    Used to position the cursor in the top-left corner
  • AL: parm_insert_line #lines
    Used to push any garbage on the terminal (e.g. random key inputs) out of the terminal
  • li: lines
    Returns the number of terminal lines on stdout

Taken together, tput AL `tput li` basically acts as a "clear below the cursor" command.

The eval "$@" command executes all the arguments (apart from the one that was shifted away) as shell commands. The command is enclosed in parentheses to invoke it in a subshell. That effectively prevents it from affecting the script. It is not able to change signal handlers or variables of the script, because it is run in its own process.
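The isolation provided by the subshell can be checked with a few lines. This is just a sketch; the variable name state is made up:

```shell
#!/bin/sh
state=original

# Inside parentheses the assignment and the cd happen in a subshell
(eval 'state=changed; cd /')
after_subshell=$state

# Without parentheses eval runs in the current shell
eval 'state=changed'
after_eval=$state

echo "after subshell: $after_subshell, after plain eval: $after_eval"
```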

Signal Handlers

Signal handlers provide a method of overriding the shell's default actions. The trap builtin takes the code to execute as the first argument, followed by a list of signals to catch. Providing a dash as the first argument restores the default action:

trap 'runcmd "$@"; tput ve; exit' EXIT INT TERM
trap 'clear=1' HUP INFO WINCH

The INT signal represents a user interrupt, usually caused by the user pressing CTRL+C. The TERM signal is a request to terminate, e.g. it is sent when the system shuts down. EXIT is a pseudo-signal that occurs when the shell terminates regularly, i.e. by reaching the end of the script (in this case if sleep fails) or through an exit call.

The HUP signal is frequently used to reconfigure daemons without terminating them. WINCH occurs when the terminal is resized. The INFO signal is a very useful BSDism. It is usually sent by pressing CTRL+T and causes a process to print status information.
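Both kinds of handler can be tried out in isolation. The following is a sketch under a few assumptions: USR1 stands in for HUP/INFO/WINCH because INFO does not exist on every system, and the handlers run in a child sh so the signal cannot hit the calling shell:

```shell
#!/bin/sh
# EXIT fires when the shell terminates regularly
exit_out=$(sh -c 'trap "echo bye" EXIT; echo working')

# A handler that only sets a flag, which the main flow checks afterwards
flag_out=$(sh -c '
    flag=
    trap "flag=1" USR1
    kill -USR1 $$
    echo "flag=${flag:-unset}"
')

printf '%s\n' "$exit_out" "$flag_out"
```

Setting a flag in the handler and acting on it later keeps the handler short and leaves the decision about when to react to the main loop.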

The Output Cycle

The output cycle heavily interacts with the signal handlers:

tput vi
clear
runcmd "$@"
while sleep $sleep; do
        eval ${clear:+clear;clear=}
        runcmd "$@"
done

The tput vi command hides the cursor, tput ve turns it back on.

The clear command clears up the terminal before the command is run the first time.

The runcmd "$@" call occurs once before the loop, because the first call within the loop occurs after the first sleep interval.

The clear global is set by the HUP/WINCH/INFO handler. The eval ${clear:+clear;clear=} line runs the clear command if the variable is set and resets it afterwards. The clear command is not run every cycle, because it would cause flickering. The ability to trigger it is required to clean up the screen in case a command does not overwrite all the characters from a previous cycle.
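The ${variable:+word} expansion yields word only if the variable is set and not empty, which is what makes the one-liner work. A minimal sketch, with a hypothetical redraw function standing in for clear(1) so the effect can be counted:

```shell
#!/bin/sh
set -f
count=0
# Hypothetical stand-in for clear(1)
redraw() { count=$((count + 1)); }

flag=
eval ${flag:+redraw;flag=}   # flag is empty: eval gets no arguments
first=$count

flag=1
eval ${flag:+redraw;flag=}   # flag is set: eval runs "redraw;flag="
second=$count

echo "calls before: $first, after: $second, flag now: '${flag}'"
```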

Conclusion

If you made it here, thank you for reading this till the end! You probably already knew a lot of what you read. But maybe you also learned a trick or two. That's what I hope.

2015-01-17

/bin/sh: Using Named Pipes to Talk to Your Main Process

Screenshot: A shell application with the main thread providing output and status display for the worker processes

You want to fork off a couple of subshells and have them talk back to your main process? Then this post is for you.

What is a Named Pipe?

A named pipe is a pipe with a file system node. This allows arbitrary numbers of processes to read from and write to the pipe, which in turn makes multiple usage scenarios possible. This post just covers one of them, others may be covered in future posts.

The Shell

The following examples should work in any Bourne Shell clone, such as the Almquist Shell (/bin/sh on FreeBSD) or the Bourne-Again Shell (bash).

HowTo

The first step is to create a Named Pipe. This can be done with the mkfifo(1) command:

# Get a temporary file name
node="$(mktemp -u)" || exit
# Create a named pipe
mkfifo -m0600 "$node" || exit

Running that code should produce a Named Pipe in /tmp.

The next step is to open a file descriptor. In this example a single file descriptor is used for reading and writing, which avoids a number of pitfalls like deadlocking the script:

# Attach the pipe to file descriptor 3
exec 3<> "$node"
# Remove file system node
rm "$node"

Note how the file system node of the named pipe is removed immediately after assigning a file descriptor. The exec 3<> "$node" command has opened a permanent file descriptor, which remains open until manually closed or until the process terminates. So deleting the file system node will cause the system to remove the Named Pipe as soon as the process terminates, even when it is terminated by a signal like SIGINT (user presses CTRL-C).
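That the descriptor keeps working after the node is gone can be verified directly. A minimal sketch; the variable names gone and reply are made up:

```shell
#!/bin/sh
node="$(mktemp -u)" || exit
mkfifo -m0600 "$node" || exit
exec 3<> "$node"
rm "$node"

# The file system node no longer exists ...
if [ -e "$node" ]; then gone=no; else gone=yes; fi

# ... but the pipe behind file descriptor 3 is still usable
echo "still works" >&3
read -r reply <&3

# Close the descriptor explicitly, which releases the pipe
exec 3>&-
echo "node removed: $gone, read back: $reply"
```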

Forking and Writing into the Named Pipe

From this point on the subshells can be forked using the & operator:

# This function does something
do_something() {
    echo "do_something() to stdout"
    echo "do_something() to named pipe" >&3
}

# Fork do_something()
do_something &
# Fork do_something(), attach stdout to the named pipe
do_something >&3 &

# Fork inline
(
    echo "inline to pipe" >&3
) &
# Fork inline, attach stdout to the named pipe
(
    echo "inline to stdout"
) >&3 &

Whether output is redirected per command or for the entire subshell is a matter of personal taste. Either way the processes inherit the file descriptor to the Named Pipe. It is also possible to redirect stderr as well, or redirect it into a different named pipe.

The Named Pipe is buffered, so all the subshells can start writing into it immediately. Once the buffer is full, processes trying to write into the pipe will block, so sooner or later the data needs to be read from the pipe.

Reading from the Named Pipe

To read from the pipe the shell-builtin command read is used.

Using non-builtin commands like head(1) usually leads to problems, because they may read more data from a pipe than they output, causing it to be lost.

# Make sure white space does not get mangled by read (IFS only contains the newline character)
IFS='
'

# Blocking read, this will halt the process until data is available
read line <&3

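# Non-blocking read that reads as much data as is currently available
# (read -t is not part of POSIX; bash treats -t0 as a pure availability
# check that does not consume any data, so this loop targets FreeBSD's sh)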
line_count=0
lines=
while read -t0 line <&3; do
    line_count=$((line_count + 1))
    lines="$lines$line$IFS"
done

Using a blocking read causes the process to sleep until data is available. While waiting, the process does not require any CPU time; the kernel takes care of waking it up.

That's all that is required to establish ongoing communication between your processes.
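Put together, the steps above can be sketched as one small script. The worker function and its message are hypothetical, and only blocking reads are used so that the sketch does not depend on read -t:

```shell
#!/bin/sh
# Create the named pipe, attach it to file descriptor 3, drop the node
node="$(mktemp -u)" || exit
mkfifo -m0600 "$node" || exit
exec 3<> "$node"
rm "$node"

# Hypothetical worker that reports back through the pipe
worker() {
    echo "worker $1 done" >&3
}

# Fork three workers
for i in 1 2 3; do
    worker "$i" &
done

# Collect exactly one line per worker with blocking reads
received=0
while [ "$received" -lt 3 ]; do
    read -r line <&3 || break
    received=$((received + 1))
    echo "main got: $line"
done

wait
exec 3>&-
```

Counting the expected messages lets the main process stop reading at the right time. Because it holds the pipe open for writing itself, a read would never see an end of file.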

The direction of communication can be reversed to use the pipe as a job queue for forked processes. Alternatively, a second pipe can be used to establish two-way communication. With just two processes a single pipe might suffice for two-way communication. A named pipe can also be connected to an ssh(1) session or nc(1).

Basically named pipes are a way to establish a pipe to background processes or completely independent processes, which do not even have to run on the same machine. So, happy hacking!