2014-04-01

geli suspend/resume with Full Disk Encryption

This article was updated on 2014-04-01.

This article details my solution to the geli resume deadlock. It is the result of much fiddling and of locking myself out of the file system more than once.

The presented solution works most of the time, but it is still possible to lock up the system to the point where VT switching no longer works.

After my good old HP6510b notebook was stolen, I decided to set up full disk encryption for its replacement. Once it was set up, however, I faced the problem that the device would be wide open after resuming from suspend, because the encryption keys stay in memory. I rarely reboot my system; I usually keep everything open permanently and suspend the laptop for transport or extended non-use. So the problem is quite severe.

Luckily, the FreeBSD encryption solution geli(8) provides a mechanism called geli suspend, which deletes the key from memory and stalls all processes trying to access the file system. Unfortunately, geli resume itself would be one such process.

The System

So, first things first: a quick overview of the system. If you have ever set up full disk encryption yourself, you can probably skip ahead.

The boot partition, which contains the boot configuration, the kernel and its modules, is not encrypted. It resides on the device ada0p2, labelled gpt/6boot. The encrypted device is ada0p4, labelled gpt/6root. For easy maintenance and use, the 6boot:/boot directory is mounted onto 6root.eli:/boot (the .eli suffix marks an attached encrypted device). Because /boot is a subdirectory of the 6boot file system, 6boot itself is first mounted on /mnt/boot, and a nullfs(5) mount then makes 6boot:/boot (i.e. /mnt/boot/boot) available as 6root.eli:/boot.

Usually mount(8) automatically loads the required kernel modules when invoked, but this doesn't work when the root file system doesn't contain them, which is the case here until /boot has been nullfs-mounted. So the required modules need to be loaded during the loader stage.

/boot/loader.conf
# Encrypted root file system
vfs.root.mountfrom="ufs:gpt/6root.eli"
geom_eli_load="YES"                     # FS crypto
aesni_load="YES"                        # Hardware AES

# Allow nullfs mounting /boot
nullfs_load="YES"
# tmpfs for /tmp, /var/run and the geliconsole
tmpfs_load="YES"

/etc/fstab
# Device           Mountpoint   FStype Options    Dump Pass
/dev/gpt/6root.eli /            ufs    rw,noatime 1    1
/dev/gpt/6boot     /mnt/boot    ufs    rw,noatime 1    1
/mnt/boot/boot     /boot        nullfs rw         0    0
/dev/gpt/6swap.eli none         swap   sw         0    0
# Temporary files
tmpfs              /tmp         tmpfs  rw         0    0
tmpfs              /var/run     tmpfs  rw         0    0

The Problem

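The problem with geli suspend/resume is that calling geli resume ada0p4 deadlocks: the geli binary and the libraries it links against are located on the very partition that is supposed to be resumed, so loading them stalls like any other access to the suspended file system.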

The Approach

The solution is quite simple: put geli somewhere unencrypted.

Implementing this raises several challenges:

Challenge                                                           Approach
Programming                                                         Shell scripting
Technology: avoiding file system access                             Use tmpfs(5)
Usability: how to enter passphrases                                 Use a system console
Safety: the solution needs to be running before a suspend           Use an always-on, unauthenticated console
Security: an unauthenticated interactive service is prone to abuse  Only allow passphrase entry, no other kind of interactive control
Safety: accidental termination of the script                        Ignore SIGINT

The Script

The complete script can be found at the bottom.

Constants

At the beginning of the script some read-only variables (the closest available thing to constants) are defined, mostly for convenience and to avoid typos.

#!/bin/sh
set -f

readonly gcdir="/tmp/geliconsole"
readonly dyn="/sbin/geli;/usr/bin/grep;/bin/sleep;/usr/sbin/acpiconf"
readonly static="/rescue/sh"
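
The set -f line turns off pathname expansion (globbing). This matters because several expansions later in the script, such as ${dyn} and ${lib}, are deliberately left unquoted so that they are split on IFS; without set -f, any resulting word containing *, ? or [ would also be matched against file names. The following sketch shows the difference in a plain POSIX shell; count_words is a hypothetical helper and the scratch directory exists only for the demonstration:

```sh
#!/bin/sh
# Hypothetical helper: print how many fields an unquoted expansion yields.
count_words() {
  set -- $1      # unquoted on purpose: split on IFS, globbed unless set -f
  echo "$#"
}

# Scratch directory with two files that a *.so pattern would match.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch a.so b.so

set +f
count_words '*.so'    # globbing on: expands to a.so b.so, prints 2
set -f
count_words '*.so'    # globbing off: stays one literal word, prints 1
```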

Bootstrapping

The script is divided into two parts. The first part is the bootstrapping section, which requires file system access and creates a tmpfs containing everything that is needed to resume suspended partitions.

The bootstrap is performed in a conditional block that checks whether the script is running from gcdir. The block ends by calling the copy of the script in the tmpfs. Because of exec, the bootstrapping process is replaced by the new one instead of waiting for it. The copy detects that it is running from the tmpfs and skips the bootstrapping:

# If this process isn't running from the tmpfs, bootstrap
if [ "${0#${gcdir}}" = "$0" ]; then
 …
 # Complete bootstrap
 exec "${gcdir}/sh" "${gcdir}/${0##*/}" "$@"
fi
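
The self-detection relies on ${0#${gcdir}}, which strips gcdir from the front of $0: if nothing was stripped, the script is not running from the tmpfs. A minimal sketch of that test, wrapped in a hypothetical helper in_gcdir and fed a few example paths:

```sh
#!/bin/sh
# Value taken from the geliconsole script.
gcdir="/tmp/geliconsole"

# Hypothetical helper: succeeds if the given path starts with $gcdir.
# ${1#"$gcdir"} removes a leading "$gcdir"; if the result still equals
# the input, nothing was removed and the path is outside the tmpfs.
in_gcdir() {
  [ "${1#"$gcdir"}" != "$1" ]
}

for path in /root/bin/geliconsole ./geliconsole /tmp/geliconsole/geliconsole; do
  if in_gcdir "$path"; then
    echo "$path: running from the tmpfs, skip bootstrap"
  else
    echo "$path: bootstrap"
  fi
done
```

The exec line passes the copy as an absolute path, "${gcdir}/${0##*/}", so the new process always sees a $0 that starts with gcdir, even when the script was originally started with a relative path such as ./geliconsole.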

Before completing the bootstrap, the tmpfs needs to be set up. Creating it is a good start:

# Create tmpfs
/bin/mkdir -p "${gcdir}"
/sbin/mount -t tmpfs tmpfs "$gcdir" || exit 1

# Copy the script before changing into gcdir, $0 might be a
# relative path
/bin/cp "$0" "${gcdir}/" || exit 1

# Enter tmpfs
cd "${gcdir}" || exit 1

The next step is to populate it with everything that is needed after the bootstrap, i.e. all binaries the script runs from then on. Two kinds of binaries are used: statically linked ones (listed in the static read-only) and dynamically linked ones (listed in the dyn read-only).

The static binaries can simply be copied into the tmpfs. The dynamically linked ones also require their shared libraries, a list of which is provided by ldd(1).

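Note the use of IFS (the Internal Field Separator) to split variables into multiple arguments, and how subshells are used to limit the scope of IFS changes. ldd -f '%p;%o\n' prints each library as its full path and its name, separated by a semicolon. The outer loop splits this output at newlines, and each inner cp call splits one line at the semicolon, copying the library from its full path into the current directory, i.e. the tmpfs.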

# Get shared objects
(IFS='
'
 for lib in $(IFS=';';/usr/bin/ldd -f '%p;%o\n' ${dyn}); do
  (IFS=';' ; /bin/cp ${lib})
 done
)

# Get executables
(IFS=';' ; /bin/cp ${dyn} ${static} "${gcdir}/")

The resulting tmpfs contains the binaries sh, geli, sleep, grep, acpiconf and all required libraries.
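
To illustrate how the IFS juggling turns ldd output into cp calls, here is a dry run of the same loop structure. It assumes ldd -f '%p;%o\n' prints one "full path;library name" pair per line; the two sample lines are made up, and dry_run is a hypothetical helper that prints each cp command instead of running it:

```sh
#!/bin/sh
set -f

# Made-up sample in the "full path;library name" format, one pair per
# line (the real list depends on the system).
sample='/lib/libgeom.so.5;libgeom.so.5
/lib/libc.so.7;libc.so.7'

# Hypothetical helper with the same structure as the bootstrap loop.
# The function body is a subshell, so the IFS change stays local.
dry_run() (
  IFS='
'
  for lib in $1; do                         # split the list at newlines only
    (IFS=';'; printf 'cp %s %s\n' $lib)     # split one pair at the ';'
  done
)

dry_run "$sample"
```

Because the second field is a bare library name, each cp call places the library in the current directory, which at that point is the tmpfs. Libraries shared by several binaries are simply copied more than once.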

Interactive Stage

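When the interactive stage is reached, the script is already run by a static shell within the tmpfs. The first order of business is to make sure the shell won't look for executables outside the tmpfs. Since the bootstrap changed into gcdir before the exec, and exec keeps the working directory, ./ refers to the tmpfs: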

export PATH="./"
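
With PATH set to ./, the shell resolves command names only against the current directory, while builtins such as echo, trap and [ keep working. The sketch below demonstrates this in a throwaway directory; hello is a hypothetical stand-in for the copied binaries, and try_path is a hypothetical helper that runs in a subshell so the PATH change does not leak:

```sh
#!/bin/sh
# Throwaway directory standing in for the tmpfs, holding one
# hypothetical executable called hello.
scratch=$(mktemp -d)
printf '#!/bin/sh\necho "hello from the tmpfs"\n' > "$scratch/hello"
chmod +x "$scratch/hello"

# The body is a subshell (note the parentheses), so cd and PATH do not leak.
try_path() (
  cd "$scratch" || exit 1
  PATH="./"
  hello                       # found as ./hello
  if command -v ls >/dev/null 2>&1; then
    echo "ls found"
  else
    echo "ls not found"       # /bin/ls is no longer reachable by name
  fi
)

try_path
```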

The next step is to trap some signals, so that the script announces when it exits, unmounts the tmpfs on SIGTERM, and is not terminated by pressing CTRL+C (SIGINT) or by a hangup (SIGHUP):

 
trap 'echo geliconsole: Exiting' EXIT
trap "/sbin/umount -f '${gcdir}' ; exit 0" TERM
trap '' INT HUP
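
An empty action, as in trap '' INT HUP, makes the shell ignore those signals entirely. The following sketch, using a hypothetical helper ignore_int_demo, starts a child shell that installs the same trap, sends itself SIGINT and keeps going. It gives the signal names without the SIG prefix, which is the form POSIX specifies; some shells, FreeBSD's sh among them, also accept SIGINT:

```sh
#!/bin/sh
# Hypothetical demo: a child shell ignores SIGINT and SIGHUP the same way
# geliconsole does, sends itself SIGINT and carries on.
ignore_int_demo() {
  sh -c 'trap "" INT HUP; kill -INT $$; echo "still running after SIGINT"'
}

ignore_int_demo
```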

The last stage is an endless loop that checks for suspended partitions every two seconds and calls geli resume on the first one it finds, which then prompts for the passphrase:

echo "geliconsole: Activated"
while :; do
 if geli list | grep -qFx 'State: SUSPENDED'; then
  geom="$(geli list | grep -FxB1 'State: SUSPENDED')"
  geom="${geom#Geom name: }"
  geom="${geom%%.eli*}"
  echo "geliconsole: Resume $geom"
  geli resume "$geom"
  echo .
 else
  sleep 2
 fi
done
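
The two parameter expansions in the loop turn the grep -B1 output (the "Geom name:" line followed by the "State: SUSPENDED" line) into a bare provider name. The sketch below runs the same steps on an abbreviated, made-up sample of geli list output; suspended_geom is a hypothetical helper, and it assumes, as the script does, that the State line directly follows the Geom name line:

```sh
#!/bin/sh
# Hypothetical helper: read geli list output on stdin and print the
# provider name of the first suspended device.
suspended_geom() {
  geom="$(grep -FxB1 'State: SUSPENDED')"
  geom="${geom#Geom name: }"    # drop the label in front of the name
  geom="${geom%%.eli*}"         # drop ".eli" and everything after it
  printf '%s\n' "$geom"
}

# Abbreviated, made-up sample for two providers.
sample='Geom name: ada0p4.eli
State: SUSPENDED
EncryptionAlgorithm: AES-XTS
Geom name: ada0p3.eli
State: ACTIVE
EncryptionAlgorithm: AES-XTS'

printf '%s\n' "$sample" | suspended_geom
```

When several providers are suspended, grep returns all of them, and the %% expansion keeps only the first name; the loop picks up the next one on a later iteration.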

The System Console

Because the script does not take care of grabbing the right console, it cannot simply be run from /etc/ttys. Instead, it needs to be started by getty(8), which requires a new entry in /etc/gettytab:

#
# geliconsole
#
geliconsole|gc.9600:\
 :al=root:tc=std.9600:lo=/root/bin/geliconsole:

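The entry defines a new terminal type called geliconsole, which logs in root automatically (al=root) and runs the script instead of login(1) (lo=/root/bin/geliconsole). All other settings are inherited from std.9600 (tc=std.9600).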

init(8) can now start the new terminal once the following line is added to /etc/ttys:

ttyvb "/usr/libexec/getty geliconsole" xterm on  secure

Running kill -HUP 1 makes init reread /etc/ttys and pick up the change.

The console should now be available on ttyvb, the twelfth virtual terminal (CTRL+ALT+F12), and look similar to this:

FreeBSD/amd64 (AprilRyan.norad) (ttyvb)

geliconsole: Activated

Suspending

To suspend the encrypted disks automatically, update /etc/rc.suspend:

…
/usr/bin/logger -t $subsystem suspend at `/bin/date +'%Y%m%d %H:%M:%S'`
/bin/sync && /bin/sync && /bin/sync
/bin/rm -f /var/run/rc.suspend.pid
/usr/sbin/vidcontrol -s 12 < /dev/ttyv0 > /dev/ttyv0
/sbin/geli suspend -a
# The following delay may be reduced; by how much depends on the system. I am using a 1 second delay.
/tmp/geliconsole/sleep 3

if [ $subsystem = "apm" ]; then
 /usr/sbin/zzz
else
 # Notify the kernel to continue the suspend process
 /tmp/geliconsole/acpiconf -k 0
fi

exit 0

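The vidcontrol command switches to the geli console before geli suspend -a suspends all encrypted partitions. The sleep and acpiconf calls that follow use the copies in the tmpfs, because the binaries on the encrypted file system are no longer accessible at that point. After resuming, the partitions are recovered by pressing CTRL+ALT+F12 to enter the console and entering the passphrase there.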

For the VT switch to work reliably, the automatic VT switch to console 0 on suspend needs to be turned off:

# sysctl hw.syscons.sc_no_suspend_vtswitch=1
# echo hw.syscons.sc_no_suspend_vtswitch=1 >> /etc/sysctl.conf

Desirable Improvements

For people running X, especially a version where X breaks the console (as is currently the case with KMS support), it would be nice to enter the passphrases through a screen locker.

Also, it is not really necessary to run the script with root privileges. A dedicated, less privileged user account should be created and used instead.

Files

/root/bin/geliconsole
#!/bin/sh
set -f

readonly gcdir="/tmp/geliconsole"
readonly dyn="/sbin/geli;/usr/bin/grep;/bin/sleep;/usr/sbin/acpiconf"
readonly static="/rescue/sh"

# If this process isn't running from the tmpfs, bootstrap
if [ "${0#${gcdir}}" = "$0" ]; then
 # Create tmpfs
 /bin/mkdir -p "${gcdir}"
 /sbin/mount -t tmpfs tmpfs "$gcdir" || exit 1

 # Copy the script before changing into gcdir, $0 might be a
 # relative path
 /bin/cp "$0" "${gcdir}/" || exit 1

 # Enter tmpfs
 cd "${gcdir}" || exit 1

 # Get shared objects
 (IFS='
'
  for lib in $(IFS=';';/usr/bin/ldd -f '%p;%o\n' ${dyn}); do
   (IFS=';' ; /bin/cp ${lib})
  done
 )

 # Get executables
 (IFS=';' ; /bin/cp ${dyn} ${static} "${gcdir}/")

 # Complete bootstrap
 exec "${gcdir}/sh" "${gcdir}/${0##*/}" "$@"
fi

export PATH="./"
 
trap 'echo geliconsole: Exiting' EXIT
trap "/sbin/umount -f '${gcdir}' ; exit 0" TERM
trap '' INT HUP

echo "geliconsole: Activated"
while :; do
 if geli list | grep -qFx 'State: SUSPENDED'; then
  geom="$(geli list | grep -FxB1 'State: SUSPENDED')"
  geom="${geom#Geom name: }"
  geom="${geom%%.eli*}"
  echo "geliconsole: Resume $geom"
  geli resume "$geom"
  echo .
 else
  sleep 2
 fi
done
