Sometimes you encounter weird problems, and drilling down reveals something weird and interesting which of course makes sense in retrospect. This is an account of one of them.
Some context: this is a backup helper script, in
bash. It uses
gpg to decrypt multiple backups in parallel, to take better advantage of modern multi-threaded processors—using GNU
parallel, in case you're curious. Because there are hundreds of files to decrypt, the scripts loads its own
gpg-agent so the operator won't need to enter a long passphrase multiple times.
The problem itself can be demonstrated with a minuscule subset of this script. Call this script
#!/bin/bash killall -9 gpg-agent # We always start a new agent. exec 4>&1 # Simulate script logging gpg-agent --daemon echo 'Done!'
Now, if you run
foo.sh on its own:
$ ./foo.sh + exec + gpg-agent --daemon + echo 'Done!' Done! $ █
it works fine: you get back to the shell prompt, and if you check the process table, you'll find
Now try piping the output to something like
$ ./foo.sh | tee foo.log + exec + gpg-agent --daemon + echo 'Done!' Done!
Where's the prompt gone? The first person who encountered this problem discovered two fixes empirically:
- Breaking out of the script with Ctrl-C.
- Switching to another session and invoking the script's clean-up mode, which kills
In both cases, the script ended and the user got the shell prompt back. Even without those hints, and given I'm probably half as clever as you, you've already spotted the problem.
gpg-agent is somehow holding
stdout open, and since
tee's standard input is still open, it's still waiting for the script's output.
There are a number of ways to diagnose this issue:
lsof -c gpg-agentto see what file descriptors the agent has open.
- On Linux, take a gander inside
PIDis the process ID of
straceto see what the agent is doing on start-up.
- Check out the source.
My immediate instinct (knowing the script intimately) was to check inside
/proc/ for file descriptor 4. In there, I found something like this:
$ sudo ls -la /proc/1696/fd total 0 dr-x------ 2 alexios alexios 0 Mar 5 02:58 . dr-xr-xr-x 8 alexios alexios 0 Mar 5 02:57 .. lr-x------ 1 alexios alexios 64 Mar 5 02:58 0 -> /dev/null l-wx------ 1 alexios alexios 64 Mar 5 02:58 1 -> /dev/null l-wx------ 1 alexios alexios 64 Mar 5 02:58 2 -> /dev/null lrwx------ 1 alexios alexios 64 Mar 5 02:58 3 -> socket: l-wx------ 1 alexios alexios 64 Mar 5 02:58 4 -> pipe: lrwx------ 1 alexios alexios 64 Mar 5 02:58 5 -> socket: lrwx------ 1 alexios alexios 64 Mar 5 02:58 6 -> socket: lrwx------ 1 alexios alexios 64 Mar 5 02:58 7 -> socket: lr-x------ 1 alexios alexios 64 Mar 5 02:58 8 -> inotify lr-x------ 1 alexios alexios 64 Mar 5 02:58 9 -> inotify $ █
Note that the dæmon has closed standard input, output and error (FDs 0, 1 and 2). However, FD 4 is still open, and even allocated to a pipe. Obviously. this daemon is misbehaving somehow.
The answer lies in what FD 4 is: the original script uses a library that provides logging facilities to standard output, standard error, or a log file. By default, log output goes to standard output, file descriptor 1, so the library executes this when the script starts up:
This creates FD 4, and redirects it to FD 1 (the standard output). The script in question doesn't use a log file, so FD 4 remains at the default state.
When a dæmon starts up, it's customary to close all file descriptors and start a new session using the
setsid() system call. This detaches it from the terminal and makes PID 1 (
init) become its parent. However,
gpg-agent needs the terminal so its ancillary ‘PIN’ entry programs can ask the user for a passphrase (there are GUI PIN entry programs too, but
gpg-agent keeps the terminal open regardless). So, it doesn't close all file descriptors.
Why pipe the output to a log file if you expect to enter a passphrase?I'm not sure! You'll have to ask the script's end user for that. Personally, I'd run an interactive script inside a
tmux(1)with logging enabled, or just use
script(1). I'm pretty sure a curses-based program would still work, too. But there's no accounting for taste, and maybe they were using the text-only PIN entry helper.
Solving the problem at the source is trivial, once we've found the root of the problem. We modify the script so it no longer passes FD 4 on to
gpg-agent --daemon 4>/dev/null
And then it works like a charm! So, what can we learn from this?
- There's no such thing as ‘one size fits all’! Even a thoroughly tested library can exhibit surprising behaviours.
- Not all dæmons are created equal. The traditional Unix runtime model might not be a perfect fit for all workflows.
- Know your diagnostic tools!
/procis a little unfashionable these days, decades after it was first introduced to Linux, but never underestimate how useful it can be!
- There's no substitute for knowing the fundaments of how Unix works, even if you're diagnosing user-space issues.
And most importantly: even a relatively small, easily solved issue like this can be a learning experience!