Broken composition or the tale of bash and set -e

Today I will talk about a surprising behavior in bash, that may cause issues by hiding bugs.

Bash has rather simple error checking support. There are, in general, two ways one can approach error checking. The first one is entirely unrealistic, the second one has rather poor user experience.

The first way to handle errors is to wrap, every single thing in and if-then-else statement, provide a tailored error message, perform cleanup and quit. Nobody is doing that. Shell scripts, it's sloppy more often than not.

The second way to handle errors is to let bash do it for you, by setting te errexit option, by using set -e. To illustrate this, look at the following shell program:

set -e

echo "I'm doing stuff"
echo "I'm done doing stuff"

As one may expect,false will be the last executed command.

Sadly, things are not always what one would expect. For good reasons set -e is ignored in certain contexts. Consider the following program:

set -e
if ! false; then
   echo "not false is true"

Again, as one would expect, the program executes in its entirety. Execution does not stop immediately at false, as that would prevent anyone from using set -e in any non-trivial script.

This behavior is documented by the bash manual page:

Exit immediately if a pipeline (which may consist of a single simple command), a list, or a compound command (see SHELL GRAMMAR above), exits with a non-zero status. The shell does not exit if the command that fails is part of the command list immediately following a while or until keyword, part of the test following the if or elif reserved words, part of any command executed in a && or || list except the command following the final && or ||, any command in a pipeline but the last, or if the command's return value is being inverted with !. If a compound command other than a subshell returns a non-zero status because a command failed while -e was being ignored, the shell does not exit. A trap on ERR, if set, is executed before the shell exits. This option applies to the shell environment and each subshell environment separately (see COMMAND EXECUTION ENVIRONMENT above), and may cause subshells to exit before executing all the commands in the subshell.

The set of situations where set -e is ignored is much larger. It contains things like pipelines, commands involving && and ||, except for the final element. It is also ignored when the exit status is inverted with !, making something as innocent as ! true silently ignore the exit status.

What may arguably need more emphasizing, is this:

If a compound command or shell function executes in a context where -e is being ignored, none of the commands executed within the compound command or function body will be affected by the -e setting, even if -e is set and a command returns a failure status. If a compound command or shell function sets -e while executing in a context where -e is ignored, that setting will not have any effect until the compound command or the command containing the function call completes.

What this really says is that set -e is disabled for the enitre duration of a compound command. This is huge, as it means that code which looks perfectly fine and works perfectly fine in isolation, is automatically broken by other code, which also looks and behaves perfectly fine in isolation.

Consider this function:

foo() {
    set -e # for extra sanity
    echo "doing stuff"
    echo "finished doing stuff"

This function looks correct and is correct in isolation. Lets carry on and look at another function:

bar() {
    set -e # for extra sanity
    if ! foo; then
        echo "foo has failed, alert!"
        return 1

This also looks correct. In fact, it is correct as well, as long as foo is an external program. If foo is a function, like what we defined above. The outcome is that foo executes all the way to the end, ignoring set -e's normal effect after the non-zero exit code from false. Unlike when invoked in isolation, foo prints the finished doing stuff message. What is worse, because echo succeeds, foo doesn't fail!

Bash breaks composition of correct code. Two correct functions stop being correct, if the correctness was based on the assumption, that set -e stops execution of a list of commands, after the first failing element of that list.

It's a documented feature, so it's hardly something one can report as a bug on bash. One can argue that shellcheck should warn about that. I will file a bug on shellcheck, but discovering this has strongly weakened my trust in using bash for anything other than an isolated, executed script. Using functions or sourcing other scripts is a huge risk, as the actual semantics is not what one would expect.

You'll only receive email when Zygmunt Krynicki publishes a new post

More from Zygmunt Krynicki