Subshell conditional command exit status handling

Tuesday 21 January 2025 05:49 PM (Dhaka)

Very recently, while deploying some changes via GitHub Actions, the pipeline was failing with this ever-cryptic error message:

Error: Process completed with exit code 1.

Nothing more! Not a single hint! Which essentially meant that I would need to go down the rabbit hole myself. And I did!

The error was coming from a pipeline job step that executes a command via ssh. The actual command is quite long, but a minimal reproducible version (ignoring all ssh client options as well) would be:

ssh some-server 'cd some-dir/ && for dir in */; do ( cd "$dir" && [ -f some-file ] && echo "found" ); done'

Essentially, after establishing the ssh connection to some-server, we'd cd into some-dir. Then, for each subdirectory inside some-dir, in a subshell, we'd cd into the subdirectory, test ([) for the existence of some-file and if found, echo the string found.

We're running the whole chain in the subshell in a short-circuit fashion with logical AND (&&), i.e. if any command in the chain fails (exits with a non-zero exit status), the commands after it are not executed. That's also the source of our problem, because the exit status of the whole subshell is that of the last executed command. For example, if no file named some-file is present in subdirectory X, the [ -f some-file ] test is the last command executed in the subshell for X (echo "found" does not run because the test failed). As [ -f some-file ] exits with status 1, the whole subshell exits with status 1 as well.
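The rule can be seen in isolation with a minimal sketch that needs no files or ssh. Here true and false are stand-ins for the real commands, and the variable names failing and passing are only for illustration:

```shell
# The chain stops at the first failing command, and the subshell
# exits with that command's status.
( true && false && echo "never printed" )
failing=$?

# When every command succeeds, the status is that of the last one (echo).
( true && true && echo "printed" )
passing=$?

echo "failing chain: $failing, passing chain: $passing"
```

The first subshell plays the role of a subdirectory without some-file, the second one of a subdirectory that has it.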

Another very important thing to consider is the exit status of the whole ssh command. ssh exits with the exit status of the remote command, which in the above case is that of the last subshell the loop ran. Which subshell runs last depends on how the shell (bash in this case) sorts the subdirectory names when expanding */, because the for loop iterates over the names in exactly the order the glob expansion produces.

For example, let's take the following directory hierarchy:

$ tree -d some-dir
some-dir
├── X
├── Y
└── Z

Here, the exit status of the whole ssh command would be 0 (successful) if the directory Z contains a file named some-file (Z comes last in the expansion of */ inside some-dir).

Let's add echo "$dir" to get the sorting order in bash for the glob token */ (ignoring the ssh command here as the exit status of the following would be reflected in ssh as-is):

$ cd some-dir/ && for dir in */; do ( cd "$dir" && echo "$dir" ); done
X/
Y/
Z/

Now, if the directory Z doesn't contain some-file, we'd get an exit status of 1 (unsuccessful):

$ tree some-dir
some-dir
├── X
│   └── some-file
├── Y
│   └── some-file
└── Z

$ cd some-dir/ && for dir in */; do ( cd "$dir" && [ -f some-file ] && echo "found in ${dir}" ); done
found in X/
found in Y/

$ echo $?
1

But if the file exists in Z, it would give us an exit status of 0 (successful), even when it is missing from an earlier directory such as X:

$ tree some-dir
some-dir
├── X
├── Y
│   └── some-file
└── Z
    └── some-file

$ cd some-dir/ && for dir in */; do ( cd "$dir" && [ -f some-file ] && echo "found in ${dir}" ); done
found in Y/
found in Z/

$ echo $?
0

Going back to our pipeline ssh command: if the last directory from the */ expansion contains a file named some-file, we'd get an exit status of 0 (successful) for the whole ssh command, and 1 (unsuccessful) otherwise. The basic idea of pipeline jobs is that if some job step's command fails with a non-zero exit status, by default nothing following that step runs and the whole job is marked as failed.

In this case, a missing some-file is not a problem that should fail the whole job. So the solution is to ignore the exit status of [ -f some-file ], and of any other command in the chain whose failure is equally harmless.

I didn't want to write a whole bunch of if-else here, and I wanted to keep the short-circuit chain for readability, so I used a nested subshell approach: a wrapper subshell that contains the above subshell followed by || true, so that the wrapper exits with status 0 even if some command in the inner subshell returns a non-zero exit status:

cd some-dir/ && for dir in */; do ( ( cd "$dir" && [ -f some-file ] && echo "found" ) || true ); done

Same thing through ssh:

ssh some-server 'cd some-dir/ && for dir in */; do ( ( cd "$dir" && [ -f some-file ] && echo "found" ) || true ); done'
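To check the effect locally, here is a small sketch that builds a throwaway directory tree and compares both forms of the loop. The temporary directory from mktemp and the variable names tmp, before and after are assumptions for illustration; the loops themselves are the ones shown above.

```shell
# Throwaway tree: X and Y contain some-file, Z (globbed last) does not.
tmp=$(mktemp -d)
mkdir "$tmp/X" "$tmp/Y" "$tmp/Z"
touch "$tmp/X/some-file" "$tmp/Y/some-file"

# Original form: the loop's status is that of the last subshell (Z).
( cd "$tmp" && for dir in */; do ( cd "$dir" && [ -f some-file ] && echo "found in ${dir}" ); done )
before=$?

# Wrapped form: `|| true` makes every iteration end with status 0.
( cd "$tmp" && for dir in */; do ( ( cd "$dir" && [ -f some-file ] && echo "found in ${dir}" ) || true ); done )
after=$?

echo "before: $before, after: $after"   # before: 1, after: 0
rm -rf "$tmp"
```

As a side note, the outer pair of parentheses is not strictly required for this effect: ( ... ) || true on its own also yields status 0 for the iteration. The wrapper subshell mainly keeps the two parts visually grouped.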

Notes:

The biggest caveat of doing something like this is that it hides any error (i.e. any unsuccessful command with a non-zero exit status) in the inner subshell, including a failing cd. In my case, it doesn't matter, as I was expecting some directories to not contain some-file, so a failing [ -f some-file ] is not problematic. But this is something to keep in mind.
The ordering of a glob (pathname expansion) pattern like * depends on the collation setting of the locale, i.e. LC_COLLATE. The locale command can be used to get the current values of all the locale-specific settings, including LC_COLLATE. My system has en_US.UTF-8 as the value for LC_COLLATE, so the above sorting order is based on that locale; in a different locale, the sorting order could be different.
There is also a bash keyword [[, which is a conditional construct and a superset of the [/test builtin. [[ would behave the same way as [/test in this case.
The above assumes bash as the shell.
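Regarding the first note: if hiding every error is not acceptable, one possible alternative is to give up part of the short-circuit chain and use an explicit if for the one check that is allowed to fail. The following is only a sketch of that idea, not what I used in the pipeline; check_dirs is a hypothetical helper name, and it is meant to be run from inside some-dir.

```shell
# check_dirs: hypothetical helper, run from inside some-dir.
# A missing some-file is fine; a failing cd is still reported.
check_dirs() {
  local status=0
  local dir
  for dir in */; do
    (
      cd "$dir" || exit 1           # a real error: keep the failure
      if [ -f some-file ]; then     # a missing file is expected, not an error
        echo "found in ${dir}"
      fi
    ) || status=1                   # remember any failed iteration
  done
  return "$status"
}
```

An if whose condition is false and that has no else branch exits with status 0, so a missing file no longer affects the result. A failing cd in any iteration, not only the last one, makes the function return 1.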

Readul Hasan Chayan [Heemayl]