add option to exit runner if docker isn't available #3733

felixlut · 2025-03-05T19:52:13Z

This PR adds an option, RUNNER_WAIT_FOR_DOCKER_EXIT_ON_FAILURE, that makes the runner exit if Docker is not available. The runner already has an option to wait a set amount of time for Docker to become available (RUNNER_WAIT_FOR_DOCKER_IN_SECONDS), but if Docker is still not ready after that time the runner simply ignores it and trucks along. For my use-case that is not the desired behavior; I'd rather the runner exit with an error instead.
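To make the intended behavior concrete, here is a rough sketch of the wait-then-exit logic in Python. It is not the code in this PR. The function names, the `docker ps` readiness probe, the one-second polling interval, and the log messages are all assumptions made for illustration. It also assumes that "set" means a non-empty value, and that the exit option only matters when a wait time is configured.

```python
import subprocess
import sys
import time
from typing import Callable, Mapping


def docker_is_ready() -> bool:
    """Assumed readiness probe: True if `docker ps` exits with status 0."""
    try:
        result = subprocess.run(
            ["docker", "ps"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # The docker binary is missing or cannot be executed.
        return False
    return result.returncode == 0


def wait_for_docker(
    env: Mapping[str, str],
    probe: Callable[[], bool] = docker_is_ready,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return 0 if the runner may start, 1 if it should exit with an error."""
    raw = env.get("RUNNER_WAIT_FOR_DOCKER_IN_SECONDS", "")
    wait_seconds = int(raw) if raw.isdigit() else 0
    if wait_seconds <= 0:
        return 0  # No wait requested: start as before.

    # Poll once per second (assumed interval) until Docker answers or time runs out.
    for _ in range(wait_seconds):
        if probe():
            return 0
        sleep(1)

    # Timed out. Assumption: any non-empty value counts as "set".
    if env.get("RUNNER_WAIT_FOR_DOCKER_EXIT_ON_FAILURE"):
        print("Docker is not available, exiting.", file=sys.stderr)
        return 1

    print("Docker is not available, continuing without it.", file=sys.stderr)
    return 0
```

A launcher could call `sys.exit(wait_for_docker(os.environ))` before starting the runner. The probe and sleep functions are passed in only so the timeout path can be exercised without a real Docker daemon.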

I'd argue that a runner starting without Docker is faulty given the many GitHub Actions features depending on it (container jobs, Docker Container Actions, service containers, ...), or at the very least that there should be an option to prevent it from starting in such a state. The runner already has a similar mechanism for sudo, so it's not a stretch to do the same here.

My use-case - Actions Runner Controller in AKS

While running ARC in an AKS cluster I've noticed intermittent issues with starting the docker:dind sidecar container on new nodes during the first few minutes of a node's lifecycle. The issue resolves itself after a couple of minutes, but not before the initial set of runners has started without Docker, which crashes the workflows that depend on it. I'd rather have the runner exit with an error, which in the Kubernetes world would mean a retry of the pod, which (eventually) resolves the issue. This is the timeline of events as of now:

1. New node starting up
2. New runner starting on the new node
   1. Error starting docker:dind. It is not retried
   2. Runner waits for Docker for RUNNER_WAIT_FOR_DOCKER_IN_SECONDS seconds, but when the timer runs out it continues without it
3. Workflows depending on Docker start crashing (container job, Docker actions, ...)

Notably, I've also tried bumping RUNNER_WAIT_FOR_DOCKER_IN_SECONDS to a higher number, but the creation of the docker:dind container is not retried automatically, meaning that once a runner has encountered this error it will eventually start without Docker available. It might be possible to configure AKS to do the retry, but in either case I believe it should be a supported use-case to simply kill the runner if it's faulty.


add option to exit runner if docker isn't available

5efed79

felixlut requested a review from a team as a code owner March 5, 2025 19:52

felixlut marked this pull request as draft March 5, 2025 19:54

flesh out logging logic

4fb7140

felixlut marked this pull request as ready for review March 6, 2025 08:09

check if RUNNER_WAIT_FOR_DOCKER_EXIT_ON_FAILURE is set or not instead…

1d6707e

… of true
