What is "Graceful Shutdown"?
"Graceful shutdown" refers to ensuring that our application shuts down calmly, without causing disruptions, data loss, or loose ends - unlike that coworker who ends a call without saying goodbye.
Applications don't last forever in the real world. They are scaled, restarted, updated, or just shut down. If this moment is not managed well, a number of issues may occur:
- Data loss: in-flight operations are cut off abruptly;
- Locked resources: database connections, queues, or sockets hang;
- Compromised user experience: unexpected errors, timeouts, or broken responses;
- Unpredictable behavior: dependent services may fail in a cascade.
"Graceful shutdown", then, means ending an application in a controlled way by following a few steps:
- Signal components that it is time to shut down (background services, open connections, handlers, etc.);
- Wait for them to finish what they're doing (with a reasonable timeout!);
- Release resources (write logs, flush buffers, and close connections);
- Finish the process only when everything is neat and orderly.
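The four steps above can be sketched with nothing but the base class library. This is a minimal illustration, not how the .NET host implements shutdown; the worker loop, the 5-second timeout, and the names RunWorkerAsync, shutdown and worker are assumptions made for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical worker: processes items until it is asked to stop.
static async Task<int> RunWorkerAsync(CancellationToken stop)
{
    var processed = 0;
    while (!stop.IsCancellationRequested)
    {
        await Task.Delay(50); // one unit of work, allowed to run to completion
        processed++;
    }
    return processed;
}

using var shutdown = new CancellationTokenSource();
var worker = RunWorkerAsync(shutdown.Token);

await Task.Delay(200); // the application runs for a while
shutdown.Cancel();     // 1. signal components that it is time to stop

// 2. wait for in-flight work, but never longer than the timeout
var timeout = Task.Delay(TimeSpan.FromSeconds(5));
var finished = await Task.WhenAny(worker, timeout) == worker;

// 3. release resources (flush logs, close connections, ...)
if (finished)
    Console.WriteLine($"Worker drained after {await worker} items.");
else
    Console.WriteLine("Timeout reached; exiting without the worker.");

// 4. only now let the process exit
```

The worker checks the token only between units of work, so the item being processed when the signal arrives is completed rather than dropped.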
In a modern world with microservices, containers and orchestration in Kubernetes, this process becomes even more critical: a misbehaving application can block the deployment cycle, break dependencies, and complicate support.
💡 Cloud-native best practice note: Ideally, modern applications should be stateless and short-lived - meaning, don't keep state in memory or run operations for half an hour. This greatly simplifies shutdown and improves system resilience as a whole. But... when long processes are inevitable (like ETLs, heavy uploads, or critical operations), it's best to use patterns like:
- Sagas: to manage long workflows with compensations in case of failure;
- Checkpointing and Retryables: to resume from where it left off;
- Idempotency: to ensure repeating an operation doesn't cause side effects.
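As a small illustration of the last pattern, the sketch below makes a payment handler idempotent by remembering which message ids it has already applied. The in-memory HashSet stands in for a durable store, and the names processedIds, balance and ApplyPayment are hypothetical.

```csharp
using System;
using System.Collections.Generic;

var processedIds = new HashSet<string>(); // stand-in for a durable store
var balance = 0m;

// Applies a payment once per message id; replays are ignored.
bool ApplyPayment(string messageId, decimal amount)
{
    if (!processedIds.Add(messageId))
        return false; // already handled: repeating the call is a no-op

    balance += amount;
    return true;
}

var first = ApplyPayment("msg-1", 10m);
var replay = ApplyPayment("msg-1", 10m); // redelivered after a restart

Console.WriteLine($"first={first}, replay={replay}"); // prints "first=True, replay=False"
```

With a handler like this, a message that was interrupted by shutdown can simply be redelivered after the restart.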
Fundamental Concepts
Although the focus of this article is on .NET applications, some of the concepts we will discuss here are language-agnostic and can be applied to any contemporary application that runs on a real operating system, particularly Linux servers (or Linux-based containers).
Before we see what .NET offers us, it's important to understand some core concepts - starting with one that comes from the operating system itself.
POSIX Signals
The first thing to understand is what POSIX signals are (also known as Unix signals). When a process runs on a Unix/Linux system (like most cloud production environments), the OS can send it special signals - literally "messages" that say:
"Hey, it's time to do something!"
These signals are triggered by events such as:
- The user pressing Ctrl+C in a terminal;
- A manual kill command (like kill -15 <pid>, kill -9 <pid>, kill -3 <pid> or kill -2 <pid>);
- The operating system deciding to terminate a process;
- A container being shut down (e.g., kubectl delete pod in Kubernetes).
The most relevant signals for us are:
Signal | Code | Description |
---|---|---|
SIGINT | 2 | Process interruption. Triggered by pressing Ctrl+C in terminal. |
SIGTERM | 15 | Polite termination request. Used by docker stop, kill -15 <pid>, Kubernetes. |
SIGQUIT | 3 | Generates core dump and ends process. Useful for debugging. |
SIGKILL | 9 | Forced termination. Used by kill -9 <pid>. Cannot be caught or ignored. Last resort. |
SIGINT: the dev signal
SIGINT is the most common signal during development. It's emitted when you press Ctrl+C in the terminal where the application is running or execute the command kill -2 <pid>. It serves to manually interrupt the process, and often the application reacts in the same way as it would to SIGTERM, initiating a graceful shutdown.
It's ideal for testing graceful shutdown locally.
SIGTERM: our production hero
This is the most important signal in a production context. SIGTERM is what Kubernetes sends by default when terminating a Pod, or what is used by commands like docker stop and kill -15 <pid>. If the application is properly prepared, it can release resources and shut down gracefully.
SIGQUIT: the debugging grenade
By default, SIGQUIT terminates the process and generates a core dump - a complete snapshot of the application's memory at the moment the signal arrived. It's particularly useful for debugging more complex scenarios in production or during stress testing. While less commonly used directly, it's good practice to register a handler for it if you want the opportunity to log extra diagnostics before the process exits.
SIGKILL: the nuclear button
SIGKILL is the end of the line. This signal cannot be caught or ignored. The process is killed immediately. Classic command: kill -9 <pid>. There's no opportunity for cleanup, no handlers. Nothing. That's why it's essential for the application to respond quickly to SIGTERM before the system (or Kubernetes) loses patience and sends a SIGKILL.
Example
using System.Runtime.InteropServices;

var waitForExit = new ManualResetEventSlim(false);

// Keep the registrations referenced: disposing one removes its handler.
using var sigint = PosixSignalRegistration.Create(PosixSignal.SIGINT, context =>
{
    context.Cancel = true; // suppress the default termination so we exit on our own terms
    Console.WriteLine("SIGINT received. Exiting...");
    waitForExit.Set();
});
using var sigterm = PosixSignalRegistration.Create(PosixSignal.SIGTERM, context =>
{
    context.Cancel = true;
    Console.WriteLine("SIGTERM received. Exiting...");
    waitForExit.Set();
});
using var sigquit = PosixSignalRegistration.Create(PosixSignal.SIGQUIT, context =>
{
    context.Cancel = true;
    Console.WriteLine("SIGQUIT received. Exiting...");
    waitForExit.Set();
});

Console.WriteLine("Application running. Send SIGINT, SIGTERM or SIGQUIT to exit.");

// Wait indefinitely until one of the signal handlers releases the wait
waitForExit.Wait();
Console.WriteLine("Application shutting down gracefully...");
CancellationToken
The other essential concept, this one specific to the .NET world, is the CancellationToken.
The CancellationToken serves as a controlled way to cancel ongoing operations. It's an object that the application can pass to asynchronous or long-running tasks, such as:
- HTTP calls,
- stream reading,
- delays or timeouts,
- background processing.
When the token is "triggered" - which can happen, for example, when an HTTP call is cancelled, when we manually use a CancellationTokenSource, or during host shutdown - it fires a signal to all operations that are listening to it. These operations, in turn, should stop what they are doing in a clean and coordinated way.
💡 Important note:
The CancellationToken does not forcefully interrupt a process.
It only serves to indicate that a cancellation request was made.
It's up to the code to regularly check the token's status (IsCancellationRequested) and decide to stop execution - usually at the next loop iteration, or before starting new operations.
Cancellation workflow
Start of process
|
v
[ Check CancellationToken ]
|
v
[ Execute operation A ]
|
v
[ Check CancellationToken ]
|
v
[ Execute operation B ]
|
v
--- CancellationToken is signaled (e.g., SIGTERM received) ---
|
v
[ Operation B continues until it finishes ]
|
v
[ Check CancellationToken again ]
|
v
[ Detect cancellation -> break loop ]
|
v
[ Release resources and finish process ]
|
v
End
Example
public class Worker(ILogger<Worker> logger) : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while(!stoppingToken.IsCancellationRequested)
        {
            logger.LogInformation("Worker running at: {time}", DateTimeOffset.UtcNow);
            await Task.Delay(2_000, stoppingToken);
        }
    }
}
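The cancellation workflow drawn above can also be reproduced without a host. In this sketch, operation B is already in flight when the token is signaled, so it runs to completion and the cancellation is only noticed at the next check. The two TaskCompletionSource gates exist purely to make the timing deterministic; all names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var log = new List<string>();
var operationBStarted = new TaskCompletionSource();
var allowBToFinish = new TaskCompletionSource();

async Task ProcessAsync(CancellationToken token)
{
    foreach (var name in new[] { "A", "B", "C" })
    {
        if (token.IsCancellationRequested) // checked only between operations
        {
            log.Add("cancellation detected");
            break;
        }

        if (name == "B")
        {
            operationBStarted.SetResult();
            await allowBToFinish.Task; // B is "in flight" while the token is signaled
        }
        log.Add($"{name} finished");
    }
    log.Add("resources released");
}

using var cts = new CancellationTokenSource();
var processing = ProcessAsync(cts.Token);

await operationBStarted.Task;
cts.Cancel();               // the request is recorded; B is not interrupted
allowBToFinish.SetResult(); // B completes normally
await processing;

Console.WriteLine(string.Join(" -> ", log));
// prints "A finished -> B finished -> cancellation detected -> resources released"
```

Operation C never starts, which is exactly the "stop before starting new operations" behavior described in the note above.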
And why is this important for graceful shutdown?
Imagine you have an endpoint processing a purchase, or a HostedService handling messages from a queue. If the application receives a SIGTERM or is being shut down, we want to:
- Notify those operations that they need to stop (without turning things off abruptly);
- Give them time to finish what they were doing or, if possible, cancel safely;
- Avoid corrupting data or leaving the system in an inconsistent state.
This is where the CancellationToken shines: it synchronizes the host's lifecycle with the application's components.
Important Components in .NET
Ensuring that the application shuts down in a controlled way, releasing resources and finishing pending operations safely, is essential for robust and reliable applications. .NET provides specific mechanisms to manage this process, using interfaces like IHostedLifecycleService, IHostApplicationLifetime, IHostLifetime and classes such as ConsoleLifetime.
IHostedLifecycleService
An interface, available from .NET 8 onward, that extends IHostedService with additional methods so that a hosted service can react explicitly before and after the standard lifecycle operations (StartAsync and StopAsync).
public interface IHostedLifecycleService : IHostedService
{
    Task StartingAsync(CancellationToken cancellationToken);
    Task StartedAsync(CancellationToken cancellationToken);
    Task StoppingAsync(CancellationToken cancellationToken);
    Task StoppedAsync(CancellationToken cancellationToken);
}
This interface allows performing custom tasks immediately before and after the start and stop of hosted services, ensuring more granular management of the application lifecycle.
public class Worker(ILogger<Worker> logger) : BackgroundService, IHostedLifecycleService
{
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        try
        {
            while(!stoppingToken.IsCancellationRequested)
            {
                logger.LogInformation("Worker running at: {time}", DateTimeOffset.UtcNow);
                await Task.Delay(500, stoppingToken);
            }
        }
        catch(OperationCanceledException)
        {
            // Task.Delay throws once shutdown is requested; treat that as a normal exit
            // so the final log line below is actually reached.
        }
        logger.LogInformation("Worker stopped running at: {time}", DateTimeOffset.UtcNow);
    }

    public async Task StartingAsync(CancellationToken cancellationToken)
    {
        logger.LogInformation("StartingAsync called - Before delay");
        await Task.Delay(3_000, cancellationToken);
        logger.LogInformation("StartingAsync called - After delay");
    }

    public Task StartedAsync(CancellationToken cancellationToken)
    {
        logger.LogInformation("StartedAsync called");
        return Task.CompletedTask;
    }

    public async Task StoppingAsync(CancellationToken cancellationToken)
    {
        logger.LogInformation("StoppingAsync called - Before delay");
        await Task.Delay(3_000, cancellationToken);
        logger.LogInformation("StoppingAsync called - After delay");
    }

    public Task StoppedAsync(CancellationToken cancellationToken)
    {
        logger.LogInformation("StoppedAsync called");
        return Task.CompletedTask;
    }
}
IHostApplicationLifetime
Provides notifications for the application lifecycle events ApplicationStarted, ApplicationStopping, and ApplicationStopped, each exposed as a CancellationToken, plus a StopApplication method for requesting shutdown from code.
public interface IHostApplicationLifetime
{
    CancellationToken ApplicationStarted { get; }
    CancellationToken ApplicationStopping { get; }
    CancellationToken ApplicationStopped { get; }
    void StopApplication();
}
It is used to react to specific application lifecycle events, allowing internal services to release resources or complete pending tasks before the application fully shuts down.
var appLifetime = app.Services.GetRequiredService<IHostApplicationLifetime>();
appLifetime.ApplicationStarted.Register(() => Console.WriteLine("Application has started"));
appLifetime.ApplicationStopping.Register(() => Console.WriteLine("Application is stopping"));
appLifetime.ApplicationStopped.Register(() => Console.WriteLine("Application has stopped"));
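Because these lifecycle events are ordinary CancellationToken values, their callback behavior can be tried out without a host. The sketch below uses a plain CancellationTokenSource as a stand-in for ApplicationStopping; the names stopping, flushed and registration are made up for the illustration.

```csharp
using System;
using System.Threading;

using var stopping = new CancellationTokenSource(); // stand-in for ApplicationStopping
var flushed = false;

// Same call shape as appLifetime.ApplicationStopping.Register(...)
using var registration = stopping.Token.Register(() =>
{
    flushed = true; // e.g. flush buffers, stop accepting new work
    Console.WriteLine("Stopping callback ran");
});

stopping.Cancel(); // in a real host, shutdown is what triggers the token

Console.WriteLine(flushed); // True: Cancel() returns only after the callbacks have run
```

Since the callbacks run as part of the cancellation itself, it is usually wise to keep them short.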
IHostLifetime
This interface specifically manages how the host responds to the environment in which it is running (e.g., console, Windows Service, containers).
public interface IHostLifetime
{
    Task WaitForStartAsync(CancellationToken cancellationToken);
    Task StopAsync(CancellationToken cancellationToken);
}
It allows controlling the host's behavior concerning the runtime environment, handling, for example, external signals like SIGTERM or SIGINT.
ConsoleLifetime
A specific implementation of IHostLifetime that reacts to signals such as SIGINT or SIGTERM. It is responsible for handling shutdown through console commands or external events, triggering the graceful shutdown process. The host registers this class by default, so most .NET applications get this behavior without extra code.
HostOptions.ShutdownTimeout
HostOptions allows configuring fundamental behaviors of the application's lifecycle, including the wait time during graceful shutdown. The ShutdownTimeout property defines the maximum amount of time the host will wait for the completion of the StopAsync method and the StoppingAsync/StoppedAsync methods of hosted services (IHostedLifecycleService).
builder.Services.Configure<HostOptions>(options =>
    options.ShutdownTimeout = TimeSpan.FromSeconds(15));
What does this line do? It overrides the host's default wait time (30 seconds) and sets it to 15 seconds. This means that after receiving a signal like SIGTERM, .NET will give up to 15 seconds for services to finish what they're doing before forcing the application to stop.
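One way to picture the timeout: the stop methods receive a cancellation token that fires once the time limit is reached. The sketch below imitates that with a plain CancellationTokenSource, so treat it as an approximation of the host's behavior rather than its implementation. The 200 ms value, the 10-second cleanup and the name StopSlowServiceAsync are assumptions chosen to keep the example quick.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical service whose cleanup needs longer than the timeout allows.
static async Task<string> StopSlowServiceAsync(CancellationToken cancellationToken)
{
    try
    {
        await Task.Delay(TimeSpan.FromSeconds(10), cancellationToken); // long cleanup
        return "stopped cleanly";
    }
    catch (OperationCanceledException)
    {
        return "cleanup cut short by timeout";
    }
}

var shutdownTimeout = TimeSpan.FromMilliseconds(200); // stand-in for HostOptions.ShutdownTimeout
using var timeoutSource = new CancellationTokenSource(shutdownTimeout);

var outcome = await StopSlowServiceAsync(timeoutSource.Token);
Console.WriteLine(outcome); // prints "cleanup cut short by timeout"
```

A service that honors the token in its stop logic gets a chance to abandon optional cleanup and exit within the allotted time.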
Life cycle workflow
Special Cases
BackgroundService and CancellationToken
By default, services derived from the BackgroundService class are already prepared to shut down properly when a stop request is received. This happens because they use a CancellationToken directly tied to the host's shutdown process, allowing the application to automatically and safely wait for the completion of running tasks.
Adding delay to shutdown (Custom IHostLifetime)
In environments like Kubernetes, it may be necessary to implement an additional delay before the application's full shutdown. This is especially relevant because ingress traffic and the Kubernetes control plane operate independently: a Pod can receive SIGTERM while the routing layer is still sending it new requests for a short while. A practical example of this approach is shown below, where an intentional delay is introduced before total shutdown, allowing pending operations to be safely completed.
// Requires: using System.Runtime.InteropServices;
builder.Services.AddSingleton<IHostLifetime, GracefulShutdown>();

public class GracefulShutdown(
    ILogger<GracefulShutdown> logger,
    IHostApplicationLifetime applicationLifetime) : IHostLifetime, IDisposable
{
    private readonly ILogger<GracefulShutdown> _logger = logger;
    private readonly IHostApplicationLifetime _applicationLifetime = applicationLifetime;
    private readonly TimeSpan _delay = TimeSpan.FromSeconds(3);
    private IEnumerable<IDisposable>? _disposables;

    public Task WaitForStartAsync(CancellationToken cancellationToken)
    {
        // Take over signal handling from the default lifetime.
        _disposables =
        [
            PosixSignalRegistration.Create(PosixSignal.SIGINT, HandleSignal),
            PosixSignalRegistration.Create(PosixSignal.SIGQUIT, HandleSignal),
            PosixSignalRegistration.Create(PosixSignal.SIGTERM, HandleSignal)
        ];
        return Task.CompletedTask;
    }

    public Task StopAsync(CancellationToken cancellationToken)
    {
        _logger.LogInformation("[GRACEFUL SHUTDOWN] StopAsync called - cleaning up resources");
        return Task.CompletedTask;
    }

    private void HandleSignal(PosixSignalContext context)
    {
        // Suppress the default termination so the delay below can run.
        context.Cancel = true;
        _logger.LogInformation("[{time:HH:mm:ss.fffff}][GRACEFUL SHUTDOWN] Received signal {signal}, shutting down in {delay} seconds", DateTimeOffset.UtcNow, context.Signal, _delay.TotalSeconds);
        Task.Delay(_delay).ContinueWith(_ =>
        {
            _logger.LogInformation("[{time:HH:mm:ss.fffff}][GRACEFUL SHUTDOWN] Delayed shutdown complete, stopping application.", DateTimeOffset.UtcNow);
            _applicationLifetime.StopApplication();
        });
    }

    public void Dispose()
    {
        foreach(var disposable in _disposables ?? Enumerable.Empty<IDisposable>())
        {
            disposable.Dispose();
        }
    }
}
CancellationToken in ASP.NET APIs
By default, the CancellationToken passed to ASP.NET endpoints is only triggered when:
- The client explicitly cancels the request;
- The HTTP connection is closed or abruptly terminated.
Unlike Background Services, this token is not directly linked to the application lifecycle, which can cause problems when we want to quickly react to application shutdown, especially in Kubernetes or cloud environments.
Solution: Link the CancellationToken to the application shutdown
To fix this, we can link the request's CancellationToken with the application lifecycle token (IHostApplicationLifetime.ApplicationStopping). This way, the token will be triggered both by application shutdown and by client disconnection.
There are two main approaches to achieve this:
1. Link per endpoint (specific method)
This approach is useful when we want more control and granularity:
app.MapGet("/endpoint", async (
    IHostApplicationLifetime hostApplicationLifetime,
    CancellationToken cancellationToken) =>
{
    using var combinedTokenSource = CancellationTokenSource.CreateLinkedTokenSource(
        cancellationToken,
        hostApplicationLifetime.ApplicationStopping);
    ...
    return ...;
});
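To see what the linked source does in isolation, the following sketch replaces the request token and ApplicationStopping with two plain CancellationTokenSource instances. The variable names are stand-ins invented for the example; only CreateLinkedTokenSource itself is taken from the code above.

```csharp
using System;
using System.Threading;

using var requestAborted = new CancellationTokenSource();      // stand-in for the request token
using var applicationStopping = new CancellationTokenSource(); // stand-in for ApplicationStopping

using var linked = CancellationTokenSource.CreateLinkedTokenSource(
    requestAborted.Token,
    applicationStopping.Token);

Console.WriteLine(linked.Token.IsCancellationRequested); // False

applicationStopping.Cancel(); // simulate host shutdown

Console.WriteLine(linked.Token.IsCancellationRequested);         // True
Console.WriteLine(requestAborted.Token.IsCancellationRequested); // False: the link is one-way
```

Work inside the endpoint should therefore observe the linked token, not the original request token, if it is meant to react to shutdown.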
2. Link globally using Middleware
If you want to apply this behavior globally to all application endpoints, you can use a custom middleware:
app.UseHttpGracefulShutdown();
...
public static class HttpGracefulShutdownMiddlewareExtensions
{
    public static IApplicationBuilder UseHttpGracefulShutdown(this IApplicationBuilder builder)
        => builder.UseMiddleware<HttpGracefulShutdownMiddleware>();

    internal sealed class HttpGracefulShutdownMiddleware(RequestDelegate next)
    {
        private readonly RequestDelegate _next = next;

        public async Task InvokeAsync(HttpContext context)
        {
            var hostApplicationLifetime = context.RequestServices.GetRequiredService<IHostApplicationLifetime>();
            var originalToken = context.RequestAborted;

            // Create a combined token
            using var combinedTokenSource = CancellationTokenSource.CreateLinkedTokenSource(
                originalToken,
                hostApplicationLifetime.ApplicationStopping);

            // Replace RequestAborted with the combined token
            context.RequestAborted = combinedTokenSource.Token;

            try
            {
                await _next(context);
            }
            catch(OperationCanceledException exception) when(combinedTokenSource.Token.IsCancellationRequested)
            {
                if(originalToken.IsCancellationRequested)
                {
                    throw new OperationCanceledException(
                        $"Request {context.TraceIdentifier}: Cancelled by client disconnect",
                        exception,
                        combinedTokenSource.Token);
                }
                if(hostApplicationLifetime.ApplicationStopping.IsCancellationRequested)
                {
                    throw new OperationCanceledException(
                        $"Request {context.TraceIdentifier}: Cancelled by application shutdown",
                        exception,
                        combinedTokenSource.Token);
                }
                throw; // Re-throw to maintain default behavior
            }
        }
    }
}
⚠️ Important warning:
You should be careful when using this global approach, as it affects every request in the application. Normally, HTTP or gRPC endpoints should be fast, short-lived request/response operations. If you have longer or more complex operations, consider asynchronous alternatives such as messaging systems (queues).
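The queue-based alternative can be sketched in-process with System.Threading.Channels. The channel here is only a stand-in for a real message broker, and the job names and the 20 ms of simulated work are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

var queue = Channel.CreateUnbounded<string>(); // in-memory stand-in for a message broker
var handled = new List<string>();

// Background consumer: drains jobs until the writer is completed.
async Task ConsumeAsync()
{
    await foreach (var job in queue.Reader.ReadAllAsync())
    {
        await Task.Delay(20); // simulate slow work
        handled.Add(job);
    }
}

var consumer = ConsumeAsync();

// The "endpoint" only enqueues the job and returns immediately.
foreach (var job in new[] { "report-1", "report-2", "report-3" })
    await queue.Writer.WriteAsync(job);

// On shutdown: stop accepting new jobs, then let the consumer drain what is left.
queue.Writer.Complete();
await consumer;

Console.WriteLine($"Handled {handled.Count} jobs before exit."); // prints "Handled 3 jobs before exit."
```

With this split, request handlers stay short, and the long-running part has a single, well-defined place to finish or checkpoint its work during shutdown.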