Setting `--limit 1` on the worker forces entire flow runs to be sequential. But often only specific tasks need limits; other tasks could safely overlap.
The solution: Use Global Concurrency Limits with worker-specific names
GCLs are coordinated by the Prefect server, so they work across the separate subprocesses that each flow run executes in.
Example: Image processing with ML inference
Consider a pipeline that processes images through an ML model:
- Download image — network-bound, can run many in parallel
- Run ML model — uses GPU memory, so concurrent runs must be limited
- Save results — disk I/O, can run many in parallel
Setup
Tasks without limits
These tasks don’t contend for limited resources, so they run freely; they appear in the combined sketch under “The flow” below.
Task with per-worker limit
This task uses a local resource (the GPU) that can only handle limited concurrent usage. The limit is scoped to this worker, so each machine has independent limits.
The flow
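The following is a minimal sketch of the whole setup, not exact production code: the task and flow names, the `WORKER_ID` environment variable, the `ml-model-<worker_id>` limit-name scheme, and the placeholder inference step are illustrative assumptions. The real primitive is the `concurrency` context manager from `prefect.concurrency.sync`, which acquires a slot on a Global Concurrency Limit and releases it on exit.

```python
import json
import os

import httpx

from prefect import flow, task
from prefect.concurrency.sync import concurrency

# Assumed convention: each worker machine is started with a unique WORKER_ID
# environment variable (see step 3 below), inherited by flow-run subprocesses.
worker_id = os.environ.get("WORKER_ID", "local")


@task
def download_image(url: str) -> bytes:
    # Network-bound; no limit, so many of these can overlap freely.
    return httpx.get(url).content


@task
def run_ml_model(image: bytes) -> dict:
    # Block until the Prefect server grants a slot on this machine's own
    # Global Concurrency Limit; the slot is released when the block exits.
    with concurrency(f"ml-model-{worker_id}", occupy=1):
        # Stand-in for the real GPU inference call.
        return {"size_bytes": len(image)}


@task
def save_results(result: dict, output_path: str) -> None:
    # Disk-bound; no limit.
    with open(output_path, "w") as f:
        json.dump(result, f)


@flow
def process_image(url: str, output_path: str) -> None:
    image = download_image(url)
    result = run_ml_model(image)
    save_results(result, output_path)
```

Because each flow run executes in its own subprocess, concurrency across runs comes from the worker's `--limit`, while the GCL throttles only the ML step within and across those subprocesses on the same machine.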
Running the example
1. Create a Global Concurrency Limit for each worker
Each worker machine needs its own limit. The limit value controls how many ML tasks can run simultaneously on that machine. A sketch of the commands for this step and the next follows the heading below.
2. Create work pool and deploy
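A hedged sketch of steps 1 and 2, assuming a recent Prefect 3.x CLI, two worker machines identified as `gpu-1` and `gpu-2`, a process work pool named `image-pool`, and the flow from the sketch above living in `flows.py` (all names are illustrative, and the exact `prefect deploy` invocation depends on your project layout):

```bash
# Step 1: one Global Concurrency Limit per worker machine, each allowing
# 2 concurrent ML tasks. Names must match what run_ml_model builds from
# WORKER_ID, i.e. ml-model-<worker_id>.
prefect gcl create ml-model-gpu-1 --limit 2
prefect gcl create ml-model-gpu-2 --limit 2

# Step 2: create a process-type work pool and deploy the flow to it.
prefect work-pool create image-pool --type process
prefect deploy flows.py:process_image --name image-processing --pool image-pool
```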
3. Start workers with unique IDs
Each worker needs a unique ID that matches its GCL name:
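A sketch of starting the workers, assuming the `WORKER_ID` convention used in the flow code above; a process worker's flow-run subprocesses inherit its environment by default, but confirm this for your worker type:

```bash
# On machine gpu-1:
WORKER_ID=gpu-1 prefect worker start --pool image-pool --limit 10

# On machine gpu-2:
WORKER_ID=gpu-2 prefect worker start --pool image-pool --limit 10
```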
`--limit 10` allows up to 10 concurrent flow runs, but the GCL ensures only 2 are in the ML step at any time.
4. Submit jobs
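One way to queue a batch of runs, assuming the deployment created above (Prefect's default flow name for `process_image` is `process-image`); the parameter values are illustrative:

```bash
# Queue 20 flow runs; the work pool distributes them across the workers.
for i in $(seq 1 20); do
  prefect deployment run 'process-image/image-processing' \
    --param url="https://example.com/image-${i}.jpg" \
    --param output_path="/tmp/result-${i}.json"
done
```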
What you’ll see
With 10 concurrent flow runs on a worker:
- Download tasks from all 10 start immediately and overlap
- ML tasks queue up—only 2 run at a time (per the GCL limit)
- Save tasks run as soon as their ML task completes
Why this works
- GCLs are server-coordinated — The Prefect server tracks who holds what limit. It doesn’t matter that flow runs are separate processes.
- Worker-specific names — By including `worker_id` in the limit name, each worker machine has independent limits. GPU-1’s limit doesn’t affect GPU-2.
- Selective application — Only the tasks that need limits acquire them. Everything else runs at full concurrency.
Adapting this pattern
The same pattern works for any local resource constraint:
- Software licenses: A tool that only allows N concurrent instances
- Memory-intensive processing: Limit concurrent jobs to avoid OOM
- Disk I/O: Limit concurrent writes to a local SSD
- Local services: A sidecar database with connection limits
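As one hedged illustration, the software-license case uses the same acquire-before-use shape; the task, the tool name, and the `license-<worker_id>` limit scheme below are assumptions, with the GCLs created per machine just like the ML example:

```python
import os
import subprocess

from prefect import task
from prefect.concurrency.sync import concurrency

worker_id = os.environ.get("WORKER_ID", "local")  # same assumed per-machine ID as above


@task
def run_licensed_tool(input_path: str) -> None:
    # Hold one of this machine's license slots only while the tool runs,
    # e.g. a limit created with: prefect gcl create license-<worker_id> --limit 3
    with concurrency(f"license-{worker_id}", occupy=1):
        # "licensed-tool" is a placeholder for the license-limited executable.
        subprocess.run(["licensed-tool", input_path], check=True)
```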