Turbo Registry
High-performance AI model loader
Mystic Turbo Registry is an optimized container loader, written in Rust, that can reduce cold-start times by 90%.
[Figure: cold-start timelines for SDXL (10GB Docker image) and Llama-70B (85GB Docker image), comparing "with Turbo Registry" against "Others" across three phases: hardware allocation, Docker image loading, and running / inference.]
Benchmarks
What are cold-starts?
We break down cold-start times into four main parts:
Hardware allocation: How long a cloud provider takes to give you a fresh instance. This varies per instance type, region and cloud provider.
Container downloading: How long it takes to download the image onto the fresh instance.
Container extraction: Once downloaded, containers must be extracted and layers processed.
Pipeline loading: This includes loading the ML model into GPU memory and running the first inference pass. Subsequent runs are faster because the model is already cached.
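As a rough illustration of the breakdown above, the sketch below models a cold start as the sum of the four phases and reports how much of it is spent on the container image. The `ColdStart` class and every duration in it are hypothetical values chosen for illustration; they are not measurements from Mystic or any cloud provider.

```python
from dataclasses import dataclass


@dataclass
class ColdStart:
    """Durations, in seconds, of the four cold-start phases (hypothetical model)."""

    hardware_allocation: float
    container_download: float
    container_extraction: float
    pipeline_loading: float

    def total(self) -> float:
        """Total cold-start time: the four phases run one after another."""
        return (
            self.hardware_allocation
            + self.container_download
            + self.container_extraction
            + self.pipeline_loading
        )

    def container_share(self) -> float:
        """Fraction of the cold start spent downloading and extracting the image."""
        return (self.container_download + self.container_extraction) / self.total()


# Made-up durations for illustration only, not benchmark results.
example = ColdStart(
    hardware_allocation=30.0,
    container_download=60.0,
    container_extraction=90.0,
    pipeline_loading=20.0,
)

print(f"total cold start: {example.total():.0f}s")
print(f"share spent on the container image: {example.container_share():.0%}")
```

Plugging in your own measured phase durations shows which phase dominates and therefore where optimization is likely to pay off most.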
Why Turbo Registry?
Mystic Turbo Registry provides market-leading container loading, enabling your models to start running sooner.
The problem
Downloading containers takes a very long time: a typical registry lets you download layers at only ~150MB/s.
Once downloaded, containers have to be extracted before they can run, which can take minutes.
Long cold-start times mean servers are kept running while idle, because shutting them down would make the next request too slow.
Your customers sit waiting for a response from a model.
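To put the ~150MB/s figure in perspective, here is a back-of-the-envelope estimate of download time alone for the two image sizes mentioned on this page (10GB and 85GB). It assumes decimal units (1GB = 1000MB) and a constant throughput, and it ignores extraction entirely; the function name `download_seconds` is just an illustrative helper.

```python
def download_seconds(image_gb: float, throughput_mb_per_s: float = 150.0) -> float:
    """Estimate download time, assuming 1GB = 1000MB and constant throughput."""
    return image_gb * 1000 / throughput_mb_per_s


# Image sizes quoted on this page; 150MB/s is the "typical registry" figure.
for name, size_gb in [("SDXL", 10), ("Llama-70B", 85)]:
    seconds = download_seconds(size_gb)
    print(f"{name} ({size_gb}GB): about {seconds:.0f}s ({seconds / 60:.1f} min)")
```

Under these assumptions the 10GB image takes a little over a minute to download and the 85GB image takes more than nine minutes, before any extraction or model loading begins.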
The solution
After analyzing the four main parts of a cold start, we found that loading containers onto hardware can be greatly optimized, so that models load in a fraction of the time.
Rust: building the registry natively in a high-performance language allowed us to substantially increase total download throughput.