Run any AI model
in your cloud or ours
With Mystic, you can deploy ML in your own Azure/AWS/GCP account or deploy in our shared GPU cluster.
Recommended
Cloud integration with AWS/Azure/GCP
Mystic in your cloud
All Mystic features directly in your own cloud. In a few simple steps you get the most cost effective and scalable way of running ML inference.
Pay-as-you-go API
Mystic in our shared cloud
Our shared cluster of GPUs used by 100s of users simultaneously. Low cost, but performance will vary depending on real-time GPU availability.
Created and used by experts at
Bring your generative AI product
to market faster
Good AI products need good models and infrastructure;
we solve the infrastructure part.
Cost optimizations
Run on spot and parallelized GPUs
Run in AWS/GCP/Azure and use your cloud credits
Fast inference
Use vLLM, TensorRT, TGI or any other inference engine
Low cold starts with our fast registry
Simpler developer experience
A fully managed Kubernetes platform that runs in your own cloud
Open-source Python library and API to simplify your entire AI workflow
With our managed platform designed for AI
You get a high-performance platform to serve your AI models. Mystic will automatically scale up and down GPUs depending on the number of API calls your models receive. You can easily view, edit and monitor your infrastructure from your Mystic Dashboard, CLI and APIs.
Cost optimizations
What we’ve done to make sure your infrastructure bill is as low as possible.
Pay for GPUs at cloud cost
Serverless providers charge you a premium on compute that quickly becomes very expensive. With Mystic running in your cloud, there is no added fee on compute.
Run inference on spot instances
Mystic allows you to run your AI models on spot instances and automatically requests new GPUs when instances are preempted.
Run in parallel, on the same GPU
Mystic supports GPU fractionalization. With zero code changes, you can run multiple models on the same A30, A100, H100 or H200 GPU and maximise GPU utilization.
Automatically scale down to zero GPUs
If your models in production stop receiving requests, our auto-scaler automatically releases the GPUs back to the cloud provider. You can easily customize the warmup and cooldown periods with our API.
Cloud credits and commitments
If you are a company with cloud credits or existing cloud spend agreements, you can use them to pay for your cloud bill while using Mystic.
Performance optimizations
What we’ve done to make sure your models run extremely fast and have minimal cold starts.
Bring your inference engines
Within a few milliseconds, our scheduler decides the optimal queuing, routing and scaling strategy.
High-performance model loader built in Rust
Thanks to our custom container registry, written in Rust, you get much lower cold starts than anywhere else on the market and your containers load extremely fast.
A simple and beautiful developer experience
We believe data scientists and AI engineers should be able to safely deploy their ML without having to be experts in infrastructure.
No Kubernetes or DevOps experience required
Our managed platform removes all the complexities of building and maintaining your custom ML platform. We’ve done the engineering so you don’t have to.
APIs, CLI and Python SDK to deploy and run your ML
Extremely simple APIs, our CLI tool and an open-source Python library give you the freedom and confidence of serving high-performance ML models.
A beautiful dashboard to view and manage all your ML deployments
A unified dashboard to view all your runs, ML pipelines, versions, GPU clusters, API tokens and much more.
Get started with Mystic
Run your AI models in your cloud or ours
With Mystic, you can deploy ML in your own Azure/AWS/GCP account or deploy in our shared GPU cluster.
Recommended
Cloud integration with AWS/Azure/GCP
Mystic in your cloud
All Mystic features directly in your own cloud. In a few simple steps you get the most cost effective and scalable way of running ML inference.
Pay-as-you-go API
Mystic in our shared cloud
Our shared cluster of GPUs used by 100s of users simultaneously. Low cost, but performance will vary depending on real-time GPU availability.
How to deploy AI models with Mystic
From 0 to fast API endpoint
From your custom SDXL to your fine-tuned LLM, whether it's a LoRA or a complex pipeline, our open-source tool lets you package your ML pipeline.
Wrap your pipeline with our open-source library
Pipeline AI is our open-source Python library to wrap AI pipelines.
Whether it's a standard PyTorch model, a Hugging Face model, a combination of multiple models using your favourite inference engine, or your fine-tuned models: it's flexible and you can package anything.
from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams

from pipeline import entity, pipe


@entity
class LlamaPipeline:
    @pipe(on_startup=True, run_once=True)
    def load_model(self) -> None:
        # Download the model weights once at startup, then load them into vLLM
        model_dir = "/tmp/llama2-7b-cache/"
        snapshot_download(
            "meta-llama/Llama-2-7b-chat-hf",
            local_dir=model_dir,
            token="YOUR_HUGGINGFACE_TOKEN",
        )
        self.llm = LLM(
            model_dir,
            dtype="bfloat16",
        )
        self.tokenizer = self.llm.get_tokenizer()
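To show where this goes next, here is a hedged sketch of the rest of a typical pipeline: an inference pipe that calls the vLLM engine loaded above, and the graph that wires a prompt input to an output. It assumes Pipeline and Variable are also imported from pipeline and that the builder exposes output() and get_pipeline(); check the exact names against the library version you use.

    # Continuing the LlamaPipeline class defined above
    @pipe
    def inference(self, prompt: str) -> str:
        # Generate a completion with the vLLM engine loaded at startup
        params = SamplingParams(temperature=0.7, max_tokens=256)
        outputs = self.llm.generate([prompt], params)
        return outputs[0].outputs[0].text

# Build the pipeline graph: prompt in, generated text out
with Pipeline() as builder:
    prompt = Variable(str)
    model = LlamaPipeline()
    model.load_model()
    output = model.inference(prompt)
    builder.output(output)

my_pipeline = builder.get_pipeline()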
Deploy to AWS, GCP, Azure with Mystic
With a single command, a new version of your pipeline is deployed on your own cloud.
Upload your pipeline
pipeline container push
Run your AI model as an API
Get an instant API endpoint to run your model after upload. Mystic automatically scales up and down GPUs depending on the usage of your deployed model. Use our APIs, CLI or Dashboard to view and manage your models and infrastructure.
RESTful APIs to call your model
curl -X POST 'https://www.mystic.ai/v4/runs/stream' \
  --header 'Authorization: Bearer YOUR_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "pipeline": "user/pipeline_streaming:v1",
    "inputs": [{"type":"string","value":"A lone tree in the desert"}]
  }' -N
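The same streaming run can be triggered from Python. Below is a minimal sketch using the requests library, reusing the endpoint, token and payload from the curl example above:

import requests

response = requests.post(
    "https://www.mystic.ai/v4/runs/stream",
    headers={
        "Authorization": "Bearer YOUR_TOKEN",
        "Content-Type": "application/json",
    },
    json={
        "pipeline": "user/pipeline_streaming:v1",
        "inputs": [{"type": "string", "value": "A lone tree in the desert"}],
    },
    stream=True,  # keep the connection open and read results as they arrive
)

for line in response.iter_lines():
    if line:
        print(line.decode())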
Community
See what our public community uploads and deploy it in your own cloud with 1-click deploy.