
Why go serverless for Machine Learning inference?

The benefits of using serverless GPUs over regular cloud hosting services to run the machine learning inference behind your AI.

  • Digest time: 3 min read
  • Published: 10/2/2022
  • Tags: MLOps
  • Author: Rosie Bennett

In this blog post we’re looking at why and how serverless is the right choice to power AI apps and products, and what we are doing at Mystic to deliver a fully hosted solution to run your deep learning models in production.

We believe that serverless computing is at an inflection point and have made it our mission to:

  1. Enable our customers to deliver robust, dynamic AI products in an agile, 'low-code' and highly cost-effective way (i.e. stop overpaying for your cloud!).
  2. Reduce carbon emissions for ourselves, our customers and the wider AI/ML supply chain by optimising the task-to-server ratio and compute times.
  3. Lower the barrier to entry (through cost reduction and performance agility) for the emerging global enterprise ML supply chain.

What is serverless?

A serverless GPU cloud works like any cloud server: a computing engine that responds automatically to requests from computing clients. The difference is that instead of you managing the server software and running it in-house, the cloud provider deploys and runs the software on its own servers, and clients interact with it remotely over the network.

The serverless compute model provides an efficient alternative to server-based platforms for applications that require compute in the cloud. Our customers use Pipeline tools and cloud infrastructure to deploy open-source LLMs, and to host and serve their own deep learning models.

In serverless computing, infrastructure is abstracted away from application code, which can then run on any underlying hardware. By providing powerful tools for abstracting away the hardware, we enable developers to concentrate on developing and deploying data pipelines. Compared with server-side computing, a serverless solution carries far less operational overhead: the user only needs to write and configure the application code.
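To make that concrete, here is a minimal sketch of what "only configuring the application code" can look like. This is not Pipeline's actual SDK; the handler name, request shape and stand-in model below are all hypothetical:

```python
from typing import Any

def run_model(prompt: str) -> str:
    # Stand-in for a real model call (e.g. a transformer forward pass).
    return prompt.upper()

def handler(request: dict[str, Any]) -> dict[str, Any]:
    # In a serverless platform, this function is essentially all the user
    # writes; the provider supplies the servers, scaling and GPU scheduling.
    return {"output": run_model(request["prompt"])}

if __name__ == "__main__":
    # Simulate the platform invoking the handler for one request.
    print(handler({"prompt": "a lighthouse at dusk"}))
```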

What makes Pipeline a good hosting solution?

Our serverless infrastructure means that we can offer low latency and high speeds. This is down to our proprietary task distribution system, which minimises under-utilisation of hardware and uses advanced cache management to reduce the cold-start problem. Until bandwidth across PCIe into the GPU is instantaneous, every ML compute provider will be troubled by cold starts: a model must first be loaded into GPU memory before inference can be served. Our custom distribution software minimises this impact, and you will never be charged for model loading time.
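To illustrate why cold starts matter, the toy timing below separates model loading from inference; the sleep stands in for the seconds it can take to move weights into GPU memory, i.e. the load time that is not billed:

```python
import time

def load_model():
    # Stand-in for copying model weights over PCIe into GPU memory.
    time.sleep(2.0)
    return lambda prompt: prompt[::-1]  # trivial stand-in "model"

t0 = time.perf_counter()
model = load_model()          # cold start: slow, unbilled
t1 = time.perf_counter()
output = model("hello")       # inference: fast, the only billed part
t2 = time.perf_counter()

print(f"load: {t1 - t0:.2f}s  inference: {t2 - t1:.6f}s  output: {output}")
```

A warm worker skips the load step entirely, which is why cache management has such a large effect on observed latency.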

With enterprise-level cloud providers you generally need to purchase a server or a block of servers that are constantly running and always available, even when unused. This is expensive and wasteful in terms of resources. There is also a fair amount of complexity in getting started, and it takes expertise to optimise your account and avoid paying for services you are not actually using.

"As a company that is focused on doing a few very difficult things really well, our mission is to offer a really straightforward, affordable and simple serverless setup for machine learning practitioners." – Paul Hetherington, CEO, Mystic

Our Quickstart setup process makes it easy to create a Pipeline account and deploy a model. Our API responds as soon as it is called and incurs no costs at rest: we only charge for the time the GPU takes to process your compute (see our pricing page for details).
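As a sketch of the calling pattern (the endpoint URL, token and payload shape here are placeholders, not Pipeline's actual API; see the docs for real values):

```python
import requests

API_URL = "https://api.example.com/v1/runs"   # placeholder endpoint
API_TOKEN = "YOUR_API_TOKEN"                  # placeholder credential

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={
        "model": "stable-diffusion",          # hypothetical model identifier
        "inputs": {"prompt": "a lighthouse at dusk"},
    },
    timeout=60,
)
response.raise_for_status()
# You pay only for the GPU time the run consumed, not for idle capacity.
print(response.json())
```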

Summary of benefits: serverless vs traditional cloud

  • Shortest path to an out-of-the-box architecture for developing and deploying machine learning models
  • Free up your dev time by skipping the DevOps (purchasing, provisioning and managing backend servers)
  • Flexible development environment supporting multiple languages
  • Option to use the latest open-source pre-trained text or image generation models (Stable Diffusion, DALL·E, GPT-J etc.) via an API, or deploy your own
  • Transparent, low pay-as-you-go pricing for a compute environment to develop, test and experiment in
  • Inherently scalable by design, with drastically reduced resource-management requirements

The Pipeline serverless offer

DEVELOPER: From standard language tasks to image processing, our model hub hosts a range of production-ready ML models, or users can upload their own.

CLOUD: End-to-end infrastructure on cloud GPUs for inference or training of open-source or bespoke models. Smart scaling and pay-per-use support usage as it grows; no quotas, no minimums, no hourly rental.

PIPELINE ENTERPRISE: Our optimised ML task distribution software empowers organisations to drive workplace innovation, reduce on-prem hardware costs and achieve their ML goals.

See our pricing page for details of our pay-per-second rates for compute.

Contact us to chat through your project or join our Discord server.

ABOUT PIPELINE.AI

Pipeline AI makes it easy to work with ML models and to deploy AI at scale. The self-serve platform provides a fast pay-as-you-go API for running pre-trained or proprietary models in production. If you are looking to deploy a large product and would like to sign up as an Enterprise customer, please get in touch.

Follow us on Twitter and LinkedIn.