SOC 2 Certified

Agent-Ready AI
Infrastructure.

Vast is the infrastructure layer where AI agents autonomously design, procure, and optimize their own compute. API-native provisioning. Real-time pricing. Per-second billing.

700K+ transactions/mo · 20,000+ GPUs · 40+ data centers · 68+ GPU types

Trusted by developers and AI teams worldwide

CHAI
BOSCH
Cognition
Inria
IBM
Brave
Speechify

Real-time GPU infrastructure

Prices set by supply and demand across 20,000+ GPUs. Transparent. Programmatically queryable.
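"Programmatically queryable" in practice: the snippet below prepares (without sending) the same authenticated GET request as the curl example shown in the REST API section of this page. The endpoint and Bearer header come from that example; the key value is a placeholder. This is a minimal sketch, not a full client.

```python
# Prepare (but do not send) a request to the offers endpoint from the
# page's curl example. Stdlib only; the API key is a placeholder.
import urllib.request

req = urllib.request.Request(
    "https://cloud.vast.ai/api/v1/bundles/",
    headers={"Authorization": "Bearer YOUR_VAST_API_KEY"},
)

print(req.get_method(), req.full_url)  # GET https://cloud.vast.ai/api/v1/bundles/
```

Sending the request (with a real key) returns the current offer list, which your code or your agent can filter and rank like any other JSON payload.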

How it works

From sign-up to running GPU workloads in under five minutes.

1

Add credit & get your API key

Start with as little as $5. Grab your API key from the console — no contracts, no sales calls.

2

Search GPUs

Filter by model, VRAM, price, and availability — via console or API.

3

Deploy

Launch instances in seconds. Scale up or down programmatically.

Compare. Launch. Exit. Repeat.

Every GPU on Vast.ai is provisioned through code. The same API that developers use to deploy in seconds is the interface agents use to procure and optimize at scale.
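The "compare" step of that loop can be sketched with plain Python over a sample offer list. The field names here (gpu_name, gpu_ram, dph_total for dollars per hour) and the data are illustrative assumptions modeled on typical marketplace listings, not live Vast.ai output.

```python
# Sketch of the "compare" step: pick the cheapest offer meeting the
# constraints. Field names and sample data are illustrative assumptions.
sample_offers = [
    {"id": 101, "gpu_name": "RTX 4090", "gpu_ram": 24, "dph_total": 0.42},
    {"id": 102, "gpu_name": "RTX 4090", "gpu_ram": 24, "dph_total": 0.35},
    {"id": 103, "gpu_name": "A100",     "gpu_ram": 80, "dph_total": 1.10},
]

def cheapest(offers, min_vram_gb, max_price_per_hr):
    """Return the lowest-priced offer satisfying the constraints, or None."""
    eligible = [
        o for o in offers
        if o["gpu_ram"] >= min_vram_gb and o["dph_total"] <= max_price_per_hr
    ]
    return min(eligible, key=lambda o: o["dph_total"], default=None)

best = cheapest(sample_offers, min_vram_gb=24, max_price_per_hr=0.50)
print(best["id"])  # 102
```

The "launch" and "exit" steps are then single API calls against the chosen offer's id, which is what makes the loop automatable by an agent.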

Python SDK · Docs →

Programmatic compute provisioning in five lines of code.

pip install vastai-sdk

CLI · Docs →

Search, filter, and deploy from your terminal.

pip install vastai

REST API · Docs →

The interface agents call to provision infrastructure.

curl -H "Authorization: Bearer $VAST_API_KEY" https://cloud.vast.ai/api/v1/bundles/

One platform. Three ways to deploy.

GPU Cloud for full control. Serverless for zero-ops inference. Clusters for large-scale training.

GPU Cloud

On-demand instances across 40+ data centers and 20,000+ GPUs. Deploy in seconds via CLI, SDK, or API.

Serverless

Deploy models as endpoints with automatic benchmarking and optimization across GPU types. Autoscale to zero, pay only for compute time.

Clusters

Dedicated multi-node GPU clusters with InfiniBand networking for large-scale training.

Popular Models, Ready to Deploy

Launch pre-configured templates for the most popular open-source models.

Gemma 4 26B A4B IT

Gemma 4 26B A4B MoE vision-language model by Google with 256K context and thinking mode

Gemma 4 31B IT

Gemma 4 31B dense vision-language model by Google with 256K context and thinking mode

LTX-2.3

LTX-2.3 is a DiT-based audio-video foundation model with improved quality and prompt adherence for synchronized video and audio generation

Qwen3.5 397B A17B

Efficient multimodal reasoning model with hybrid DeltaNet-attention architecture

Vast.ai reduced our GPU costs by over 60% while giving us the flexibility to scale training jobs on demand. We serve 200K daily users without breaking the bank.

Giang, Creatix Technology

How teams build on Vast.ai

See how teams use Vast.ai to scale AI infrastructure and accelerate production workloads.

Creatix Technology


Creatix Technology Scales to 200K Daily Users with Vast.ai's GPU Cloud

How a fast-growing AI app company cut infrastructure costs by over 60% and powered millions of new users with Vast.ai.

Tech
PAICON


PAICON Accelerates Global, Data-Centric Cancer Diagnostics with Vast.ai

How a global oncology data platform used Vast.ai’s GPU cloud to rapidly iterate on Athena—validating that diversity can matter more than scale—while significantly reducing research-phase training costs.

Medical AI

Start with $5. Scale to 20,000 GPUs.

No humans required.