AI services and APIs

At a glance

Beginner: When you use Cursor, ChatGPT, or Claude, you're using an AI service: someone else runs the model and you access it over the internet. There are several providers to choose from (Anthropic, OpenAI, Google, and others), some free, some paid. You can also run smaller AI models on your own computer. This article explains the options and what to watch out for.

Intermediate: The provider landscape has three tiers: proprietary APIs (Anthropic Claude, OpenAI GPT-4, Google Gemini), cloud wrappers (AWS Bedrock, Azure OpenAI, Vertex AI), and open-weight models (Llama, Mistral) served via Hugging Face or self-hosted with vLLM/Ollama. The choice depends on quality needs, latency budget, privacy requirements, and cost. Multi-model architectures (a cheap model for routing, a capable model for reasoning) are becoming standard. The OWASP Top 10 for LLM Applications covers the security layer.

Advanced: Provider selection matrix across reasoning quality, latency, cost, data residency, and compliance (SOC 2, HIPAA, EU AI Act). Multi-model routing: classifier → fast model (Haiku/Flash) for simple tasks, capable model (Opus/o3) for complex reasoning. Open-weight frontier: Llama 3.1 405B and DeepSeek-R1 approach proprietary quality with full weight access. Security: OWASP LLM Top 10 (prompt injection, excessive agency), NIST AI RMF (Govern/Map/Measure/Manage), EU AI Act risk classification.

This article maps the commercial and open-source landscape for accessing LLMs — the APIs, platforms, and model hubs a practitioner chooses between when building AI-powered applications. It also covers security and governance frameworks that apply regardless of which provider you choose.

The provider landscape

Proprietary API providers

Provider	Flagship models	Key differentiator	Documentation
Anthropic	Claude 4 (Opus, Sonnet), Claude 3.5	Constitutional AI; strong reasoning and safety; 200K+ context; tool use	docs.anthropic.com
OpenAI	GPT-4o, GPT-4 Turbo, o1/o3 reasoning	Largest ecosystem; function calling; Assistants API; multimodal	platform.openai.com
Google	Gemini 2.0 (Pro, Flash, Flash-Lite)	Natively multimodal (text, image, audio, video); 1M+ context; integrated with Google Cloud	ai.google.dev

Cloud platform wrappers

Platform	What it offers	When to use it
AWS Bedrock	Managed access to Anthropic, Meta, Mistral, Cohere, and Amazon models through a unified API	You're already on AWS; need enterprise compliance; want model choice without multiple vendor contracts
Azure OpenAI Service	OpenAI models (GPT-4, DALL-E, Whisper) hosted on Azure with enterprise security and compliance	You're on Azure; need data residency guarantees; enterprise procurement
Google Vertex AI	Gemini models + open models (Llama, Mistral) + fine-tuning + MLOps	You're on GCP; need end-to-end ML pipeline; want Gemini natively

Open-weight models and hubs

Resource	What it offers	Key models
Hugging Face Hub	Model hosting, inference API, Spaces (demo apps), training libraries	Llama 3, Mistral, Gemma, Phi, thousands of fine-tuned variants
Meta Llama	Open-weight models (up to 405B parameters) under Meta's community license, which permits commercial use with some restrictions	Llama 3.1 (8B, 70B, 405B), multilingual; Llama 3.2 adds vision models
Mistral AI	European open-weight models; strong code and multilingual performance	Mistral Large, Mixtral (mixture of experts), Codestral
Ollama	Runs local models with a single command; developer-friendly CLI	Runs GGUF-quantized models locally; well suited to prototyping
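
As a rough illustration of the local route, the sketch below builds a request for a locally running Ollama server using only the Python standard library. It assumes Ollama's default address (`http://localhost:11434`) and its `/api/generate` endpoint with a non-streaming response. The model tag `llama3.1` and the helper names `build_generate_request` and `generate` are illustrative, and the model must already have been pulled.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "llama3.1") -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.1", timeout: float = 60.0) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]
```

Calling `generate("Explain GGUF in one sentence.")` only works while the server is running; separating request construction from sending keeps the payload easy to inspect and test offline.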

Choosing a provider

There is no single best provider. The decision depends on your constraints:

Constraint	Lean toward
Maximum reasoning quality	Anthropic Claude 4 Opus or OpenAI o3
Lowest latency	Smaller models (Gemini Flash, Claude Haiku, GPT-4o mini) or self-hosted quantized models
Lowest cost	Open-weight models self-hosted; or Gemini Flash / Claude Haiku for API
Data privacy / on-premises	Open-weight models (Llama, Mistral) self-hosted via vLLM or TGI
Multimodal (vision, audio, video)	Gemini 2.0 (natively multimodal) or GPT-4o
Enterprise compliance	AWS Bedrock or Azure OpenAI (SOC 2, HIPAA, data residency)
Rapid prototyping	OpenAI API (largest ecosystem, most examples) or Ollama (local, free)
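
One way to make the table above operational is to encode it as data and shortlist options by how many of your constraints they satisfy. The sketch below does only that: the dictionary keys are hypothetical labels, the option names are simplified from the table (for example, "self-hosted open-weight" stands in for the various self-hosted entries), and counting matches is a deliberately crude ranking, not a recommendation engine.

```python
from collections import Counter

# Options transcribed (and simplified) from the table above; keys are hypothetical labels.
LEAN_TOWARD = {
    "max_reasoning": ["Claude 4 Opus", "OpenAI o3"],
    "lowest_latency": ["Gemini Flash", "Claude Haiku", "GPT-4o mini", "self-hosted open-weight"],
    "lowest_cost": ["self-hosted open-weight", "Gemini Flash", "Claude Haiku"],
    "data_privacy": ["self-hosted open-weight"],
    "multimodal": ["Gemini 2.0", "GPT-4o"],
    "enterprise_compliance": ["AWS Bedrock", "Azure OpenAI"],
    "rapid_prototyping": ["OpenAI API", "Ollama"],
}


def shortlist(constraints: list[str]) -> list[tuple[str, int]]:
    """Rank options by how many of the given constraints they satisfy."""
    counts = Counter()
    for constraint in constraints:
        # An unknown constraint raises KeyError rather than being silently ignored.
        counts.update(LEAN_TOWARD[constraint])
    # Sort by match count (descending), then by name, for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(shortlist(["lowest_latency", "lowest_cost", "data_privacy"]))
```

With those three constraints, the self-hosted option is the only one that matches all of them, which mirrors the table: privacy requirements tend to dominate the decision.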

The multi-model strategy

Many production systems use multiple models: a fast, cheap model for classification and routing; a capable model for complex reasoning; a specialized model for code generation. Design your architecture so that models can be swapped without rewriting business logic.
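
The routing idea can be sketched in a few lines. In the example below, the `Router` class, the `keyword_classifier` heuristic, and the stub backends are all hypothetical: a real system would typically use a small model as the classifier and wrap provider SDK calls behind each backend. The point is only that business logic calls `router.complete(...)` and never names a model.

```python
from dataclasses import dataclass
from typing import Callable

# A model backend is any callable from prompt text to completion text.
ModelFn = Callable[[str], str]


@dataclass
class Router:
    classify: Callable[[str], str]  # returns a route label such as "simple" or "complex"
    routes: dict[str, ModelFn]      # route label -> backend
    default: str = "complex"        # unknown labels fall back to the capable model

    def complete(self, prompt: str) -> str:
        label = self.classify(prompt)
        backend = self.routes.get(label, self.routes[self.default])
        return backend(prompt)


def keyword_classifier(prompt: str) -> str:
    """Toy stand-in for a cheap classifier model (hypothetical heuristic)."""
    if "def " in prompt or "traceback" in prompt.lower():
        return "code"
    return "simple" if len(prompt.split()) < 20 else "complex"


# Stub backends; in practice each would wrap a provider SDK call.
router = Router(
    classify=keyword_classifier,
    routes={
        "simple": lambda p: f"[fast model] {p}",
        "complex": lambda p: f"[capable model] {p}",
        "code": lambda p: f"[code model] {p}",
    },
)

print(router.complete("What is 2+2?"))
```

Swapping a model then means changing one entry in `routes`, and falling back to the capable model on unknown labels trades cost for safety when the classifier is unsure.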

Security and governance

Regardless of provider, LLM applications face security and governance challenges that don't exist in traditional software:

OWASP Top 10 for LLM Applications

The OWASP Foundation maintains a top-10 vulnerability list specific to LLM applications. Key risks include:
- Prompt injection: malicious input overrides system instructions
- Insecure output handling ("improper output handling" in the 2025 edition): LLM output used directly in SQL, HTML, or system commands
- Training data poisoning ("data and model poisoning" in the 2025 edition): adversarial data corrupts model behavior
- Sensitive information disclosure: model reveals training data or system prompts
- Excessive agency: agent given too many permissions without human oversight
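
To make the output-handling risk concrete, here is a minimal sketch that treats model output as untrusted input, using only the Python standard library. The table schema and the function names `render_reply` and `find_orders` are invented for illustration; the techniques shown are ordinary HTML escaping and parameterized SQL, which address this one risk and not the others in the list.

```python
import html
import sqlite3


def render_reply(llm_text: str) -> str:
    """Escape model output before embedding it in HTML."""
    return f"<p>{html.escape(llm_text)}</p>"


def find_orders(conn: sqlite3.Connection, customer_name: str) -> list[tuple]:
    """Pass a model-extracted value as a bound parameter, never via string formatting."""
    cur = conn.execute(
        "SELECT id, item FROM orders WHERE customer = ?", (customer_name,)
    )
    return cur.fetchall()


# Demo with an in-memory database and hostile "model output".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, item TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'alice', 'laptop')")

hostile = "alice' OR '1'='1"
print(find_orders(conn, hostile))                 # payload is matched as a literal name
print(render_reply("<script>alert(1)</script>"))  # markup is escaped, not rendered
```

Because the hostile string is bound as a parameter, it is compared against the `customer` column as plain text instead of being parsed as SQL.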

NIST AI Risk Management Framework

NIST's AI RMF provides a structured approach to AI risk management across four functions: Govern (policies and accountability), Map (contextualize risks), Measure (assess and track), Manage (mitigate and monitor). Useful for organizations that need defensible AI governance.

EU AI Act

The EU AI Act entered into force in August 2024, with obligations phasing in from 2025 through 2027. It classifies AI systems by risk level (unacceptable, high, limited, minimal), with compliance requirements scaling accordingly. Providers of general-purpose AI models (the Act's term for foundation models) have specific transparency and documentation obligations.

Key sources

#	Source	Year	Why it matters
1	Anthropic API documentation	Ongoing	Claude models, tool use, prompt engineering, safety — the provider this repo's maintainer uses most
2	OpenAI Platform documentation	Ongoing	GPT-4, function calling, assistants, embeddings — largest API ecosystem
3	Google AI / Gemini documentation	Ongoing	Natively multimodal models; long context; Vertex AI integration
4	AWS Bedrock documentation	Ongoing	Multi-model managed service; enterprise compliance
5	Azure OpenAI Service documentation	Ongoing	OpenAI models on Azure with enterprise security
6	Hugging Face Hub and Inference API	Ongoing	Open model ecosystem; inference, fine-tuning, evaluation tooling
7	OWASP Top 10 for LLM Applications	2025	The standard security vulnerability list for LLM apps — prompt injection, data poisoning, excessive agency
8	NIST AI Risk Management Framework	2023	Structured AI risk management: Govern, Map, Measure, Manage — widely adopted in enterprise

Practical takeaways

Start with an API, not self-hosting. Unless you have specific privacy, latency, or cost requirements that mandate self-hosting, proprietary APIs are faster to ship and easier to maintain.
Abstract the provider. Use a thin wrapper or framework (LangChain, LiteLLM, your own adapter) so you can swap models without rewriting application code. Provider lock-in is real and expensive.
Budget for the security layer. Prompt injection is not a theoretical risk — it's a practical one. Input validation, output sanitization, and permission scoping are baseline requirements.
Track the open-weight frontier. Llama 3.1 405B and DeepSeek-R1 are competitive with proprietary models for many tasks. The gap is closing, and self-hosting economics keep improving.
Compliance is not optional. If you serve EU users, the AI Act imposes legal requirements. The NIST AI RMF is voluntary, but in regulated industries it is widely used as the baseline for demonstrating defensible governance. Build governance into your architecture from the start instead of treating it as an afterthought.
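
The "abstract the provider" takeaway can be sketched with a small interface of your own. Everything named below (`ChatModel`, `EchoModel`, `FallbackModel`, `summarize`) is hypothetical and stands in for adapters you would write around real provider SDKs; frameworks such as LiteLLM or LangChain offer their own abstractions, which this sketch does not try to reproduce.

```python
from typing import Protocol


class ChatModel(Protocol):
    """The only interface application code is allowed to depend on."""

    def complete(self, system: str, user: str) -> str: ...


class EchoModel:
    """Stand-in backend for tests; a real adapter would call a provider SDK here."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, system: str, user: str) -> str:
        return f"{self.name}: {user}"


class FallbackModel:
    """Try backends in order and return the first successful completion."""

    def __init__(self, *backends: ChatModel) -> None:
        self.backends = backends

    def complete(self, system: str, user: str) -> str:
        last_error = None
        for backend in self.backends:
            try:
                return backend.complete(system, user)
            except Exception as exc:  # real code would catch provider-specific errors
                last_error = exc
        raise RuntimeError("all backends failed") from last_error


def summarize(model: ChatModel, text: str) -> str:
    """Business logic: knows nothing about which provider is behind `model`."""
    return model.complete("Summarize in one sentence.", text)


print(summarize(EchoModel("primary"), "hello"))
```

Because `summarize` depends only on the `ChatModel` protocol, switching providers or adding a fallback chain is a change at the call site, not in the business logic.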