API Documentation

OpenAI-compatible inference API

Getting Started

Authentication

All API requests require an API key. To get one:

  1. Sign up at idlecloud.ai
  2. Complete the onboarding flow
  3. Generate an API key from your dashboard
  4. Set environment variable: IDLECLOUD_API_KEY=ic_...
Note: API keys start with the ic_ prefix.

Quick Start

import os

from idlecloud import IdleCloud

# Read the key you exported in step 4 of Authentication
api_key = os.environ["IDLECLOUD_API_KEY"]

# Create a client instance
client = IdleCloud(api_key=api_key)

# Make a request
response = client.chat.completions.create(
    model="gpt-oss-20b",  # Model is required
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
    stream=True
)

# Print the response as it streams in
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Python SDK

Installation

pip install idlecloud

Basic Usage

from idlecloud import IdleCloud

api_key = 'ic_your_api_key_here'

# Create a client instance
client = IdleCloud(api_key=api_key)

Streaming

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

# Print the response as it streams in
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Async Support

import asyncio
from idlecloud import AsyncIdleCloud

async def main():
    api_key = 'ic_your_api_key_here'
    client = AsyncIdleCloud(api_key=api_key)

    response = await client.chat.completions.create(
        model="gpt-oss-20b",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True
    )

    async for chunk in response:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(main())

REST API

Base URL

https://api.idlecloud.ai/v1

Endpoint: Chat Completions

POST /v1/chat/completions

Headers

Authorization: Bearer ic_your_api_key_here
Content-Type: application/json

Request Body

{
  "model": "gpt-oss-20b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "temperature": 0.7,
  "max_tokens": 150,
  "stream": false
}
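If you are assembling the request by hand rather than through an SDK, the body is plain JSON. The sketch below builds the documented fields; the helper name build_chat_request is just for illustration, not part of any SDK:

```python
import json

def build_chat_request(model, messages, temperature=0.7, max_tokens=150, stream=False):
    """Assemble a chat-completions payload with the documented fields."""
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
    }
    return json.dumps(payload)

body = build_chat_request(
    "gpt-oss-20b",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)
```

Send the result as the POST body with the headers shown above.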

Response

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1733256789,
  "model": "gpt-oss-20b",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "The capital of France is Paris."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 23,
    "completion_tokens": 8,
    "total_tokens": 31,
    "completion_tokens_details": {
      "reasoning_tokens": 0
    }
  }
}

Streaming Response

Content-Type: text/event-stream

data: {"id":"chatcmpl-...","object":"chat.completion.chunk",...}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk",...}

data: [DONE]
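Each data: line carries one JSON chunk, and the literal [DONE] marks the end of the stream. A minimal parser for this format (a sketch, not the SDK's internal implementation) looks like:

```python
import json

def parse_sse_chunks(lines):
    """Yield parsed chunk objects from SSE lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":
            return  # end of stream
        yield json.loads(data)

# Example: two content chunks followed by the terminator
stream = [
    'data: {"id":"chatcmpl-1","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"id":"chatcmpl-1","object":"chat.completion.chunk","choices":[{"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_chunks(stream))
```

The SDK's stream=True mode does this parsing for you; hand-rolling it is only needed for raw HTTP clients.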

cURL Example

curl https://api.idlecloud.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ic_your_api_key" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Available Models

gpt-oss-20b

Coming Soon: gpt-oss-20b-safeguard - Perfect for a broad range of moderation tasks

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model to use (e.g., "gpt-oss-20b") |
| messages | array | Yes | List of message objects with role and content |
| temperature | number | No | Sampling temperature (0-2, default: 1) |
| max_tokens | integer | No | Maximum tokens to generate |
| stream | boolean | No | Enable streaming (default: false) |
| top_p | number | No | Nucleus sampling (0-1) |
| frequency_penalty | number | No | Reduce repetition (-2 to 2) |
| presence_penalty | number | No | Encourage new topics (-2 to 2) |
| stop | string/array | No | Stop sequences |

Error Handling

Error Response Format

{
  "error": {
    "message": "Invalid API key provided",
    "type": "invalid_request_error",
    "code": "invalid_api_key",
    "param": null
  }
}
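Unpacking an error body takes a few lines of JSON handling; parse_error below is a hypothetical helper for raw HTTP clients, not part of the SDK:

```python
import json

def parse_error(body):
    """Extract the documented fields from an error response body."""
    err = json.loads(body)["error"]
    return err["code"], err["type"], err["message"]

code, etype, message = parse_error(
    '{"error": {"message": "Invalid API key provided", '
    '"type": "invalid_request_error", "code": "invalid_api_key", "param": null}}'
)
```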

HTTP Status Codes

| Code | Meaning |
|---|---|
| 200 | Success |
| 400 | Bad request (invalid parameters) |
| 401 | Unauthorized (invalid API key) |
| 404 | Model not found |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 503 | Service unavailable (no miners available) |
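A practical reading of the table: 429, 500, and 503 describe transient conditions worth retrying, while the remaining 4xx codes indicate a problem with the request itself. A sketch:

```python
# Status codes from the table that indicate a transient condition
RETRYABLE = {429, 500, 503}  # rate limit, server error, no miners available

def should_retry(status_code):
    """True for transient statuses worth retrying with backoff."""
    return status_code in RETRYABLE
```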

Rate Limits

Rate Limit Headers

X-RateLimit-Limit-Requests: 10
X-RateLimit-Remaining-Requests: 9
X-RateLimit-Reset-Requests: 2025-12-05T17:30:00Z
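These headers can drive client-side pacing: when the remaining-request budget hits zero, wait until the reset timestamp. The helper below is an illustration of that arithmetic, not an SDK function:

```python
from datetime import datetime, timezone

def seconds_until_reset(headers, now=None):
    """Seconds to wait before the request budget resets, per the headers above."""
    remaining = int(headers.get("X-RateLimit-Remaining-Requests", 1))
    if remaining > 0:
        return 0.0  # budget left; no need to wait
    reset = datetime.fromisoformat(
        headers["X-RateLimit-Reset-Requests"].replace("Z", "+00:00")
    )
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset - now).total_seconds())
```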

Handling Rate Limits

from openai import RateLimitError  # the OpenAI-compatible exception type
from time import sleep

def make_request():
    return client.chat.completions.create(
        model="gpt-oss-20b",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )

try:
    response = make_request()
except RateLimitError as e:
    # Honor the server's suggested wait, defaulting to 60 seconds
    retry_after = int(e.response.headers.get("Retry-After", 60))
    sleep(retry_after)
    response = make_request()
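The pattern above retries once using the server's Retry-After hint. For repeated transient failures, a generic exponential-backoff wrapper can be reused around any call; with_retries below is a sketch, not part of either SDK:

```python
import time

def with_retries(call, max_attempts=3, base_delay=1.0, retryable=(Exception,)):
    """Call `call`, retrying with exponential backoff on retryable errors."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Pass the SDK's rate-limit exception class via retryable, and tune base_delay and max_attempts to your traffic.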

Usage Tracking

Token Counting

The usage object includes:

  - prompt_tokens: tokens in your input messages
  - completion_tokens: tokens generated, including any reasoning tokens
  - total_tokens: prompt_tokens + completion_tokens
  - completion_tokens_details.reasoning_tokens: tokens spent on internal reasoning

Example

{
  "usage": {
    "prompt_tokens": 78,
    "completion_tokens": 116,
    "total_tokens": 194,
    "completion_tokens_details": {
      "reasoning_tokens": 98
    }
  }
}
Note: You are billed for all tokens including reasoning tokens, even though only the final answer is returned.
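Given that note, billable tokens equal prompt_tokens plus completion_tokens; reasoning tokens are already counted inside completion_tokens, as the example shows (78 + 116 = 194). A small helper to make that explicit (hypothetical, for illustration):

```python
def billed_tokens(usage):
    """Total billable tokens; reasoning tokens are part of completion_tokens."""
    return usage["prompt_tokens"] + usage["completion_tokens"]

usage = {
    "prompt_tokens": 78,
    "completion_tokens": 116,  # includes the 98 reasoning tokens
    "total_tokens": 194,
    "completion_tokens_details": {"reasoning_tokens": 98},
}
```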

Migration from OpenAI

Using OpenAI SDK (No Code Changes)

from openai import OpenAI

client = OpenAI(
    api_key="ic_...",  # Use IdleCloud API key
    base_url="https://api.idlecloud.ai/v1"  # Point to IdleCloud
)

# Everything else is identical
response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Hello!"}]
)

Using IdleCloud SDK (Recommended)

from idlecloud import IdleCloud

api_key = 'ic_your_api_key_here'
client = IdleCloud(api_key=api_key)

# Same interface as OpenAI
response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Best Practices

API Key Security

Keep keys in the IDLECLOUD_API_KEY environment variable rather than hardcoding them in source, never commit keys to version control, and generate a replacement from your dashboard if a key is ever exposed.

Rate Limit Management

Watch the X-RateLimit-* response headers, and when you receive a 429, wait until the reset time (or the Retry-After value) before retrying.

Cost Optimization

Use max_tokens to bound generation length, and remember that reasoning tokens are billed even though only the final answer is returned.

Error Handling

Retry transient errors (429, 500, 503) with backoff; treat 400, 401, and 404 as problems with the request itself and fix them rather than retrying.
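A defensive pattern worth adopting: load the key from the environment and fail fast with a clear error if it is missing or malformed. load_api_key is a hypothetical helper; the ic_ prefix check comes from the note in Authentication:

```python
import os

def load_api_key():
    """Read the key set in step 4 of Authentication and sanity-check it."""
    key = os.environ.get("IDLECLOUD_API_KEY")
    if not key:
        raise RuntimeError("IDLECLOUD_API_KEY is not set")
    if not key.startswith("ic_"):
        raise ValueError("IdleCloud API keys start with 'ic_'")
    return key
```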