OpenAI-compatible inference API
All API requests require an API key. Keys use the `ic_` prefix. Set yours as an environment variable:

```
IDLECLOUD_API_KEY=ic_...
```
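As a minimal sketch, the key can be read from that environment variable rather than hard-coded (the variable name follows the docs above; the fallback placeholder is illustrative only):

```python
import os

# Read the API key from the environment rather than hard-coding it.
# Falls back to a placeholder so the snippet runs without configuration.
api_key = os.environ.get("IDLECLOUD_API_KEY", "ic_your_api_key_here")

# Keys are expected to carry the ic_ prefix; fail fast on obvious mistakes.
assert api_key.startswith("ic_"), "IdleCloud API keys start with 'ic_'"
```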
```python
from idlecloud import IdleCloud

api_key = 'ic_your_api_key_here'

# Create a client instance
client = IdleCloud(api_key=api_key)

# Make a request
response = client.chat.completions.create(
    model="gpt-oss-20b",  # Model is required
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
    stream=True
)

# Print the response as it streams in
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
```
pip install idlecloud
```
```python
import asyncio
from idlecloud import AsyncIdleCloud

async def main():
    api_key = 'ic_your_api_key_here'
    client = AsyncIdleCloud(api_key=api_key)
    response = await client.chat.completions.create(
        model="gpt-oss-20b",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True
    )
    async for chunk in response:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(main())
```
Base URL: `https://api.idlecloud.ai/v1`
```
POST /v1/chat/completions
Authorization: Bearer ic_your_api_key_here
Content-Type: application/json
```

```json
{
  "model": "gpt-oss-20b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "temperature": 0.7,
  "max_tokens": 150,
  "stream": false
}
```
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1733256789,
  "model": "gpt-oss-20b",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "The capital of France is Paris."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 23,
    "completion_tokens": 8,
    "total_tokens": 31,
    "completion_tokens_details": {
      "reasoning_tokens": 0
    }
  }
}
```
```
Content-Type: text/event-stream

data: {"id":"chatcmpl-...","object":"chat.completion.chunk",...}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk",...}
data: [DONE]
```
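If you are not using an SDK, the event stream can be parsed by hand. This is a minimal sketch assuming only the `data:` / `[DONE]` framing shown above; the chunk payloads are illustrative, and a real client would read lines from the HTTP response body:

```python
import json

def parse_sse_chunks(lines):
    """Yield parsed JSON payloads from an SSE event stream, stopping at [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)

# Sample transcript shaped like the stream above (contents are illustrative)
stream = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_chunks(stream))
print(text)  # -> Hello
```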
```bash
curl https://api.idlecloud.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ic_your_api_key" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
- `gpt-oss-20b-safeguard` - Suited to a broad range of moderation tasks
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model to use (e.g., `"gpt-oss-20b"`) |
| `messages` | array | Yes | List of message objects with `role` and `content` |
| `temperature` | number | No | Sampling temperature (0-2, default: 1) |
| `max_tokens` | integer | No | Maximum tokens to generate |
| `stream` | boolean | No | Enable streaming (default: false) |
| `top_p` | number | No | Nucleus sampling (0-1) |
| `frequency_penalty` | number | No | Reduce repetition (-2 to 2) |
| `presence_penalty` | number | No | Encourage new topics (-2 to 2) |
| `stop` | string/array | No | Stop sequences |
```json
{
  "error": {
    "message": "Invalid API key provided",
    "type": "invalid_request_error",
    "code": "invalid_api_key",
    "param": null
  }
}
```
| Code | Meaning |
|---|---|
| 200 | Success |
| 400 | Bad request (invalid parameters) |
| 401 | Unauthorized (invalid API key) |
| 404 | Model not found |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 503 | Service unavailable (no miners available) |
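When wrapping the API, it can help to treat 429, 500, and 503 as transient and everything else as fatal. A sketch of that classification (the helper is hypothetical, not part of any SDK):

```python
# Status codes from the table above that are worth retrying with backoff:
# rate limits and transient server/capacity errors.
RETRYABLE = {429, 500, 503}

def should_retry(status_code):
    """Return True for transient errors; False for client errors like 400/401/404."""
    return status_code in RETRYABLE

print(should_retry(429), should_retry(401))  # -> True False
```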
- `invalid_api_key` - API key is invalid or missing
- `model_not_found` - Requested model doesn't exist
- `rate_limit_exceeded` - Too many requests
- `insufficient_quota` - Not enough credits
- `invalid_request_error` - Invalid request parameters

Rate limit information is returned in response headers:

```
X-RateLimit-Limit-Requests: 10
X-RateLimit-Remaining-Requests: 9
X-RateLimit-Reset-Requests: 2025-12-05T17:30:00Z
```
```python
from time import sleep

from openai import RateLimitError

try:
    response = client.chat.completions.create(
        model="gpt-oss-20b",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True
    )
except RateLimitError as e:
    # Honor the server's Retry-After header, defaulting to 60 seconds
    retry_after = int(e.response.headers.get("Retry-After", 60))
    sleep(retry_after)
    response = client.chat.completions.create(
        model="gpt-oss-20b",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True
    )
```
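The single retry above gives up if the second attempt is also rate-limited. A more robust pattern is exponential backoff; here is a generic sketch (the helper and the stand-in `flaky` function are illustrative, not SDK APIs):

```python
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying transient failures with exponential backoff.

    In real use, catch only transient errors such as openai.RateLimitError;
    Exception is used here so the sketch runs without the SDK installed.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, propagate the error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Demo with a stand-in that fails twice before succeeding (no network needed)
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

result = retry_with_backoff(flaky, sleep=lambda s: None)
print(result, calls["n"])  # -> ok 3
```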
The usage object includes:

- `prompt_tokens`: Input tokens
- `completion_tokens`: Output tokens (including reasoning)
- `total_tokens`: Sum of prompt and completion tokens
- `completion_tokens_details.reasoning_tokens`: Tokens used for reasoning (gpt-oss-20b)

```json
{
  "usage": {
    "prompt_tokens": 78,
    "completion_tokens": 116,
    "total_tokens": 194,
    "completion_tokens_details": {
      "reasoning_tokens": 98
    }
  }
}
```
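Because billing is token-based, it can be useful to accumulate usage across calls. A minimal sketch over plain usage dicts shaped like the object above (the `tally` helper is hypothetical, not an SDK function):

```python
def tally(usages):
    """Sum token counts across a sequence of usage dicts."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for u in usages:
        for key in totals:
            totals[key] += u[key]
    return totals

# Two calls with the usage shown above would total 2 * 194 = 388 tokens
usage = {"prompt_tokens": 78, "completion_tokens": 116, "total_tokens": 194}
print(tally([usage, usage])["total_tokens"])  # -> 388
```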
```python
from openai import OpenAI

client = OpenAI(
    api_key="ic_...",  # Use IdleCloud API key
    base_url="https://api.idlecloud.ai/v1"  # Point to IdleCloud
)

# Everything else is identical
response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
```python
from idlecloud import IdleCloud

api_key = 'ic_your_api_key_here'
client = IdleCloud(api_key=api_key)

# Same interface as OpenAI
response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
- Set `max_tokens` to limit costs
- Lower `temperature` for deterministic tasks
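Applied to a request, the tips above might look like this (the prompt and values are illustrative only):

```python
# Request parameters reflecting the best practices above
params = {
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Reply with exactly one word: ping"}],
    "max_tokens": 16,   # cap output length to bound cost
    "temperature": 0,   # minimize sampling randomness for deterministic tasks
}
# Send with: client.chat.completions.create(**params)
print(params["max_tokens"], params["temperature"])  # -> 16 0
```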