Documentation
Comprehensive guides and API documentation for the Micro AI platform
Overview
Micro AI is a comprehensive microservices platform that provides LLM servers (LLAMA_CPP_PY, VLLM), intelligent routing through LiteLLM and an NGINX gateway, and a complete observability stack. The platform enables easy deployment and management of AI services with real-time monitoring and scaling capabilities.
Architecture Components
- NGINX Gateway - Front-facing request router
- LiteLLM Router - AI model management and routing
- Service Manager - Lifecycle orchestration
- Text Tools - Chunking, tokenization, NLP
- Translator - Machine translation via LiteLLM
- Adapters - Model-to-API mapping
- LangFuse - LLM application monitoring
- Grafana - Metrics visualization
- Prometheus - Metrics collection
- Checkmate - Uptime monitoring
- PostgreSQL - Primary database
- Redis - Caching and temporary storage
- ClickHouse - Analytics database
- MinIO - Object storage
Available Endpoints
- /llm_router/v1 - OpenAI API compatible endpoint for LLM services
- /service-manager - Service management, with UI at /service-manager/ui
- /text_tools - Text processing APIs (chunking, tokenization)
- /translator - Machine translation service
- /langfuse - LLM observability platform
- /grafana - Monitoring dashboards
- /services - List available local models and services
Complete Endpoint List: For a comprehensive list of all available endpoints and their documentation, visit https://microai.staging.sirenanalytics.com/microservices
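As a quick sanity check, the /services endpoint can be queried directly. This is a minimal sketch using only the Python standard library; the bearer-token header and the JSON response shape are assumptions based on the authentication scheme described below, not confirmed API details.

```python
import json
import urllib.request

BASE_URL = "https://microai.staging.sirenanalytics.com"

def list_services(token: str) -> dict:
    """Fetch the catalogue of local models and services from /services."""
    req = urllib.request.Request(
        f"{BASE_URL}/services",
        # Assumed auth scheme: the same bearer token used for the LLM router.
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (replace the placeholder with your Micro AI API key):
# services = list_services("microai-key-xxx")
# print(json.dumps(services, indent=2))
```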
Quick Start Guide
1. Service Management
- Navigate to Service Manager to load/unload local models
- Create new services by selecting available models
- Configure memory utilization and scaling parameters
- Monitor service health and resource usage in real-time
2. API Integration
Base URL: https://microai.staging.sirenanalytics.com/llm_router/v1
Authorization: Bearer microai-key-xxx

Use the base URL above as your OpenAI client endpoint to properly route through the LiteLLM gateway. All requests require a valid Micro AI API key for authentication.
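Because the endpoint is OpenAI API compatible, any OpenAI client can target it by overriding the base URL. The sketch below uses only the standard library to POST to the standard /chat/completions route; the model name is a placeholder, since the actual names depend on what the Service Manager has loaded.

```python
import json
import urllib.request

API_BASE = "https://microai.staging.sirenanalytics.com/llm_router/v1"
API_KEY = "microai-key-xxx"  # your Micro AI API key

def chat(model: str, prompt: str) -> str:
    """Send a single chat-completion request through the LiteLLM gateway."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI response shape: first choice, assistant message content.
    return body["choices"][0]["message"]["content"]

# Usage ("llama-3-8b" is a hypothetical model name):
# print(chat("llama-3-8b", "Hello!"))
```

Equivalently, the official `openai` Python client works by passing this URL as `base_url` and the Micro AI key as `api_key` when constructing the client.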
3. Monitoring & Observability
- Access Grafana dashboards for system metrics
- Use LangFuse for LLM application performance monitoring
- Monitor container resources with cAdvisor
- Track GPU utilization with DCGM integration
Service Profiles
The platform uses Docker Compose profiles to organize services by functionality: