ideasbd Compact
A small, dense model for high-volume chat, classification, and extraction. Optimized for sub-200 ms p95 latency on single-GPU inference. Ideal when latency and unit economics matter more than maximum reasoning depth.
- Context: 32k tokens · batch-friendly serving
- Strong on structured JSON and tool calling
- Published MMLU / MT-Bench deltas vs. prior Compact generation
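To check the sub-200 ms p95 target against your own traffic, a minimal sketch along these lines may help. It assumes you have already collected per-request latencies in milliseconds. The function names `p95` and `meets_budget` are hypothetical helpers, not part of any ideasbd SDK, and the nearest-rank percentile used here is one common convention among several.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies (ms)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # 1-based nearest rank = ceil(0.95 * n), computed with integers
    # to avoid floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_budget(latencies_ms, budget_ms=200.0):
    """True if the observed p95 is strictly below the budget (assumed 200 ms)."""
    return p95(latencies_ms) < budget_ms
```

With 20 samples the nearest rank is 19, so a single slow outlier does not break the budget but two do: `meets_budget([120] * 19 + [450])` is true, while `meets_budget([120] * 18 + [450] * 2)` is false.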