Deep Cogito Unveils Self-Improving Large Language Models
Deep Cogito, founded by ex-Googlers, has released four open-weight LLMs in its v2 series, ranging from 70B to 671B parameters. These dense and Mixture-of-Experts (MoE) models learn to reason more efficiently over time through iterative distillation and hybrid training. Available on Hugging Face and Unsloth and through hosted APIs, they deliver faster inference at lower compute cost.
Deep Cogito, the San Francisco–based AI startup founded by ex-Googlers, today announced the release of its v2 family of large language models. Spanning 70 billion to 671 billion parameters, the models have hybrid reasoning and self-improvement baked into their training, promising faster, more efficient inference over time.
Hybrid Dense and MoE Models
Cogito v2 ships in two architectures: dense variants activate every parameter on each forward pass for predictable, low-latency deployments, while Mixture-of-Experts (MoE) models route each token through a small subset of specialized expert subnetworks, scaling total capacity without scaling per-token compute linearly (see the routing sketch after the model list below).
- Cogito v2-70B (Dense)
- Cogito v2-109B (Mixture-of-Experts)
- Cogito v2-405B (Dense)
- Cogito v2-671B (Mixture-of-Experts)
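To make the dense-versus-MoE distinction concrete, here is a minimal top-k routing sketch in PyTorch. This is an illustrative toy, not Cogito v2's actual architecture: the layer sizes, expert count, and k below are arbitrary assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Toy MoE layer: each token runs through only k of num_experts
    subnetworks, so parameter count grows with num_experts while
    per-token compute stays roughly constant."""

    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.router(x)                          # (tokens, num_experts)
        top_w, top_i = scores.topk(self.k, dim=-1)       # pick k experts per token
        top_w = F.softmax(top_w, dim=-1)                 # normalize gate weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = top_i[:, slot] == e               # tokens routed to expert e
                if mask.any():
                    out[mask] += top_w[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TopKMoELayer(dim=64)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

A dense layer, by contrast, would run every token through the full feed-forward block, which is why dense parameter counts map directly onto compute while MoE counts do not.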
Dense models suit fine-tuning and GPU-constrained environments, while MoE variants excel at high-throughput inference with lower runtime costs. An FP8-quantized configuration further cuts hardware requirements with only minimal accuracy trade-offs.
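The FP8 checkpoints can be served with an inference engine that supports 8-bit floating point, such as vLLM. A minimal sketch, assuming an FP8-capable GPU; the repo ID is illustrative, so check Deep Cogito's Hugging Face org for exact checkpoint names.

```python
from vllm import LLM, SamplingParams

# Illustrative repo ID; verify the exact FP8 checkpoint name on Hugging Face.
llm = LLM(model="deepcogito/cogito-v2-preview-llama-70B", quantization="fp8")

params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(["Summarize mixture-of-experts routing in two sentences."], params)
print(outputs[0].outputs[0].text)
```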
Integrated Self-Improving Reasoning
Cogito v2’s core innovation is hybrid reasoning: during training, the models generate internal chains of thought and distill those conclusions back into their weights. Over time they learn which reasoning paths actually matter, enabling shorter inference chains and stronger intuition. Deep Cogito reports gains such as the following (a toy illustration of the distillation dynamic appears after the list):
- Math prompts resolve in under 100 tokens with accurate calculations, avoiding the unit errors common in verbose step-by-step approaches.
- Legal reasoning tasks break down precedent checks in two clear stages, offering nuanced justifications.
- Multi-hop questions like family-relation puzzles resolve pronoun ambiguities with higher reliability than many open LLMs.
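The dynamic can be illustrated with a runnable toy: a "reasoner" that either searches step by step or answers from distilled intuition. This is an analogy for the training loop described above, not Deep Cogito's actual pipeline; the class and its methods are invented for illustration.

```python
class ToyReasoner:
    """Stand-in for a hybrid-reasoning model: it can 'think' step by step
    (slow, many steps) or answer from internalized intuition (fast)."""

    def __init__(self):
        self.intuition = {}                      # distilled prompt -> answer

    def answer(self, a: int, b: int):
        if (a, b) in self.intuition:             # distilled path: no chain of thought
            return self.intuition[(a, b)], 1     # one "step"
        steps = [a]                              # reasoning path: count up b times
        for _ in range(b):
            steps.append(steps[-1] + 1)
        return steps[-1], len(steps)             # answer plus chain length

    def distill(self, a: int, b: int):
        result, _ = self.answer(a, b)
        self.intuition[(a, b)] = result          # fold the conclusion into "weights"

model = ToyReasoner()
print(model.answer(7, 5))   # (12, 6)  -- correct, but via a long chain
model.distill(7, 5)
print(model.answer(7, 5))   # (12, 1)  -- same answer, far shorter chain
```

Iterating this generate-then-distill loop is what lets the real models answer with shorter chains over successive training rounds.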
Performance and Efficiency
The 671B MoE flagship matches or exceeds leading open and closed models on benchmarks like MMLU, GSM8K, and MGSM, while using reasoning chains up to 60% shorter. Developers see lower inference latency and operating costs without sacrificing accuracy.
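A back-of-envelope calculation shows how shorter chains translate into serving cost. Every number below is an illustrative assumption, not published pricing.

```python
# What up-to-60%-shorter reasoning chains mean for per-query serving cost.
baseline_tokens = 2000            # assumed reasoning tokens/query for a verbose model
reduction = 0.60                  # Cogito v2's reported chain-length reduction
usd_per_1k_tokens = 0.003         # hypothetical output-token price

def cost(tokens: float) -> float:
    return tokens / 1000 * usd_per_1k_tokens

print(f"baseline:  ${cost(baseline_tokens):.4f}/query")                     # $0.0060
print(f"cogito v2: ${cost(baseline_tokens * (1 - reduction)):.4f}/query")   # $0.0024
```

Latency scales down similarly, since output tokens are generated sequentially.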
Cost-Effective Frontier Training
Deep Cogito says it trained all eight of its models to date, including the four v2 releases, for under $3.5 million, a fraction of the nine-figure budgets typical at the frontier. By training models to form stronger priors and skip redundant reasoning steps, the team delivers top-tier performance on a lean budget.
Open Source Availability
All Cogito v2 models are openly available on Hugging Face and Unsloth, with API access via Together AI, Baseten, and RunPod. Enterprises and researchers can run them locally, fine-tune them for domain tasks, and build on the evolving reasoning architecture; a minimal local-inference sketch follows.
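For local experimentation, the checkpoints load with Hugging Face transformers in the usual way. A minimal sketch: the repo ID is illustrative, and the 70B model needs multiple high-memory GPUs (or a quantized variant) to run.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative repo ID; see the Deep Cogito org on Hugging Face for exact names.
repo = "deepcogito/cogito-v2-preview-llama-70B"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto", torch_dtype="auto")

# A multi-hop family-relation prompt, echoing the example above.
messages = [{"role": "user", "content": "If Alice is Bob's mother's sister, what is Alice to Bob?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```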
QuarkyByte can help your enterprise benchmark and integrate Cogito v2 models to reduce inference costs by up to 40% and enhance reasoning tasks. Our analysts guide you through hardware fit, quantization strategies, and fine-tuning workflows to power complex AI services with precision.