
Deep Cogito Unveils Self-Improving Large Language Models

Deep Cogito, founded by ex-Googlers, has released four open LLMs in its v2 series, ranging from 70B to 671B parameters. These Dense and Mixture-of-Experts models learn to reason more efficiently over time through iterative distillation and hybrid training. Available under mixed licensing on Hugging Face and Unsloth and via APIs, they deliver faster inference at lower compute cost.

Published August 1, 2025 at 01:12 AM EDT in Artificial Intelligence (AI)

Deep Cogito, the San Francisco–based AI startup founded by ex-Googlers, today announced the release of its v2 family of large language models. Spanning 70 billion to 671 billion parameters, the models have hybrid reasoning and self-improvement baked into their training, promising faster, more efficient inference over time.

Hybrid Dense and MoE Models

Cogito v2 offers two architectures: Dense variants activate all parameters on every forward pass for predictable, low-latency deployments, while Mixture-of-Experts (MoE) models route tokens through specialized expert subnetworks to scale model size without scaling compute linearly.

  • Cogito v2-70B (Dense)
  • Cogito v2-109B (Mixture-of-Experts)
  • Cogito v2-405B (Dense)
  • Cogito v2-671B (Mixture-of-Experts)
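
To make the Dense/MoE distinction concrete, here is a minimal sketch of top-k expert routing in PyTorch. It illustrates why an MoE layer's per-token compute stays roughly constant as experts are added; it is not Deep Cogito's actual implementation, and the layer sizes and top_k value are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Toy Mixture-of-Experts feed-forward layer with top-k routing.

    Only `top_k` of the `num_experts` expert MLPs run for each token, so total
    parameters grow with num_experts while per-token compute does not.
    """

    def __init__(self, d_model=512, d_hidden=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e       # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(16, 512)                  # 16 tokens, d_model=512
layer = TopKMoELayer()
print(layer(tokens).shape)                     # torch.Size([16, 512])
```

The nested loop is written for readability; production MoE kernels dispatch all tokens assigned to each expert in one batched matrix multiply, which is what preserves the compute advantage at scale.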

Dense models suit fine-tuning and low-GPU environments, while MoE variants excel at high-performance inference with lower runtime costs. An FP8-quantized configuration further cuts hardware requirements with only minimal accuracy trade-offs.
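
The memory saving behind the FP8 option is easy to see directly. The snippet below casts a bf16 weight matrix to PyTorch's float8_e4m3fn dtype with a per-tensor scale and compares storage; it is a schematic illustration only (the published FP8 checkpoints presumably handle scaling for you), and it requires a recent PyTorch build with float8 support.

```python
import torch

# A stand-in for one transformer weight matrix (size and values are illustrative).
w_bf16 = torch.randn(4096, 4096, dtype=torch.bfloat16)

# Per-tensor scale so values fit the narrow FP8 range, then cast to float8 (E4M3).
scale = w_bf16.abs().max().float() / 448.0          # 448 is the E4M3 max normal value
w_fp8 = (w_bf16.float() / scale).to(torch.float8_e4m3fn)

# Dequantize for comparison; serving stacks instead run scaled FP8 matmuls directly.
w_restored = w_fp8.float() * scale

print("bf16 bytes:", w_bf16.numel() * w_bf16.element_size())   # 2 bytes per weight
print("fp8  bytes:", w_fp8.numel() * w_fp8.element_size())     # 1 byte per weight
print("max abs error:", (w_restored - w_bf16.float()).abs().max().item())
```

Weights drop from two bytes to one byte each, which is where the reduced hardware requirement comes from; the accuracy trade-off depends on how the scales are chosen.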

Integrated Self-Improving Reasoning

Cogito v2’s core innovation is hybrid reasoning: during training, the models generate internal chains of thought and distill those insights back into their weights. Over time, they learn which reasoning paths truly matter, enabling shorter inference chains and stronger intuition.

  • Math prompts conclude in under 100 tokens with accurate calculations, reducing unit errors common in step-by-step approaches.
  • Legal reasoning tasks break down precedent checks in two clear stages, offering nuanced justifications.
  • Multi-hop questions like family-relation puzzles resolve pronoun ambiguities with higher reliability than many open LLMs.
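
Deep Cogito has not published its training code, but the iterated-distillation idea can be sketched at a high level: sample chains of thought, keep the ones that reach verified answers, and fine-tune on shortened traces so the next iteration needs fewer explicit steps. Everything below (the function names, the scoring rule, the truncation heuristic) is an illustrative assumption, not the company's recipe.

```python
from dataclasses import dataclass
import random

# --- Toy stand-ins for the real components (all hypothetical) ----------------
@dataclass
class Example:
    prompt: str
    answer: str                     # verifiable reference answer

def sample_chain_of_thought(model, prompt, max_steps=8):
    """Stub: the real system would sample a reasoning trace from the model."""
    steps = [f"step {i}" for i in range(random.randint(2, max_steps))]
    return steps, random.choice(["42", "wrong"])

def distill(model, dataset):
    """Stub: the real system would fine-tune on (prompt, short trace, answer)."""
    return model  # pretend the weights were updated

# --- One round of generate -> verify -> shorten -> distill -------------------
def self_improvement_round(model, examples):
    curated = []
    for ex in examples:
        trace, predicted = sample_chain_of_thought(model, ex.prompt)
        if predicted == ex.answer:                          # keep only verified successes
            short_trace = trace[: max(1, len(trace) // 2)]  # crude "keep what mattered" heuristic
            curated.append((ex.prompt, short_trace, ex.answer))
    return distill(model, curated), curated

model = object()                                            # placeholder for an actual LLM
examples = [Example("What is 6*7?", "42")] * 4
model, curated = self_improvement_round(model, examples)
print(f"kept {len(curated)} verified traces for distillation")
```

In Cogito's framing, the distilled traces become "intuition": later runs can skip steps the model has already internalized, which is why inference chains shorten over successive iterations.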

Performance and Efficiency

The 671B MoE flagship matches or exceeds leading open and closed models on benchmarks like MMLU, GSM8K, and MGSM, while using reasoning chains up to 60% shorter. Developers see lower inference latency and operating costs without sacrificing accuracy.

Cost-Effective Frontier Training

Deep Cogito trained all eight of its models for under $3.5 million, a fraction of typical nine-figure budgets. By focusing on smarter priors and skipping redundant reasoning, they deliver top-tier performance on a lean budget.

Open Source Availability

All Cogito v2 models are openly available on Hugging Face and Unsloth, with API access via Together AI, Baseten, and RunPod. Enterprises and researchers can run them locally, fine-tune them for domain tasks, and contribute to evolving reasoning architectures.
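
For local experimentation, the checkpoints load like any other Hugging Face causal-LM. The snippet below follows the standard transformers pattern; the repository id is illustrative (check the deepcogito organization on Hugging Face for the exact names), and the 70B model still needs multiple high-memory GPUs or a quantized variant.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative repo id -- confirm the exact name on the deepcogito Hugging Face org.
model_id = "deepcogito/cogito-v2-preview-llama-70B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",        # use the dtype stored in the checkpoint
    device_map="auto",         # shard across available GPUs
)

messages = [{"role": "user", "content": "A train travels 120 km in 90 minutes. Average speed in km/h?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For hosted inference, the same models are reachable through the Together AI, Baseten, and RunPod APIs mentioned above.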

QuarkyByte can help your enterprise benchmark and integrate Cogito v2 models to reduce inference costs by up to 40% and enhance reasoning tasks. Our analysts guide you through hardware fit, quantization strategies, and fine-tuning workflows to power complex AI services with precision.