Agentic LLMs for Software Development
While modern LLMs are highly capable at atomic coding tasks—such as writing isolated functions or providing code completion—they often struggle with real-world software engineering problems.
Production development requires:
Understanding large and unfamiliar codebases
Reasoning across multiple files and modules
Identifying subtle bugs and edge cases
Using tools such as search, test runners, and build systems
Iterating over failures across long task horizons
Bleenk is built to address these challenges.
Rather than optimizing purely for short-form code generation, Bleenk is trained and evaluated in agentic settings, where the model must reason over long contexts, interact with tools, and maintain state across multi-step workflows.
Designed for Real Engineering Workflows
Bleenk is optimized to run inside code agent scaffolds that define structured interactions between the model, tools, and evaluation environments. This includes workflows inspired by systems such as SWE-Agent–style pipelines, where the model must explore a repository, propose changes, apply patches, and validate results.
By focusing on these environments during training and evaluation, Bleenk learns to:
Navigate large repositories efficiently
Identify relevant files and dependencies
Apply consistent multi-file edits
Recover from intermediate failures
These capabilities are essential for solving real GitHub issues and maintaining production systems.
Benchmark Performance
Bleenk’s design choices translate into strong performance on software engineering benchmarks.
Model | Size (B Tokens) | SWE-bench Verified | SWE-bench Multilingual | Terminal Bench |
|---|---|---|---|---|
Bleenk | 123 | 73.2% | 71.3% | 45.5% |
Devstral 2 | 123 | 72.2% | 61.3% | 40.5% |
DeepSeek v3.2 | 671 | 73.1% | 70.2% | 46.4% |
Kimi K2 Thinking | 1000 | 71.3% | 61.1% | 35.7% |
Claude Sonnet 4.5 | – | 77.2% | 68.0% | 42.8% |
GPT-5.1 Codex Max | – | 77.9% | – | 58.1% |
Despite being significantly smaller than several competing models, Bleenk delivers competitive or superior performance, particularly in multilingual and tool-driven settings. These results highlight the effectiveness of Bleenk’s agentic training approach.
Built for Tool Use
A defining feature of Bleenk is its tool-first design.
Bleenk is trained to:
Select appropriate tools for a given task
Chain tool calls coherently
Interpret and act on tool outputs
Maintain consistency across long execution traces
This makes Bleenk well-suited for environments where the model must interact with:
Code search tools
File systems
Test and build pipelines
Custom internal developer tooling
In practice, Bleenk behaves less like an autocomplete engine and more like a junior engineer capable of navigating and modifying real systems.
Versatile Deployment: Local ↔️ Enterprise ↔️ Agents
Bleenk is designed to support a wide range of deployment scenarios.
Developers can run Bleenk locally or in controlled environments using Ollama, enabling direct interaction with private codebases:
This makes Bleenk suitable for:
Local experimentation and research
Privacy-sensitive repositories
Enterprise environments with strict security requirements
Bleenk is also a strong fit for agentic coding platforms, IDE integrations, and internal developer tools that require reliable, tool-aware models.
Availability
Bleenk is currently available via Ollama and through Robi Labs–supported deployments. Licensing and broader distribution details will be shared as the model continues to mature.
For organizations interested in:
Enterprise deployments
Custom integrations
Fine-tuning or continued training on private codebases
we encourage you to contact the Robi Labs team.
What’s Next
Bleenk represents an important step in Robi Labs’ broader vision for agentic AI systems.
We are actively working on:
Expanded tool ecosystems
Stronger verification and testing loops
Improved long-horizon planning
Additional Bleenk variants optimized for different deployment needs
Bleenk is an evolving system, and we welcome feedback from the community and early adopters.
About Robi Labs
Robi Labs builds frontier-scale models and agentic systems focused on practical, production-grade AI for software engineering and complex workflows.
If you’re interested in deploying Bleenk or exploring agentic software engineering systems, we’d love to hear from you.
Robi Labs is an independent AI research company creating next-generation models and tools like Lexa, Picasoe, Framex, Echo, Mira, and MoVi. Our mission is to make AI more human-centric, accessible, and impactful for creators, educators, and developers worldwide.
Robi Labs Team
General
Subscribe to our newsletter
Sign up to get the most recent blog articles in your email every week.



