Research

Introducing Bleenk: An Agentic LLM for Real-World Software Engineering

Robi Labs Team

Dec 31, 2025

3 Min Read Min Read

Today, we introduce Bleenk, an agentic large language model developed by Robi Labs for real-world software engineering tasks. Bleenk is designed to operate inside complex codebases, use tools reliably, and power autonomous and semi-autonomous software engineering agents.

Agentic LLMs for Software Development

While modern LLMs are highly capable at atomic coding tasks—such as writing isolated functions or providing code completion—they often struggle with real-world software engineering problems.

Production development requires:

Understanding large and unfamiliar codebases
Reasoning across multiple files and modules
Identifying subtle bugs and edge cases
Using tools such as search, test runners, and build systems
Iterating over failures across long task horizons

Bleenk is built to address these challenges.

Rather than optimizing purely for short-form code generation, Bleenk is trained and evaluated in agentic settings, where the model must reason over long contexts, interact with tools, and maintain state across multi-step workflows.

Designed for Real Engineering Workflows

Bleenk is optimized to run inside code agent scaffolds that define structured interactions between the model, tools, and evaluation environments. This includes workflows inspired by systems such as SWE-Agent–style pipelines, where the model must explore a repository, propose changes, apply patches, and validate results.

By focusing on these environments during training and evaluation, Bleenk learns to:

Navigate large repositories efficiently
Identify relevant files and dependencies
Apply consistent multi-file edits
Recover from intermediate failures

These capabilities are essential for solving real GitHub issues and maintaining production systems.

Benchmark Performance

Bleenk’s design choices translate into strong performance on software engineering benchmarks.

Model	Size (B Tokens)	SWE-bench Verified	SWE-bench Multilingual	Terminal Bench
Bleenk	123	73.2%	71.3%	45.5%
Devstral 2	123	72.2%	61.3%	40.5%
DeepSeek v3.2	671	73.1%	70.2%	46.4%
Kimi K2 Thinking	1000	71.3%	61.1%	35.7%
Claude Sonnet 4.5	–	77.2%	68.0%	42.8%
GPT-5.1 Codex Max	–	77.9%	–	58.1%

Despite being significantly smaller than several competing models, Bleenk delivers competitive or superior performance, particularly in multilingual and tool-driven settings. These results highlight the effectiveness of Bleenk’s agentic training approach.

Built for Tool Use

A defining feature of Bleenk is its tool-first design.

Bleenk is trained to:

Select appropriate tools for a given task
Chain tool calls coherently
Interpret and act on tool outputs
Maintain consistency across long execution traces

This makes Bleenk well-suited for environments where the model must interact with:

Code search tools
File systems
Test and build pipelines
Custom internal developer tooling

In practice, Bleenk behaves less like an autocomplete engine and more like a junior engineer capable of navigating and modifying real systems.

Versatile Deployment: Local ↔️ Enterprise ↔️ Agents

Bleenk is designed to support a wide range of deployment scenarios.

Developers can run Bleenk locally or in controlled environments using Ollama, enabling direct interaction with private codebases:

ollama pull RobiLabs/bleenk:latest
ollama run RobiLabs/bleenk:latest

This makes Bleenk suitable for:

Local experimentation and research
Privacy-sensitive repositories
Enterprise environments with strict security requirements

Bleenk is also a strong fit for agentic coding platforms, IDE integrations, and internal developer tools that require reliable, tool-aware models.

Availability

Bleenk is currently available via Ollama and through Robi Labs–supported deployments. Licensing and broader distribution details will be shared as the model continues to mature.

For organizations interested in:

Enterprise deployments
Custom integrations
Fine-tuning or continued training on private codebases

we encourage you to contact the Robi Labs team.

What’s Next

Bleenk represents an important step in Robi Labs’ broader vision for agentic AI systems.

We are actively working on:

Expanded tool ecosystems
Stronger verification and testing loops
Improved long-horizon planning
Additional Bleenk variants optimized for different deployment needs

Bleenk is an evolving system, and we welcome feedback from the community and early adopters.

About Robi Labs

Robi Labs builds frontier-scale models and agentic systems focused on practical, production-grade AI for software engineering and complex workflows.

If you’re interested in deploying Bleenk or exploring agentic software engineering systems, we’d love to hear from you.

About author

Robi Labs is an independent AI research company creating next-generation models and tools like Lexa, Picasoe, Framex, Echo, Mira, and MoVi. Our mission is to make AI more human-centric, accessible, and impactful for creators, educators, and developers worldwide.

Robi Labs Team

General

Subscribe to our newsletter

Other blogs

Keep the momentum going with more blogs full of ideas, advice, and inspiration

Read all blogs

Research

Prism is Robi Labs’ new generative vision system built to solve a critical gap in modern creative tools: maintaining identity consistency across scenes, styles, and contexts. Instead of producing isolated images, Prism preserves a subject’s visual coherence while allowing flexible transformations through natural language and multimodal inputs. It supports high-fidelity generation, intuitive text-based editing, multi-image fusion, and strong prompt alignment—enabling creators to build coherent narratives, character-driven visuals, and complex creative workflows with reliability and control.

Keep Reading

Prism: Advancing Identity-Aligned Visual Generation Through Multimodal Understanding

Research

Keep Reading

Prism: Advancing Identity-Aligned Visual Generation Through Multimodal Understanding

Research

Keep Reading

Prism: Advancing Identity-Aligned Visual Generation Through Multimodal Understanding