Research

Research

Research

Introducing Bleenk: An Agentic LLM for Real-World Software Engineering

Robi Labs Team

Dec 31, 2025

3 Min Read Min Read

Today, we introduce Bleenk, an agentic large language model developed by Robi Labs for real-world software engineering tasks. Bleenk is designed to operate inside complex codebases, use tools reliably, and power autonomous and semi-autonomous software engineering agents.

Agentic LLMs for Software Development

While modern LLMs are highly capable at atomic coding tasks—such as writing isolated functions or providing code completion—they often struggle with real-world software engineering problems.

Production development requires:

  • Understanding large and unfamiliar codebases

  • Reasoning across multiple files and modules

  • Identifying subtle bugs and edge cases

  • Using tools such as search, test runners, and build systems

  • Iterating over failures across long task horizons

Bleenk is built to address these challenges.

Rather than optimizing purely for short-form code generation, Bleenk is trained and evaluated in agentic settings, where the model must reason over long contexts, interact with tools, and maintain state across multi-step workflows.

Designed for Real Engineering Workflows

Bleenk is optimized to run inside code agent scaffolds that define structured interactions between the model, tools, and evaluation environments. This includes workflows inspired by systems such as SWE-Agent–style pipelines, where the model must explore a repository, propose changes, apply patches, and validate results.

By focusing on these environments during training and evaluation, Bleenk learns to:

  • Navigate large repositories efficiently

  • Identify relevant files and dependencies

  • Apply consistent multi-file edits

  • Recover from intermediate failures

These capabilities are essential for solving real GitHub issues and maintaining production systems.

Benchmark Performance

Bleenk’s design choices translate into strong performance on software engineering benchmarks.

Model

Size (B Tokens)

SWE-bench Verified

SWE-bench Multilingual

Terminal Bench

Bleenk

123

73.2%

71.3%

45.5%

Devstral 2

123

72.2%

61.3%

40.5%

DeepSeek v3.2

671

73.1%

70.2%

46.4%

Kimi K2 Thinking

1000

71.3%

61.1%

35.7%

Claude Sonnet 4.5

77.2%

68.0%

42.8%

GPT-5.1 Codex Max

77.9%

58.1%

Despite being significantly smaller than several competing models, Bleenk delivers competitive or superior performance, particularly in multilingual and tool-driven settings. These results highlight the effectiveness of Bleenk’s agentic training approach.

Built for Tool Use

A defining feature of Bleenk is its tool-first design.

Bleenk is trained to:

  • Select appropriate tools for a given task

  • Chain tool calls coherently

  • Interpret and act on tool outputs

  • Maintain consistency across long execution traces

This makes Bleenk well-suited for environments where the model must interact with:

  • Code search tools

  • File systems

  • Test and build pipelines

  • Custom internal developer tooling

In practice, Bleenk behaves less like an autocomplete engine and more like a junior engineer capable of navigating and modifying real systems.

Versatile Deployment: Local ↔️ Enterprise ↔️ Agents

Bleenk is designed to support a wide range of deployment scenarios.

Developers can run Bleenk locally or in controlled environments using Ollama, enabling direct interaction with private codebases:

ollama pull RobiLabs/bleenk:latest
ollama run RobiLabs/bleenk:latest

This makes Bleenk suitable for:

  • Local experimentation and research

  • Privacy-sensitive repositories

  • Enterprise environments with strict security requirements

Bleenk is also a strong fit for agentic coding platforms, IDE integrations, and internal developer tools that require reliable, tool-aware models.

Availability

Bleenk is currently available via Ollama and through Robi Labs–supported deployments. Licensing and broader distribution details will be shared as the model continues to mature.

For organizations interested in:

  • Enterprise deployments

  • Custom integrations

  • Fine-tuning or continued training on private codebases

we encourage you to contact the Robi Labs team.

What’s Next

Bleenk represents an important step in Robi Labs’ broader vision for agentic AI systems.

We are actively working on:

  • Expanded tool ecosystems

  • Stronger verification and testing loops

  • Improved long-horizon planning

  • Additional Bleenk variants optimized for different deployment needs

Bleenk is an evolving system, and we welcome feedback from the community and early adopters.

About Robi Labs

Robi Labs builds frontier-scale models and agentic systems focused on practical, production-grade AI for software engineering and complex workflows.

If you’re interested in deploying Bleenk or exploring agentic software engineering systems, we’d love to hear from you.

About author

About author

About author

Robi Labs is an independent AI research company creating next-generation models and tools like Lexa, Picasoe, Framex, Echo, Mira, and MoVi. Our mission is to make AI more human-centric, accessible, and impactful for creators, educators, and developers worldwide.

Robi Labs Team

General

Subscribe to our newsletter

Sign up to get the most recent blog articles in your email every week.

Other blogs

Other blogs

Keep the momentum going with more blogs full of ideas, advice, and inspiration