ITEL’s VibeStudio Achieves a Big AI Breakthrough: World-Best LLM Performance on Just One GPU

A small Indian team shows the world how to do more with less: the first to achieve 55% pruning of the MiniMax M2 open-source LLM for reasoning and software coding.

Chennai, India — November 21, 2025

Large Language Models (LLMs) are today the best-known and most widely used AI tools. There are a large number of LLMs, including ChatGPT, DeepSeek, Gemini and many others. Many of them are open source, meaning anyone can copy and use them. They can perform all kinds of tasks, but to do so they require significant computational resources, such as high-end GPUs and memory banks, and consume an immense amount of electrical power. Reducing GPU and memory usage while still performing at state-of-the-art (SOTA) levels is a major goal for AI developers today. In a world where most companies chase ever-larger LLMs, VibeStudio shows that small, efficient, high-performance models are often the more practical and impactful route. This shift echoes what India does best: building more with less.

Who are we?

We are VibeStudio, an agentic coding suite startup incubated by Immersive Technology and Entrepreneurship Labs (ITEL), providing an AI-based tool that enables software developers and enterprises to carry out coding automatically. We are led by Arjun Reddy, a native of Madurai and a graduate of Satyabhama Engineering College. Prof. Ashok Jhunjhunwala, the founder and ex-President of IITM Research Park, is the Chairman of ITEL.

What have we achieved?

VibeStudio, with the support of ITEL, undertook the ambitious task of reducing the GPU and memory requirements of an open-source LLM for the specific task of software coding (known as vibe-coding). When we took up MiniMax M2, the goal wasn't academic curiosity. We needed a model that could power VibeStudio's Agentic Integrated Development Environment (IDE): one that is fast, disciplined, and capable of real coding and full-repository reasoning, the way developers actually work. But deploying M2 at scale across Indian engineering colleges and enterprise systems would have required a huge number of high-end GPUs (such as the H200) and large amounts of memory, which is costly as well as energy-intensive. We needed to make M2 significantly lighter without diluting the intelligence that makes it valuable.

That is why we engineered THRIFT — Targeted Hierarchical Reduction for Inference and Fine-Tuning. THRIFT isn't a gimmick. It's an engineering process. It examines the model like a structured audit — layer by layer, pathway by pathway — identifying redundant experts, silent activation routes, and dead parameters that add cost but no intelligence. Instead of one reckless pruning pass, THRIFT reduces the model in calibrated stages. After each stage, teacher-guided fine-tuning recalibrates the model so it stays stable and sharp.
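To make the staged approach concrete, here is a minimal toy sketch of pruning in calibrated stages with a recalibration pass after each stage. THRIFT's actual criteria (expert redundancy, silent activation routes) and its teacher-guided fine-tuning are far richer than this; the function names, magnitude-based criterion, and stage fractions below are illustrative assumptions, not VibeStudio's implementation.

```python
# Toy sketch: staged magnitude pruning with recalibration between stages.
# All names and thresholds here are illustrative assumptions.

def prune_stage(weights, fraction):
    """Zero out the smallest-magnitude `fraction` of the surviving weights."""
    alive = sorted(abs(w) for w in weights if w != 0.0)
    if not alive:
        return weights
    k = int(len(alive) * fraction)
    threshold = alive[k] if k < len(alive) else float("inf")
    return [0.0 if abs(w) < threshold else w for w in weights]

def recalibrate(student, teacher, lr=0.1):
    """Stand-in for teacher-guided fine-tuning: nudge surviving weights
    toward the (unpruned) teacher so the pruned model stays stable."""
    return [w + lr * (t - w) if w != 0.0 else 0.0
            for w, t in zip(student, teacher)]

def staged_prune(weights, stages=(0.25, 0.20, 0.20)):
    """Prune in calibrated stages rather than one reckless pass,
    recalibrating after each stage; the fractions compound to ~50%."""
    teacher = list(weights)  # frozen copy plays the teacher role
    student = list(weights)
    for fraction in stages:
        student = prune_stage(student, fraction)
        student = recalibrate(student, teacher)
    return student

if __name__ == "__main__":
    w = [0.05, -1.2, 0.9, 0.01, -0.3, 2.1, -0.02, 0.7]
    pruned = staged_prune(w)
    sparsity = sum(1 for x in pruned if x == 0.0) / len(pruned)
    print(f"sparsity: {sparsity:.0%}")  # prints "sparsity: 50%"
```

The key design point the sketch illustrates is that each stage prunes only a modest fraction and is followed by a recovery step, so no single pass removes enough capacity to destabilize the model.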

The outcome is exactly what we wanted: a 55% size reduction that retains 80% of the original's reasoning strength and coding precision, and in many cases responds faster than the original. Inside VibeStudio, this pruned M2 handles vibe coding beautifully: structured autocompletions, multi-file context, refactoring, analysis, and real-time agent-driven coding workflows. We have released the THRIFT model (the 55%-reduced M2) as open source on HuggingFace.

Our work so far has already resonated with the wider community. Across HuggingFace, our open-source releases have crossed 150,000 downloads, and developers repeatedly choose our models because they are engineered with discipline, not hype.

Behind the scenes, we now maintain two private foundational models for enterprise deployments:

  • An 8B Dense Model, optimised for quantised local use on mainstream hardware.
  • A 32B A3B MoE Model, built for secure, high-speed, on-premises reasoning at enterprise scale.

These two models are not open-sourced. They are exclusive to our enterprise partners who rely on predictable performance and strict deployment guarantees.

THRIFT, combined with these models and VibeStudio's Agentic IDE, gives us the ability to deliver powerful, affordable AI across India and the world, from large companies all the way down to first-year engineering students working on budget laptops. This is exactly the direction we believe the next decade of AI should follow. Our work proves that the frontier of AI is not only about training massive models; it is also about engineering intelligence efficiently, and making powerful AI accessible to millions of developers, not just the richest labs.

About VibeStudio

VibeStudio is a startup that offers an enterprise-grade evolution of agentic tools like Cursor and Lovable, built for companies that need real AI agents running securely on-premises without compromising on data privacy, rate limits, or capability.

It gives teams a full-stack agent development environment where coding, reasoning, tool use, and deployment all happen in one controlled setting with no external dependencies.

For enterprises that want the power of modern AI without sending a single byte outside their firewall, VibeStudio is simply the only practical choice.

About ITEL

Immersive Technology & Entrepreneurship Labs (ITEL) is a Section 8 not-for-profit organization (www.itelfoundation.in) focused on making India a technology leader in selected areas. It also incubates deep-tech startups, providing technology support, mentorship, and strategic guidance to help them scale. It has set up Vikram Sarabhai AI Labs (VSAIL) to accelerate sovereign AI.

Q & A Section

1. What problem will this MiniMax M2 THRIFT model solve?

MiniMax M2, when it emerged, was truly state-of-the-art. It actually outperformed models like Opus 4.1 and even GPT-5 Pro. The reason we picked it for our THRIFT pruning was precisely that it was compact enough to begin with, and we knew that once pruned, its results would be best in class across the industry.

2. Why is it better than competitors?

MiniMax M2 THRIFT outperforms competitors by offering three purpose-built trims (25%, 40%, and 55%), allowing teams to align capability with cost and latency.

  • THRIFT-25 preserves near-baseline accuracy for research, analytics, and long-context reasoning with lower compute requirements.
  • THRIFT-40 serves as the enterprise workhorse for copilots and retrieval-augmented generation, balancing speed and quality.
  • THRIFT-55 is the ultra-compact build for on-device and air-gapped deployments, running comfortably on a single MacBook Pro while surpassing models up to five times larger.

Across all variants, customers gain higher tokens per watt, faster time to first token, and materially lower total cost of ownership, delivered in a secure package suitable for on-premises deployment without vendor lock-in.

3. What kind of effort went into achieving this milestone?

While competitors are billion-dollar companies, we relied on our incubator, ITEL, led by Padma Shri Ashok Jhunjhunwala, to provide the necessary hardware and mentorship. We didn't have the luxury of running multiple test runs in parallel; instead, we had to iterate sequentially, working day and night on the same cluster. This meant far more effort and ingenuity on our part to achieve the milestone, but we did so in the best possible way.

4. What is the key highlight: technology, affordability, design, or impact?

Affordability while being SOTA is our key highlight. Our THRIFT model fits on a single MacBook Pro and provides state-of-the-art performance for free. While the commercial angle is great, our model also gives institutions the peace of mind of staying secure and not having to share their sensitive data with big-tech AI players.

5. What has been your journey: why this product exists, the challenges, how you overcame them, and the vision ahead?

Our MiniMax M2 THRIFT model exists because we saw a gap: world-class AI models remain out of reach for most organizations due to cost and infrastructure demands. The challenges were immense: hardware limitations, model optimization, minimizing accuracy loss, and creating an enterprise-ready deployment pathway. We overcame them through relentless experimentation, leveraging open-source innovations, and our collaboration with ITEL for hardware sponsorship. The vision ahead is to democratize advanced reasoning models, enabling industries from finance to healthcare to adopt AI seamlessly and affordably.

6. Give examples of how people are using it, to establish credibility

While the community has produced multiple quants and finetunes of our model on HuggingFace, giving us more than 10k downloads within a week, our real validation comes from users on Reddit showcasing one-shot projects coded locally with our model. Link: https://www.reddit.com/r/LocalLLaMA/comments/1p02zed/built_using_local_miniagent_with_minimaxm2thrift/

7. Why are you proud to make this first-in-world out from India?

It's a milestone for Indian AI. Most breakthroughs in reasoning-scale models come from Silicon Valley or China; creating a world-first model of this scale in India encourages other Indian labs to invest GPUs and effort, edging closer towards building our own foundational models. It validates our ecosystem's capability to innovate, execute, and make an impact on industries globally.

8. Partnership with ITEL? How is it helping?

ITEL has been pivotal. They sponsored the high-performance hardware needed to train, prune, and evaluate our model. Without their support, achieving THRIFT pruning while maintaining accuracy at scale would have been nearly impossible. Beyond hardware, ITEL has provided mentorship and strategic guidance, helping us accelerate deployment and enterprise adoption.

9. A profile of yours

I'm Arjun, co-founder of VibeStudio, an AI venture focused on building practical, high-performance AI solutions from India. My first startup failed; my second grew from 3 people to a 500+ team and raised $12M from Kalaari. We did significant things in Web3 for India. As my passion was always AI, I exited that firm and founded VibeStudio to produce the best picks and shovels in this exponentially growing AI race.

10. Who is your target audience? Investors? Industry? What kind of industry? Any specific message for them?

Our target audience is primarily students and freelancers, because of the low cost. We also serve enterprises and industrial users seeking sovereign, high-performance AI, and we are engaging investors interested in supporting scalable, India-born AI innovation. Industries like finance, healthcare, and research are already seeing tremendous value. Our message is simple: this is the first Indian AI model of global significance that you can deploy on-premises with world-class performance at cost-effective scale.