Finding the “Do More with Less” Equilibrium: Lessons from Right-Sizing Our AI

2 July 2026

In 2026, agility has quietly become the most durable competitive advantage a company can hold, and the real return on specialized technology is increasingly measured in human capital optimization rather than headcount reduction. The largest enterprises are learning that scale cuts both ways: AI rollouts now arrive with staggering cloud bills and the HR cost of retraining entire departments on workflows that seem to change every quarter. Smaller firms are nimbler, and that gives them a chance to gain an advantage through precise, surgical automation rather than wholesale transformation.

We learned this through our own operations at GLIMPS, where we set out to streamline the low-value-added parts of the business: corporate documentation, meeting minutes, and technical engineering support. Early on, we drew on our expertise to make an unconventional choice for our internal workloads: not to use a general-purpose LLM for everything. That choice anticipated friction in two places:

The first was financial cost. Processing enterprise volumes of data through a massive external model means feeding extensive internal documentation into the context window on nearly every call. That practice is expensive by design: injecting proprietary material into prompts for a public model can multiply baseline API costs several times over. Budgets that look predictable in a pilot can become very difficult to forecast at scale.
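To make the multiplier concrete, here is a minimal back-of-the-envelope sketch in Python. Every figure in it is a hypothetical placeholder chosen for illustration (call volume, token counts, and per-token prices are assumptions, not GLIMPS data or any vendor's actual pricing); the point is only how the arithmetic behaves once internal documentation rides along on every call.

```python
def monthly_api_cost(calls, prompt_tokens, context_tokens, output_tokens,
                     usd_per_m_input, usd_per_m_output):
    """Estimate a monthly bill for a pay-per-token API (all inputs hypothetical)."""
    # Injected context is billed as input tokens on every single call.
    input_total = calls * (prompt_tokens + context_tokens)
    output_total = calls * output_tokens
    return (input_total * usd_per_m_input + output_total * usd_per_m_output) / 1_000_000


# Hypothetical workload: 50,000 calls a month, 500-token questions and answers.
# Hypothetical prices: 3 USD per million input tokens, 15 USD per million output tokens.
baseline = monthly_api_cost(50_000, 500, 0, 500, 3, 15)

# Same workload, but 8,000 tokens of internal documentation injected per call.
with_context = monthly_api_cost(50_000, 500, 8_000, 500, 3, 15)

print(f"baseline:     {baseline:,.0f} USD")
print(f"with context: {with_context:,.0f} USD")
print(f"multiplier:   {with_context / baseline:.2f}x")
```

Under these assumed numbers the bill grows from 450 to 1,650 USD a month, a little under four times the baseline, and the multiplier moves with every change in document size or call volume, which is what makes forecasting hard.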

The second was the HR cost, and it was harder to put on a balance sheet. With general-purpose output, engineers often spend more time correcting low-confidence summaries, occasional hallucinations, and awkwardly formatted drafts than they would have spent writing from scratch. We were aware of this risk: recent surveys suggest many executives see only marginal returns, on the order of one to five percent, from broad, generalized deployments. The tool would have been technically working and practically draining.

The choice we made had less to do with ambition than with discipline. Rather than reaching for the largest available model, our team used a scoring framework to evaluate smaller, specialized language models (SLMs) against a fixed hardware envelope: what could realistically run, and run well, on a dual-GPU setup we already controlled. By optimizing model and KV-cache quantization, we held a large context window, maintained healthy token throughput, and supported concurrent internal users without renting someone else’s infrastructure by the token.
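The kind of check behind a fixed hardware envelope can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our actual scoring framework: the model shape, the 48 GiB of combined VRAM, the 10 percent headroom, and the user count are all hypothetical, and the sketch treats two GPUs as one memory pool, which real deployments only approximate. It uses the standard estimate that a KV cache stores one key and one value vector per layer, per KV head, per token.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical model shape; not any specific released model."""
    params_billion: float
    n_layers: int
    n_kv_heads: int
    head_dim: int


def weights_bytes(m: ModelSpec, weight_bits: int) -> float:
    # Each parameter costs weight_bits / 8 bytes once quantized.
    return m.params_billion * 1e9 * weight_bits / 8


def kv_cache_bytes(m: ModelSpec, context_tokens: int, users: int, kv_bits: int) -> float:
    # Factor 2 = one key plus one value vector per layer, KV head, and token.
    per_token = 2 * m.n_layers * m.n_kv_heads * m.head_dim * kv_bits / 8
    # Worst case: every concurrent user fills the whole context window.
    return per_token * context_tokens * users


def fits(m: ModelSpec, vram_gib: float, context_tokens: int, users: int,
         weight_bits: int, kv_bits: int, headroom: float = 0.10) -> bool:
    # Keep a slice of memory free for activations and runtime overhead (assumed 10%).
    budget = vram_gib * GIB * (1 - headroom)
    need = weights_bytes(m, weight_bits) + kv_cache_bytes(m, context_tokens, users, kv_bits)
    return need <= budget


model = ModelSpec(params_billion=8, n_layers=32, n_kv_heads=8, head_dim=128)

# 32k-token context, 8 concurrent users, 48 GiB of combined VRAM (all assumed).
unquantized = fits(model, 48, 32_768, 8, weight_bits=16, kv_bits=16)
quantized = fits(model, 48, 32_768, 8, weight_bits=4, kv_bits=8)

print("16-bit weights, 16-bit KV cache fits:", unquantized)
print("4-bit weights, 8-bit KV cache fits:  ", quantized)
```

With these assumed figures the 16-bit configuration overshoots the budget while the quantized one fits with room to spare. Note that at long contexts the KV cache, not the weights, is the larger share of the memory bill.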

That constraint shaped a deliberately lean deployment: a pair of models tuned for our technical teams (one for coding assistance, one for agentic workflows) and two more tailored to general business operations. None of them is the most powerful model on the market. All of them are good enough at the narrow task we actually need, and they run on hardware whose cost we can name to the dollar.

The change our teams felt most was the quietest one. Instead of editing clumsy machine prose, people turned to strategic work and core product development, and morale followed.

For executives weighing similar decisions, the lesson is not that large models are bad; they are remarkable instruments. It is that “doing more with less” was never meant to describe replacing talented people with an off-the-shelf chatbot. The more defensible reading, particularly for an SME, is to invest in specialized, sovereign technology that strips away friction, amplifies the potential and capabilities of the team you already have, and keeps infrastructure costs firmly within your own control.

About the Author — Jordan THEODORE: A tech leader and defensive security specialist, I operate at the heart of critical infrastructure protection. My expertise centers on malware reverse engineering, digital forensics, and AI. Driven by a passion for knowledge-sharing and mentorship, I translate this hands-on, field-tested expertise into various podcasts and publications.
