Hugging Face: How Enterprises Can Reduce AI Costs While Maintaining Performance

Many companies operate under the assumption that AI development demands massive computational power, leading them to prioritize simply acquiring more resources.
However, Sasha Luccioni, AI and Climate Lead at Hugging Face, suggests a different path. What if the focus shifted to using AI more intelligently? Instead of relentlessly pursuing additional (and often excessive) compute capacity, companies could enhance model performance and precision.
Luccioni argues the core issue lies in the approach: businesses should aim for smarter computation, not just more of it.
"We're overlooking more intelligent methods because we're fixated on needing more FLOPS, more GPUs, and more time," she explained.
Here are five key strategies from Hugging Face to help businesses of all sizes deploy AI more efficiently.
1. Select the Right Model for the Task
Resist the urge to default to massive, general-purpose models for every application. Specialized or distilled models can often achieve equivalent, or even superior, accuracy for specific tasks—at a significantly lower cost and with reduced energy consumption.
Luccioni's research indicates that a task-specific model can consume 20 to 30 times less energy than a general-purpose one. "These models are built for a single purpose, unlike large language models designed to handle any query," she stated.
Model distillation is crucial here. A full-scale model can be trained first and then distilled into a smaller version refined for a particular function. For example, Luccioni pointed out that DeepSeek R1 is so large that most organizations can't afford to run it, since it often requires at least 8 GPUs. In contrast, distilled versions can be 10 to 30 times smaller and run on a single GPU.
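A rough back-of-the-envelope sketch shows why a 10- to 30-fold size reduction can move a model from a multi-GPU server to a single card. Everything below is an illustrative assumption rather than a measurement: the parameter counts, the 80 GB of memory per GPU, and the one-byte-per-parameter precision are hypothetical, and the estimate covers model weights only, ignoring activations and caches.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def gpus_needed(num_params: float, gpu_memory_gb: float,
                bytes_per_param: float = 2.0) -> int:
    """Minimum number of GPUs whose combined memory holds the weights."""
    needed = weight_memory_gb(num_params, bytes_per_param)
    return max(1, math.ceil(needed / gpu_memory_gb))


# Hypothetical figures, chosen only to illustrate the arithmetic.
LARGE_PARAMS = 600e9                   # a very large general-purpose model
DISTILLED_PARAMS = LARGE_PARAMS / 20   # a distilled version 20 times smaller
GPU_MEMORY_GB = 80.0                   # assumed memory per GPU

large_gpus = gpus_needed(LARGE_PARAMS, GPU_MEMORY_GB, bytes_per_param=1.0)
small_gpus = gpus_needed(DISTILLED_PARAMS, GPU_MEMORY_GB, bytes_per_param=1.0)
print(f"large model: {large_gpus} GPUs, distilled model: {small_gpus} GPU")
```

Under these assumptions the large model needs eight GPUs just to hold its weights, while the distilled one fits on a single card with room to spare. Real deployments need extra headroom, so treat the result as a lower bound.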
She also highlighted the efficiency benefits of open-source models, which eliminate the need for training from scratch. Unlike a few years ago, when companies wasted resources searching for suitable models, they can now start with a base model and fine-tune it for their needs.
"This fosters collaborative, incremental innovation instead of isolated efforts where everyone trains their own models, effectively wasting computational resources," Luccioni said.
There is a growing realization that the costs of generative AI often outweigh its benefits, leading to corporate disillusionment. While generic uses like email composition or meeting transcription are genuinely helpful, task-specific models still demand considerable effort. Off-the-shelf models are often insufficient and more expensive, according to Luccioni.
Bridging this gap represents the next frontier for added value. "Most companies want a specific task accomplished," Luccioni noted. "They aren't seeking artificial general intelligence (AGI); they want specialized intelligence. That's the challenge we need to address."
2. Make Efficiency the Default
Integrate principles from "nudge theory" into system design: set conservative computational budgets, limit always-on generative features, and require users to opt in to high-cost compute modes.
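One way to picture this is a request configuration whose expensive options are switched off unless the caller asks for them. The sketch below is only an illustration of the idea; the class, field names, and token limits are hypothetical and do not correspond to any particular vendor's API.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class RequestOptions:
    """Per-request settings; every costly option defaults to off."""
    reasoning: bool = False        # high-cost mode must be requested explicitly
    ai_summary: bool = False       # no generative summary unless asked for
    max_output_tokens: int = 256   # conservative budget (illustrative value)


HARD_TOKEN_CAP = 2048  # hypothetical ceiling, applied even to opted-in requests


def resolve_options(user_choices: Optional[dict] = None) -> RequestOptions:
    """Apply explicit user choices over the efficient defaults, then cap the budget."""
    opts = RequestOptions(**(user_choices or {}))
    capped = min(opts.max_output_tokens, HARD_TOKEN_CAP)
    return replace(opts, max_output_tokens=capped)


# A plain request gets the cheap path; reasoning is used only when opted in.
print(resolve_options())
print(resolve_options({"reasoning": True, "max_output_tokens": 10_000}))
```

The design choice is that the cheap path requires no action from the user, while the expensive path requires an explicit request and is still bounded by a budget.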
In behavioral science, "nudge theory" involves subtly guiding choices to encourage positive behaviors. Luccioni cited the classic example of offering cutlery with takeout meals: making utensils an opt-in choice, rather than including them by default, can drastically reduce waste.
"Simply shifting from an opt-out to an opt-in model can powerfully influence user behavior," Luccioni explained.
Default settings often lead to unnecessary usage and increased costs, as models perform work that was never requested. For instance, some search engines now automatically generate AI summaries at the top of results. Luccioni also observed that when she recently used OpenAI's GPT-5, the model defaulted to full reasoning mode even for very simple queries.
"For me, that should be the exception," she said. "If I ask, 'What is the meaning of life?' then sure, an AI summary might be useful. But for questions like 'What's the weather in Montreal?' or 'What are my local pharmacy's hours?' I don't need a generative summary. The default should be no reasoning."
3. Optimize Hardware Utilization
Implement batching, adjust numerical precision, and fine-tune batch sizes for the specific hardware generation to minimize wasted memory and power consumption.
Companies should evaluate their specific needs: Does the model need to run continuously? Will it face real-time requests, perhaps 100 at once? In such cases, always-on optimization is essential, Luccioni noted. However, in many other scenarios, it isn't; models can be run periodically to conserve memory, and batching can optimize memory use.
"It's an engineering challenge, but a very specific one, so it's difficult to give blanket advice like 'distill all models' or 'change the precision on everything,'" said Luccioni.
In a recent study, she discovered that the ideal batch size is highly dependent on the hardware, down to the specific model or version. Increasing the batch size by just one unit can sometimes raise energy usage because the model requires more memory resources.
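A simple way to act on this is to sweep candidate batch sizes on the target hardware and keep the one with the lowest energy per processed item, rather than the largest one that fits. The sketch below assumes a caller-supplied measurement function; `fake_measure` is a made-up stand-in with an artificial penalty above batch size 16, and in practice it would be replaced by real power readings taken on the hardware in question.

```python
from typing import Callable, Iterable, Tuple


def best_batch_size(
    candidates: Iterable[int],
    measure_energy_joules: Callable[[int], float],
) -> Tuple[int, float]:
    """Return (batch_size, joules_per_item) with the lowest energy per item."""
    best = None
    for batch_size in candidates:
        per_item = measure_energy_joules(batch_size) / batch_size
        if best is None or per_item < best[1]:
            best = (batch_size, per_item)
    if best is None:
        raise ValueError("no candidate batch sizes given")
    return best


def fake_measure(batch_size: int) -> float:
    """Made-up energy model: fixed overhead, per-item cost, and a jump past 16."""
    energy = 50.0 + 10.0 * batch_size
    if batch_size > 16:
        energy += 400.0  # stand-in for the extra cost once a memory limit is crossed
    return energy


size, joules_per_item = best_batch_size([1, 8, 16, 17, 32, 64], fake_measure)
print(size, joules_per_item)
```

With this toy energy model, batch size 16 wins, and both 17 and 64 cost more per item. The numbers are invented; the point is that the best setting has to be measured per hardware configuration rather than assumed.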
"This is an aspect people often overlook. They think, 'I'll just maximize the batch size,' but true efficiency comes from meticulously adjusting all these variables. The result is a highly optimized system, but one that is tailored to a very specific context," Luccioni explained.
4. Incentivize Energy Transparency
Incentives drive change. With this in mind, Hugging Face launched the AI Energy Score earlier this year. This initiative promotes energy efficiency using a 1- to 5-star rating system, where the most efficient models earn a "five-star" designation.
It can be thought of as an "Energy Star for AI," inspired by the longstanding federal program that set efficiency standards and labeled qualifying appliances with its logo.
"For decades, that star rating was a powerful motivator. People wanted it," said Luccioni. "Achieving a similar impact with the Energy Score would be fantastic."
Hugging Face has established a public leaderboard, which it plans to update with new models like DeepSeek and GPT-oss in September, and continue refreshing every six months or as new models emerge. The aim is for model developers to view a high rating as a "badge of honor," Luccioni remarked.
5. Rethink the "More Compute Is Better" Mindset
Instead of pursuing the largest GPU clusters, start by asking: "What is the most intelligent way to achieve the desired outcome?" For numerous applications, smarter architectures and better-curated datasets yield better results than brute-force scaling.
"I believe most people probably don't need as many GPUs as they think," Luccioni stated. She encouraged businesses to reconsider the actual tasks their GPUs will handle, why they are necessary, how such tasks were performed previously, and what tangible benefits additional GPUs will actually deliver.
"It's become a race to the bottom, where everyone feels they need a bigger cluster," she said. "The key is to analyze what you're using AI for, what specific techniques are required, and what those techniques truly demand."
