Inference Engine in AI

1mon

DigitalOcean Launches Inference Engine with New Capabilities for Production AI, Including Inference Router for Efficient Scaling of Agentic Workloads

DigitalOcean (NYSE: DOCN) today announced the launch of its Inference Engine, a set of new production capabilities that give AI builders exceptional performance and unified control over how they run, ...

GF sees on-chip memory a niche AI inference trend; neutral on Cerebras but bullish on EDA, foundries

GF Securities (Hong Kong) sees on-chip memory as a niche AI inference trend but takes a neutral stance towards AI chipmaker ...

Google Kubernetes Engine (GKE) boosted AI inferencing compared to Amazon EKS

Principled Technologies found GKE with GKE Inference Gateway delivered 15.7% higher token throughput, 92.8% lower ...

Amazon plans to make SpaceX's Grok models available on its flagship AI service

Amazon Web Services is in talks to add Grok models to AWS's Bedrock AI platform, expanding its AI offerings and reach.

Cloud AI Inference Workload Capacity Consumption to Surpass Training by 2033, Reaching 46 GW by 2035

Global technology intelligence firm ABI Research forecasts that AI inference workloads will grow at a 42% CAGR to surpass 46 Gigawatts of capacity consumption by 2035, overtaking training workloads by ...

Democratizing AI adoption with Tether’s Bitnet LLM fine-tuning framework

While tech giants lock smaller businesses out of advanced AI, Tether is using localized fine-tuning and P2P networks to democratize superintelligence for billions of people.

CRN

AWS Trainium3 AI Is ‘The Best Inference Platform In The World,’ CEO Says

AWS CEO Matt Garman talks to CRN about its new Trainium3 AI accelerator chips being the ‘best inference platform in the world,’ AI openness being a market differentiator versus competitors, and ...

InfoWorld

Evolving Kubernetes for generative AI inference

Kubernetes has become the leading platform for deploying cloud-native applications and microservices, backed by an extensive community and comprehensive feature set for managing distributed systems.

11d

How Cactus Engine Runs Powerful Local AI Models on 10X Less RAM

The new Cactus AI inference engine allows mobile devices to run local models using 10x less RAM through NPU optimization and ...

Kneron to Unveil the Future of Edge AI at COMPUTEX 2026

From AI PCs to enterprise AI infrastructure, Kneron showcases why the next era of artificial intelligence will run at ...

VentureBeat

The team behind continuous batching says your idle GPUs should be running inference, not sitting dark

Every GPU cluster has dead time. Training jobs finish, workloads shift and hardware sits dark while power and cooling costs keep running. For neocloud operators, those empty cycles are lost margin.
