
NVIDIA NIM now available on Hugging Face with Inference-as-a-Service

Hugging Face has announced the launch of an inference-as-a-service capability powered by NVIDIA NIM. The new service gives developers easy access to NVIDIA-accelerated inference for popular AI models.

The service enables developers to rapidly deploy leading large language models, such as the Llama 3 family and Mistral AI models, optimized with NVIDIA NIM microservices running on NVIDIA DGX Cloud. This helps developers quickly prototype open-source AI models hosted on the Hugging Face Hub and move them into production.

Hugging Face Inference-as-a-Service on NVIDIA DGX Cloud with NIM microservices offers easy access to compute resources optimized for AI workloads. The NVIDIA DGX Cloud platform is purpose-built for generative AI and provides scalable GPU resources to support every step of AI development, from prototype to production.

To use the service, users must have access to an Enterprise Hub organization and a fine-grained token for authentication. The NVIDIA NIM endpoints for supported generative AI models can be found on the Hugging Face Hub models page.

Currently, the service supports only the chat.completions.create and models.list APIs, but Hugging Face is working to extend this and add more models. Hugging Face Inference-as-a-Service usage on DGX Cloud is billed based on the compute time spent per request, using NVIDIA H100 Tensor Core GPUs.
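Because the endpoints follow OpenAI API conventions, both supported calls can be made with the OpenAI Python SDK. The following is a minimal sketch, not an official example: the base URL and model ID shown are assumptions and should be replaced with the endpoint and model listed on the Hugging Face Hub, and the API key must be a fine-grained Hugging Face token.

from openai import OpenAI

# Assumed endpoint for the Hugging Face / DGX Cloud NIM integration;
# check the model page on the Hugging Face Hub for the actual URL.
client = OpenAI(
    base_url="https://huggingface.co/api/integrations/dgx/v1",
    api_key="hf_...",  # fine-grained Hugging Face token (Enterprise Hub org)
)

# models.list: enumerate the generative AI models currently served via NIM.
for model in client.models.list():
    print(model.id)

# chat.completions.create: run a chat completion on one of the models.
# The model ID below is illustrative.
completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "What is NVIDIA NIM?"}],
    max_tokens=128,
)
print(completion.choices[0].message.content)

Since billing is per compute time on H100 GPUs, requests only incur cost while they are being processed.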

Hugging Face is also collaborating with NVIDIA to integrate the NVIDIA TensorRT-LLM library into Hugging Face’s Text Generation Inference (TGI) framework, improving the performance and accessibility of AI inference. In addition to the new Inference-as-a-Service, Hugging Face also offers Train on DGX Cloud, an AI training service.

Clem Delangue, CEO of Hugging Face, posted on his X account:

I am very excited that Hugging Face is becoming the gateway for AI computing!

And Kaggle Master Rohan Paul shared a post on X saying:

So we can use open models with NVIDIA DGX Cloud accelerated compute platform for inference deployment. The code is fully compatible with OpenAI API, so you can use OpenAI’s SDK for inference.

At SIGGRAPH, NVIDIA also introduced generative AI models and NIM microservices for the OpenUSD framework to enable developers to more quickly create high-fidelity virtual worlds for the next evolution of AI.
