
NVIDIA NIM now available on Hugging Face with Inference-as-a-Service

Hugging Face has announced the launch of an inference-as-a-service capability powered by NVIDIA NIM. The new service gives developers easy access to NVIDIA-accelerated inference for popular AI models.

The service enables developers to rapidly deploy leading large language models, such as the Llama 3 family and Mistral AI models, optimized with NVIDIA NIM microservices running on NVIDIA DGX Cloud. This helps developers quickly prototype open-source AI models hosted on the Hugging Face Hub and move them into production.

Hugging Face Inference-as-a-Service on NVIDIA DGX Cloud with NIM microservices offers easy access to compute resources optimized for AI workloads. The NVIDIA DGX Cloud platform is purpose-built for generative AI and provides scalable GPU resources to support every step of AI development, from prototype to production.

To use the service, users must have access to an Enterprise Hub organization and a fine-grained token for authentication. The NVIDIA NIM endpoints for supported generative AI models can be found on the Hugging Face Hub models page.

Currently, the service supports only the chat.completions.create and models.list APIs, but Hugging Face is working to extend this and add more models. Hugging Face Inference-as-a-Service usage on DGX Cloud is billed based on the compute time spent per request, using NVIDIA H100 Tensor Core GPUs.
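Because the endpoints follow OpenAI API conventions, both supported calls can be made with the OpenAI Python SDK. The following is a minimal sketch, not an official example: the base URL and model ID shown are assumptions and should be replaced with the endpoint and model listed on the Hugging Face Hub, and the API key must be a fine-grained Hugging Face token.

from openai import OpenAI

# Assumed endpoint for the Hugging Face / DGX Cloud NIM integration;
# check the model page on the Hugging Face Hub for the actual URL.
client = OpenAI(
    base_url="https://huggingface.co/api/integrations/dgx/v1",
    api_key="hf_...",  # fine-grained Hugging Face token (Enterprise Hub org)
)

# models.list: enumerate the generative AI models currently served via NIM.
for model in client.models.list():
    print(model.id)

# chat.completions.create: run a chat completion on one of the models.
# The model ID below is illustrative.
completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "What is NVIDIA NIM?"}],
    max_tokens=128,
)
print(completion.choices[0].message.content)

Since billing is per compute time on H100 GPUs, requests only incur cost while they are being processed.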

Hugging Face is also collaborating with NVIDIA to integrate the NVIDIA TensorRT-LLM library into Hugging Face’s Text Generation Inference (TGI) framework, improving the performance and accessibility of AI inference. In addition to the new Inference-as-a-Service, Hugging Face also offers Train on DGX Cloud, an AI training service.

Clem Delangue, CEO of Hugging Face, posted on his X account:

I am very excited that Hugging Face is becoming the gateway for AI computing!

And Kaggle Master Rohan Paul shared a post on X saying:

So we can use open models with NVIDIA DGX Cloud accelerated compute platform for inference deployment. The code is fully compatible with OpenAI API, so you can use OpenAI’s SDK for inference.

At SIGGRAPH, NVIDIA also introduced generative AI models and NIM microservices for the OpenUSD framework to enable developers to more quickly create high-fidelity virtual worlds for the next evolution of AI.
