Monday, September 11, 2023

Dear Elon, Right Back Atcha: "Nvidia Says New Software Will Double LLM Inference Speed On H100 GPU" (NVDA; TSLA)

The need for speed.

From The Channel, September 8:

Nvidia said it plans to release new open-source software that will significantly speed up live applications running on large language models powered by its GPUs, including the flagship H100 accelerator.

The Santa Clara, Calif.-based AI chip giant said on Friday that the software library, TensorRT-LLM, will double the H100’s performance for running inference on leading large language models (LLMs) when it comes out next month. Nvidia plans to integrate the software, which is available in early access, into its Nvidia NeMo LLM framework as part of the Nvidia AI Enterprise software suite.

The chip designer announced TensorRT-LLM as it seeks to maintain its dominance of the fast-growing AI computing market, which helped the company double revenue year over year in its most recent financial quarter.

“We’ve doubled the performance by using the latest techniques, the latest schedulers and incorporating the latest optimizations and kernels,” said Ian Buck, vice president of hyperscale and high-performance computing at Nvidia, in a briefing with journalists. “Those techniques improve performance, not just by increasing efficiency but also optimizing the algorithm end-to-end.”....
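The "schedulers" Buck mentions refer to how the serving runtime packs requests onto the GPU. One widely used scheduling idea in LLM inference serving is continuous (also called in-flight) batching: instead of waiting for every request in a batch to finish, a finished request's slot is refilled on the very next decode step. The toy Python sketch below is purely illustrative of that general idea, not Nvidia's actual implementation; the function names and request lengths are invented, and each "step" stands in for one decode iteration across the batch.

```python
# Toy comparison of two batching policies for LLM decode scheduling.
# Illustrative only -- NOT TensorRT-LLM code; all names are invented.
from collections import deque

def static_batching_steps(lengths, batch_size):
    """Each batch runs until its LONGEST request finishes, so short
    requests hold their slots idle while long ones keep decoding."""
    queue = deque(lengths)
    steps = 0
    while queue:
        n = min(batch_size, len(queue))
        batch = [queue.popleft() for _ in range(n)]
        steps += max(batch)  # whole batch occupies the GPU this long
    return steps

def continuous_batching_steps(lengths, batch_size):
    """A finished request's slot is refilled on the next iteration,
    so the batch stays as full as the queue allows."""
    queue = deque(lengths)
    active = []
    steps = 0
    while queue or active:
        while queue and len(active) < batch_size:
            active.append(queue.popleft())  # admit new work immediately
        steps += 1
        active = [r - 1 for r in active if r - 1 > 0]  # drop finished
    return steps

# One long request (16 output tokens) plus eight short ones, batch of 4:
requests = [16] + [1] * 8
print(static_batching_steps(requests, 4))      # -> 18
print(continuous_batching_steps(requests, 4))  # -> 16
```

With short and long requests mixed, the continuous scheduler finishes in fewer total decode iterations because it never lets a slot sit idle; this is one reason scheduler improvements alone can raise effective throughput without any change to the kernels.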

....MUCH MORE

Earlier today: