The Right GPU Model for Every Workload

Comparison of the NVIDIA® Hopper™ Architecture with the Ampere™ and Ada Lovelace™ Architectures

All of NVIDIA’s GPU models offer powerful options; the right choice depends heavily on the specific workload requirements of the project.


The Models in Comparison

NVIDIA H100 NVL & H100 HGX (Hopper Architecture)

For inference of large language models with up to 175B parameters, NVIDIA offers the H100 NVL, an advanced PCIe-based H100 GPU with an NVLink bridge. The H100 NVL is optimized for AI testing, training, and inference, and in particular for deep learning tasks and large language models.

To efficiently handle highly complex tasks, the NVIDIA HGX H100 combines eight H100 GPUs on an integrated baseboard. The eight-GPU HGX H100 provides fully connected point-to-point NVLink links between the GPUs. By leveraging the H100's multi-precision Tensor Cores, an 8x HGX H100 delivers over 32 petaflops of FP8 deep learning compute.
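The aggregate figure follows from the per-GPU datasheet number (3,958 FP8 Tensor Core TFLOPS with sparsity, also listed in the spec table below); a quick arithmetic sketch:

```python
# Sanity check of the HGX H100 aggregate FP8 figure from per-GPU specs.
GPUS_PER_BASEBOARD = 8
FP8_TFLOPS_PER_GPU = 3958  # H100 FP8 Tensor Core throughput, with sparsity

aggregate_pflops = GPUS_PER_BASEBOARD * FP8_TFLOPS_PER_GPU / 1000
print(f"{aggregate_pflops:.1f} PFLOPS FP8")  # ~31.7 PFLOPS, which NVIDIA rounds to ~32
```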

Recommended workloads:

  • NVIDIA H100 NVL 
    • Models up to 175B parameters 
    • Inference 
    • Data analysis 
  • NVIDIA H100 HGX 
    • Models with more than 175B parameters 
    • Inference 
    • High-performance computing 
    • Deep learning training 
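A rough way to see why the ~175B-parameter mark separates the two options is memory footprint. The sketch below is illustrative only, not an official sizing rule: it counts weights at FP8 precision, ignores activations, KV cache, and runtime overhead, and the helper name is my own; the memory figures come from the spec table below.

```python
# Illustrative back-of-the-envelope sizing check (not an official sizing tool).
H100_NVL_MEMORY_GB = 94        # per GPU (HBM3), from the spec table
MAX_NVL_GPUS_PER_MACHINE = 4   # largest H100 NVL machine type (n3.56d.g4)

def weight_footprint_gb(params_billion: float, bytes_per_param: float = 1.0) -> float:
    """Memory needed for the weights alone; FP8 stores 1 byte per parameter."""
    return params_billion * bytes_per_param  # 1e9 params * 1 byte = 1 GB

model_gb = weight_footprint_gb(175)  # GPT-3-class model quantized to FP8
available_gb = MAX_NVL_GPUS_PER_MACHINE * H100_NVL_MEMORY_GB
print(f"{model_gb:.0f} GB of weights vs. {available_gb} GB of HBM3")  # 175 GB vs. 376 GB
```

Models much beyond that scale quickly outgrow a single H100 NVL machine, which is where the eight-GPU HGX H100 baseboard comes in.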
The NVIDIA H100 NVL delivers higher performance than the H100 PCIe – Source: NVIDIA


NVIDIA A100 PCIe (Ampere Architecture) 

The NVIDIA A100 Tensor Core GPU is designed for compute-intensive AI, HPC, and data analytics applications, offering accelerated performance for AI-driven tasks. It is particularly well suited to environments where multiple applications need to run simultaneously.

Use Cases

  • Training
  • Inference 
  • Data analysis 
The NVIDIA H100 Tensor Core GPU compared with the NVIDIA A100 Tensor Core GPU – Source: NVIDIA H100 Datasheet


NVIDIA L40S (Ada Lovelace Architecture)

The NVIDIA L40S GPU, based on the Ada Lovelace architecture, is the most powerful general-purpose GPU for data centers, delivering breakthrough multi-workload acceleration for large language model (LLM) inference and training, graphics, and video applications. As a leading platform for multimodal generative AI, the L40S provides end-to-end acceleration for inference, training, graphics, and video workflows to support the next generation of AI-enabled audio, speech, 2D, video, and 3D applications.

Use Cases

  • Generative AI 
  • Deep learning training 
  • Inference 
  • Rendering and 3D graphics 

Technical Data at a Glance

                               HGX H100        H100 NVL        A100            L40S
STACKIT Machine Types          n3.104d.g8      n3.14d.g1       n1.14d.g1       n2.14d.g1
                                               n3.28d.g2       n1.28d.g2       n2.28d.g2
                                               n3.56d.g4       n1.56d.g4       n2.56d.g4
                               (8x HGX H100    (1-4 H100 NVL   (1-4 A100       (1-4 L40S
                               GPUs)           GPUs)           PCIe GPUs)      GPUs)
FP64 TC | FP32 TFLOPS¹         67 | 67         60 | 60         19.5 | 19.5     NA | 91.6
TF32 TC | FP16 TC TFLOPS¹      989 | 1979      835 | 1671      312 | 624       366 | 733
FP8 TC | INT8 TC TFLOPS/TOPS¹  3958 | 3958     3341 | 3341     NA | 1248       1466 | 1466
GPU Memory                     80GB HBM3       94GB HBM3       80GB HBM2e      48GB GDDR6
Media Acceleration             7 JPEG Decoder  7 JPEG Decoder  1 JPEG Decoder  4 JPEG Decoder
                               7 Video Decoder 7 Video Decoder 5 Video Decoder 3 Video Decoder
                                                                               3 Video Encoder

¹ All Tensor Core numbers with sparsity; without sparsity, the values are halved.

Source: NVIDIA
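For scripted comparisons, the spec table can also be captured as a small data structure. The sketch below copies the throughput and memory figures from the table above; the field names and helper are my own, not an official API:

```python
# Spec table as a dict; values copied from the table above (Tensor Core numbers with sparsity).
GPUS = {
    "HGX H100": {"fp8_tflops": 3958, "memory_gb": 80, "memory_type": "HBM3"},
    "H100 NVL": {"fp8_tflops": 3341, "memory_gb": 94, "memory_type": "HBM3"},
    "A100":     {"fp8_tflops": None, "memory_gb": 80, "memory_type": "HBM2e"},
    "L40S":     {"fp8_tflops": 1466, "memory_gb": 48, "memory_type": "GDDR6"},
}

def best_fp8(gpus: dict) -> str:
    """Return the model with the highest FP8 throughput, skipping GPUs without FP8 support."""
    return max((name for name, spec in gpus.items() if spec["fp8_tflops"]),
               key=lambda name: gpus[name]["fp8_tflops"])

print(best_fp8(GPUS))  # HGX H100
```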


Please contact us via the contact form for individual consulting.