Apple’s recently announced M3 and A17 Pro chips feature several significant improvements to parallel processing, which deliver major performance gains for apps and games that use the Metal API.

In a developer talk (video), Apple explained that these new GPUs use Dynamic Caching, hardware-accelerated ray tracing, and hardware-accelerated mesh shading to achieve improved performance.

Dynamic Caching allows the GPU to allocate exactly the amount of register memory each action needs at the moment it runs, rather than reserving more than is used. This frees up register memory that was previously unavailable, allowing many more shader tasks to run in parallel.
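
To make the effect on parallelism concrete, here is a minimal back-of-the-envelope sketch. It assumes a simplified model in which the number of threads that can be resident at once is the register file size divided by the registers held per thread. The register file size and the per-phase register counts are hypothetical numbers chosen for illustration, not Apple specifications, and the function names are made up for this example.

```python
# Toy occupancy model. All numbers below are hypothetical, not Apple specs.

def threads_static(register_file: int, phase_registers: dict) -> int:
    """Threads resident when each thread reserves its worst-case register count."""
    return register_file // max(phase_registers.values())

def threads_dynamic(register_file: int, phase_registers: dict, phase: str) -> int:
    """Threads resident when each thread holds only what its current phase needs."""
    return register_file // phase_registers[phase]

REGISTER_FILE = 65536  # hypothetical number of registers available on chip
PHASES = {"common_path": 32, "rare_branch": 128}  # hypothetical registers per thread

static_threads = threads_static(REGISTER_FILE, PHASES)
dynamic_threads = threads_dynamic(REGISTER_FILE, PHASES, "common_path")

print(f"worst-case reservation: {static_threads} threads")
print(f"per-phase allocation:   {dynamic_threads} threads on the common path")
```

In this toy model, a shader that only occasionally enters its register-heavy branch no longer pays for that branch all the time, which is the kind of gain the paragraph above describes.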

Flexible on-chip memory allows the GPU to use all of its on-chip memory for any memory type, rather than having fixed allocations for register, thread group, and tile memory. This means that actions that rely heavily on one type of memory can use the entire on-chip memory, and can even spill over into main memory when they need more.
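
The difference between fixed partitions and a shared pool can be sketched with simple arithmetic. The capacities and per-threadgroup requirements below are hypothetical values picked for illustration, the function names are invented for this example, and the model deliberately ignores spilling to main memory.

```python
# Toy comparison of fixed partitions vs. one shared pool. Numbers are hypothetical.

def groups_fixed(partitions_kib: dict, needs_kib: dict) -> int:
    """Threadgroups that fit when each memory type has its own fixed partition."""
    return min(partitions_kib[kind] // need
               for kind, need in needs_kib.items() if need > 0)

def groups_flexible(partitions_kib: dict, needs_kib: dict) -> int:
    """Threadgroups that fit when all on-chip memory forms a single pool."""
    return sum(partitions_kib.values()) // sum(needs_kib.values())

# Hypothetical on-chip capacity split into three equal fixed partitions (KiB).
PARTITIONS = {"register": 64, "threadgroup": 64, "tile": 64}
# Hypothetical compute kernel: heavy on threadgroup memory, no tile memory.
NEEDS = {"register": 8, "threadgroup": 48, "tile": 0}

fixed = groups_fixed(PARTITIONS, NEEDS)
flexible = groups_flexible(PARTITIONS, NEEDS)
print(f"fixed partitions: {fixed} threadgroup(s) resident")
print(f"shared pool:      {flexible} threadgroup(s) resident")
```

With fixed partitions, the unused tile partition is wasted and the threadgroup partition becomes the bottleneck; with a shared pool, the same kernel can borrow that capacity.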

Hardware-accelerated ray tracing and hardware-accelerated mesh shading move portions of the calculations for these tasks off the GPU's general-purpose shader hardware and onto dedicated units. This leaves room for more operations to run in parallel, speeding up these tasks significantly.

What does this mean for developers?

Developers don’t need to make any changes to their apps to see performance improvements with M3 and A17 Pro. However, there are a few things they can do to maximize the benefits of these new features:

  • Use FP16 math in their shaders where precision allows, as the high-performance ALUs can execute different combinations of integer, FP32, and FP16 instructions in parallel.
  • Use flexible on-chip memory to allocate the right amount of memory to each action.
  • Design their apps to take advantage of hardware-accelerated ray tracing and mesh shading.
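
For the FP16 suggestion, it helps to know which values survive the drop from 32-bit to 16-bit floats. The sketch below is plain Python rather than Metal shader code: it uses the standard library's IEEE 754 half-precision format to round values to FP16 on the CPU. The helper names and the tolerance are assumptions made for this example, not part of any Apple API.

```python
import struct

def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def fits_in_half(x: float, rel_tol: float = 1e-3) -> bool:
    """Return True if x survives an FP16 round trip within rel_tol (assumed tolerance)."""
    try:
        h = to_half(x)
    except OverflowError:  # magnitude is beyond the FP16 maximum of 65504
        return False
    if x == 0.0:
        return True
    return abs(h - x) <= rel_tol * abs(x)

# An FP16 value takes half the storage of an FP32 value.
print(struct.calcsize("<e"), "bytes vs", struct.calcsize("<f"), "bytes")

# Color-like values in [0, 1] are usually safe; large or tiny magnitudes are not.
for value in (0.5, 0.1, 2049.0, 100000.0, 1e-8):
    print(value, fits_in_half(value))
```

Values such as normalized colors tend to be good FP16 candidates, while world-space positions or large accumulators are usually better left in FP32.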

Apple went into more detail about its new GPUs in the video presentation, which is worth watching to learn more.
