September 1 was a big day! The official cross-vendor Vulkan mesh shading extension that I teased a while ago, has now been officially released. This is a significant moment for me because I’ve spent considerable time making the RADV implementation and collaborated with some excellent people to help shape this extension in Khronos.

How it started

We first started talking about mesh shaders in Mesa about two years (maybe more?) ago. At the time the only publicly available Vulkan mesh shading API was the vendor-specific NV_mesh_shader made by NVidia.

  • At the time nobody quite fully understood what mesh shaders are supposed to be and how they could work on the HW. I initially anticipated that they could be made to work even on RDNA1, but this turned out false due to some HW limitations.
  • It was unclear what was needed to get good performance.
  • The NVidia extension was a good start, but it was full of things that don’t make any real sense on any other HW.
  • Nobody had a clue how we would implement task shaders.

I was working together with Christoph Kubisch (from NVidia) who helped understand what this topic is all about. Caio Oliveira and Marcin Slusarz (from Intel, working on ANV) joined the adventure too.

How we made it work

We made the decision to start working on some preliminary support for the NV extension to get some experience with the topic and learn how it’s supposed to work. Once we were satisfied with that, we made the jump to the EXT.

NIR and SPIR-V

These are the common Mesa pieces that all drivers can use. The front-end and middleware code for mesh/task shader support are here. Caio created the initial pieces and I expanded on that as I found more and more things that needed adjustment in NIR, added a new storage class for task payloads etc. Marcin also chimed in with fixes and improvements.

AMD and Intel hardware work differently, so most of the hard work couldn’t be shared and needed to be implemented in the backends. However, some of the commonalities are implemented in eg. nir_lower_task_shader for the work that needs to happen in both RADV and ANV. There were dozens of merge requests that added support for various features, cleaned up old features to make them not crash on mesh shaders, etc. The latest is this MR which adds all the remaining puzzle pieces for the EXT.

Lowering the shaders in the backend

Because mesh shaders use NGG, the heavy lifting is done in ac_nir_lower_ngg which is already responsible for other NGG shaders (VS, TES, GS). The lowering basically “translates” the API shader to work like the HW expects. Without going into much detail, it essentially wraps the application shader and replaces some pieces of it to make them understandable by the HW.

There is now also an ac_nir_lower_taskmesh_io_to_mem for translating task payload I/O and EmitMeshTasksEXT to something the HW can understand.

Mesh shading draw calls in RADV

Previously I was mostly working on the compiler stack (NIR and ACO) and had little experience with the hardcore driver code, such as how draws/dispatches work and how the driver submits cmd buffers to the GPU. As I had near-zero knowledge of these areas, I had to learn about them, mostly by just reading the code.

So, I split the RADV work in two parts:

  1. Mesh shader only pipelines. These required only moderate changes to radv_cmd_buffer to add some new draw calls, and minor work in radv_pipeline to figure out per-primitive output attributes and get the register programming right.
  2. Task shader support. Due to how task shaders work on AMD HW, these required very severe refactoring in radv_device because we now have to submit to multiple queues at once. This is actually not finished yet because “gang submit” is still missing. Additionally, radv_cmd_buffer needed heavy changes to support an internal compute cmdbuf.

Naturally, during this work I also hit several RADV bugs in pre-existing use cases which just nobody noticed yet. Some of these were issues with secondary command buffers and conditional rendering on compute-only queues. There was also a nasty firmware bug, and other exciting stuff.

Let’s just say that my GPU loved to hang out with me.

Mesh shading on Intel

The Intel ANV compiler backend and driver implementation were done by Caio and Marcin and I just want to take this opportunity to say that I really enjoyed working together with them.

The guy who wrote the most mesh shaders on Earth

All of the above would have been impossible if I didn’t have some solid test cases which I could throw at my implementation. Ricardo Garcia developed the CTS (Vulkan Conformance Test Suite) testcases for both NV_mesh_shader and EXT_mesh_shader. During that work, Ricardo wrote several thousand mesh and task shaders for us to test with. Ricardo if you’re reading this, THANK YOU!!!

Conclusion

Implementing VK_EXT_mesh_shader gave me a very good learning experience and helped me get an understanding of parts of the driver that I had never looked at before.

What happens to NV_mesh_shader now?

We never wanted to officially support it, it was just a stepping stone for us to help start working on the EXT. The NV extension support will be removed soon.

When is it coming to my Steam Deck / Linux computer?

The RADV and ANV support will be included once the system is updated to Mesa 22.3, though we may be convinced to bring it to the Deck sooner if somebody finds a game that uses mesh shaders.

For NVidia proprietary driver users, EXT_mesh_shader is already included in the latest beta drivers.

Waiting for the gang…

We marked mesh/task shader support “experimental” in RADV because it has one main caveat that we are unable to solve without kernel support. Thanks to how we need to submit to two different queues at the same time, this can deadlock your GPU if you are running two (or more) processes which use task shaders at the same time. To properly solve it we need the “gang submit” feature in the kernel which prevents such deadlocks.

Unfortunately “gang submit” is not upstream yet. Cross your fingers and let’s hope it’ll be included in Linux 6.1.

Until then, you can use the RADV_PERFTEST=ext_ms environment variable to play your favourite mesh shader games!