Optimizing Git For Ryzen CPUs (1.5x Faster)
I still remember driving two hours to pick up the only Ryzen 3900X in stock “nearby”. The excitement of AMD finally breaking Intel’s monopoly on high-end CPUs was contagious. Since then it has handled pretty much everything I’ve thrown at it, but I can’t help but feel that most software I run is still optimized for Intel only. These CPUs perform extremely well, but how much better could they be if software was optimized specifically for them?
One unique aspect of the Ryzen family is that it’s currently the only high-end CPU line to support Intel’s SHA-NI instruction set. Although the instruction set was first released in 2013, Intel has only seen fit to implement it on its very low-end CPUs. As a result, most software (including openssl on my Ubuntu 18.04 box) does not bother to use these instructions. I knew they were much faster than a software implementation, but I wanted to see how that would play out in practice, so I fired up my assembler and got to work.
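To check whether a given Linux machine exposes these instructions, one option is to look for the sha_ni flag in /proc/cpuinfo. The following is a minimal sketch, assuming a Linux kernel that reports the extension under that flag name. The helper names cpu_flags and has_sha_ni are made up for illustration and do not come from any library.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux lists features on a line whose key is "flags".
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_sha_ni(cpuinfo_text):
    """True if the SHA extensions flag ("sha_ni" on Linux) is listed."""
    return "sha_ni" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    print("SHA-NI available:", has_sha_ni(text))
```

On a system without /proc/cpuinfo, or on a non-x86 machine where the features line has a different key, this sketch simply reports False.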
My hypothesis was simple: perhaps these instructions could make SHA fast enough to use as a generic hash function for hash tables. A hash table is the core data structure for KeyDB and Redis, so if we saw improvements there it would be a big win. Unfortunately, while these instructions are much faster than software SHA-1 implementations, they are still much slower than hash-table-optimized functions like siphash.
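To make the idea concrete, here is a rough sketch of what using SHA-1 as a hash table hash function could look like: truncate the digest to 64 bits and reduce it to a bucket index. This is only an illustration, not the code I tested or anything KeyDB or Redis ships. The function name sha1_bucket and the choice of the first 8 digest bytes are assumptions.

```python
import hashlib


def sha1_bucket(key: bytes, num_buckets: int) -> int:
    """Map a key to a bucket index using the first 8 bytes of its SHA-1 digest."""
    digest = hashlib.sha1(key).digest()        # full 20-byte digest
    h = int.from_bytes(digest[:8], "little")   # truncate to a 64-bit integer
    return h % num_buckets


if __name__ == "__main__":
    for key in (b"user:1", b"user:2", b"user:3"):
        print(key, "->", sha1_bucket(key, 16))
```

Every lookup pays for a full SHA-1 computation here, which is why a function designed for short keys tends to win even when SHA-1 itself is hardware accelerated.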
With working code in hand, I now had a solution in search of a problem. The outlet appeared close by with git and its controversial use of SHA-1. If these instructions wouldn’t be worthwhile for hash table use, perhaps they could still improve Git. After a few hours of debugging I finally had a working git running my Ryzen-optimized SHA-1 code. The results showed substantial real-world improvements for some commands - especially if you work with large files.
As you would expect, operations that rely heavily on computing hashes are sped up substantially. This is especially noticeable when checking in large files, where I saw a speedup of 1.5x. The next biggest improvement was “git fsck”, which was 1.4x faster when run on the Linux repo. Git checkout improved slightly, running about 1.06x faster within the Linux repo. I did not see any improvements with clone and merge.
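To see why checking in a large file is dominated by hashing, it helps to look at how Git names a file's contents: in a SHA-1 repository, the blob ID is the SHA-1 of a short header ("blob", the size in bytes, and a NUL byte) followed by the entire file contents. Below is a minimal sketch of that calculation. The function name git_blob_id is mine, and this is not the code path Git itself uses.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes "blob <size>\0" followed by the raw bytes of the file.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known ID in SHA-1 repositories.
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because every byte of the file passes through SHA-1, the cost grows linearly with file size, so a faster SHA-1 shows up most clearly on large files.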
Not tested were commits with lots of small files and non-trivial merges; however, I would expect to see improvements there as well, provided you are not bottlenecked by disk IO. My code is the result of a few hours of hacking, and the results should get even better with more time and polish.
For some users these improvements will be a real time saver, while others will see only marginal gains. But if a few hours of hacking can improve such massively deployed software, then it's clear most software is still optimized for Intel only. I'm hoping we will start to see more software add Zen-optimized code paths - especially as EPYC processors get wider adoption in the data center.
You can find my modified Git here: https://github.com/JohnSully/git. Note that these changes are only a prototype for demonstration purposes; I wouldn't recommend using this in production just yet.