Description
Top-of-stack (TOS) caching was introduced by #135379, but it only supports a fixed number of registers.
We should make it configurable, so that we can choose the optimum number of registers for any hardware/OS combination.
Probably the simplest way to do this is to generate the executor cases and stencils code for up to the overall maximum number of registers, and to guard each case/function with `#if MAX_CACHED_REGISTER >= N`, where N is the number of registers needed for that variant.
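A minimal sketch of the guarding step, as a generator might do it. It treats `MAX_CACHED_REGISTER` as a count of registers, so a variant that needs N registers is kept when `MAX_CACHED_REGISTER >= N`. The helper names `guard_case` and `emit_all` and the `_EXAMPLE_r..` case names are hypothetical, not part of the real cases generator.

```python
# Hypothetical sketch: wrap each generated executor case in a preprocessor
# guard so it is compiled only when enough cached registers are available.
# Assumes MAX_CACHED_REGISTER is a register count (not a zero-based index).


def guard_case(body: str, registers_needed: int) -> str:
    """Return `body` wrapped in an #if that keeps it only when
    MAX_CACHED_REGISTER >= registers_needed."""
    return (
        f"#if MAX_CACHED_REGISTER >= {registers_needed}\n"
        f"{body}\n"
        f"#endif\n"
    )


def emit_all(variants: dict[str, int], body_for) -> str:
    """Emit every variant up to the overall maximum, each behind its own guard.

    `variants` maps a case name to the number of registers it needs;
    `body_for` turns a case name into its C source text.
    """
    return "".join(guard_case(body_for(name), n) for name, n in variants.items())


if __name__ == "__main__":
    variants = {"_EXAMPLE_r01": 1, "_EXAMPLE_r12": 2, "_EXAMPLE_r23": 3}
    print(emit_all(variants, lambda name: f"        case {name}: {{ /* ... */ break; }}"))
```

Because every variant is always generated and only the preprocessor decides which ones survive, the same generated file can be compiled for any `MAX_CACHED_REGISTER` up to the maximum.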
Some cases/functions will also need to be guarded so that they are only generated when MAX_CACHED_REGISTER is too low for the preferred variant.
For example, the ideal number of output registers for `_BINARY_OP` variants is 3.
For machines where `MAX_CACHED_REGISTER < 3`, we will need to generate variants with fewer than 3 outputs, but we want to exclude those variants for machines with `MAX_CACHED_REGISTER >= 3`.
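A sketch of how the two kinds of guard could combine into one condition per variant, following the `_BINARY_OP` example: every variant gets a lower bound, and a variant with fewer outputs than ideal additionally gets an upper bound so it disappears once the ideal variant can be built. The function name and its parameters are hypothetical, and `MAX_CACHED_REGISTER` is again assumed to be a register count.

```python
# Hypothetical sketch: build the #if condition for one variant of an
# instruction whose preferred number of output registers is `ideal_outputs`.


def variant_condition(registers_needed: int, outputs: int, ideal_outputs: int) -> str:
    """Return the preprocessor condition guarding a single variant.

    Lower bound: the machine must have at least `registers_needed` registers.
    Upper bound: a variant with fewer outputs than ideal is only a fallback,
    so it is kept only while MAX_CACHED_REGISTER < ideal_outputs.
    """
    conds = [f"MAX_CACHED_REGISTER >= {registers_needed}"]
    if outputs < ideal_outputs:
        conds.append(f"MAX_CACHED_REGISTER < {ideal_outputs}")
    return " && ".join(conds)


# Ideal 3-output variant versus a 2-output fallback (ideal_outputs = 3).
print(variant_condition(3, 3, 3))
print(variant_condition(2, 2, 3))
```

With `MAX_CACHED_REGISTER` set to 3 or more, only the first condition holds; with it set to 2, only the fallback survives.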
Overall, this could result in generating a lot more code, probably more than double, but if MAX_CACHED_REGISTER is unchanged, the executable size should also be unchanged, since the extra variants are excluded by the preprocessor.
If we increase MAX_CACHED_REGISTER to 4 or 5, we would expect the stencils and supporting tables to grow by ~35% and ~80% respectively (the growth is a bit more than linear, as some instructions have N**2 variants).
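A toy model of why the growth is superlinear. It assumes the current value is 3 and that an instruction has either one variant per cache depth (0..max, linear) or one per (input depth, output depth) pair (quadratic). Both the base value and the variant-count formulas are illustrative assumptions, not measurements of the real stencils.

```python
# Illustrative model only: variant counts for a "linear" instruction
# (one variant per cache depth 0..max_reg) and a "quadratic" one
# (one variant per input-depth/output-depth pair).


def linear_variants(max_reg: int) -> int:
    return max_reg + 1


def quadratic_variants(max_reg: int) -> int:
    return (max_reg + 1) ** 2


def growth(count, new_max: int, base_max: int = 3) -> float:
    """Relative increase in variant count versus the assumed base of 3."""
    return count(new_max) / count(base_max) - 1


for m in (4, 5):
    print(m, f"linear +{growth(linear_variants, m):.0%}",
          f"quadratic +{growth(quadratic_variants, m):.0%}")
```

Under this model a purely linear instruction grows by 25% and 50%, and a purely quadratic one by about 56% and 125%; the ~35% and ~80% estimates above fall between these extremes, as expected for a mix of both kinds.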