
Restore 'computed goto' code generation for the VM loop on VS2022 #1751


Open
wants to merge 4 commits into
base: master
from

Conversation

Mooshua

@Mooshua Mooshua commented Mar 27, 2025

This patch adds an __assume statement for passed assertions (on release builds) and switches the interpreter loop on MSVC from a goto-switch loop to a while-switch loop. This seems to prevent the performance regression seen on VS2022, and the end result is two fewer basic blocks in the dispatch logic.
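For readers unfamiliar with the two loop shapes, here is a minimal sketch of a goto-switch loop next to a while-switch loop. It is not Luau's interpreter: the three opcodes, the accumulator, and both function names are hypothetical and exist only to show the control-flow difference.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy opcodes; these are not Luau's real instruction set.
enum Op : uint8_t { OP_INC, OP_DOUBLE, OP_HALT };

// Shape A, "goto-switch": every handler jumps back to a label placed before the switch.
int run_goto_switch(const std::vector<uint8_t>& code)
{
    assert(!code.empty());
    const uint8_t* pc = code.data();
    int acc = 0;

dispatch:
    switch (*pc++)
    {
    case OP_INC:
        acc += 1;
        goto dispatch;
    case OP_DOUBLE:
        acc *= 2;
        goto dispatch;
    case OP_HALT:
        return acc;
    default:
        return -1; // unknown opcode
    }
}

// Shape B, "while-switch": handlers break out of the switch and the loop re-enters it.
int run_while_switch(const std::vector<uint8_t>& code)
{
    assert(!code.empty());
    const uint8_t* pc = code.data();
    int acc = 0;

    while (true)
    {
        switch (*pc++)
        {
        case OP_INC:
            acc += 1;
            break;
        case OP_DOUBLE:
            acc *= 2;
            break;
        case OP_HALT:
            return acc;
        default:
            return -1; // unknown opcode
        }
    }
}
```

Both functions compute the same result for the same program; any difference is in how a given compiler lowers the dispatch, which is what this PR is about.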

__assume safety

This patch changes the behavior of assertions on release builds. Any assertion that would fail on a debug build now invokes undefined behavior on a release build. In my testing, this had a minimal impact on performance on MSVC (within standard error on most benchmarks), so the __assume changes can be reverted if there's a concern.
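A rough sketch of what such an assertion macro could look like is shown below. The macro name SKETCH_ASSERT and the helper read_checked are hypothetical, and Luau's actual LUAU_ASSERT is defined differently; the only real compiler feature used is MSVC's __assume intrinsic.

```cpp
#include <cassert>

// Hypothetical macro for illustration only; this is not Luau's LUAU_ASSERT.
#if !defined(NDEBUG)
// Debug builds: check the condition at run time.
#define SKETCH_ASSERT(expr) assert(expr)
#elif defined(_MSC_VER)
// MSVC release builds: tell the optimizer the condition always holds.
// If it is ever false at run time, the behavior is undefined.
#define SKETCH_ASSERT(expr) __assume(expr)
#else
// Other release builds: reference the expression without evaluating it.
#define SKETCH_ASSERT(expr) ((void)sizeof(expr))
#endif

// Reads data[index]; the caller promises index < size.
int read_checked(const int* data, unsigned size, unsigned index)
{
    SKETCH_ASSERT(index < size);
    return data[index];
}
```

The safety trade-off described above follows directly from the MSVC branch: a violated condition is no longer merely unchecked on release builds, it is something the optimizer is allowed to rely on.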

i9-9900

Compared against 0.666

'luau.exe' change is 7.502% positive on average

Table
Test                          Min      Average   StdDev%   Driver             Speedup   Significance   P(T<=t)
base64                   16.229ms     16.484ms    1.167%   luau-release.exe
base64                   15.835ms     16.092ms    1.179%   luau.exe            2.436%                  ---
chess                    74.762ms     75.700ms    0.697%   luau-release.exe
chess                    73.578ms     74.799ms    0.987%   luau.exe            1.204%                  ---
life                     68.898ms     69.872ms    0.924%   luau-release.exe
life                     60.905ms     61.606ms    0.633%   luau.exe           13.418%                  ---
matrixmult               17.764ms     18.069ms    0.985%   luau-release.exe
matrixmult               16.605ms     17.090ms    1.797%   luau.exe            5.727%                  ---
mesh-normal-scalar       30.737ms     31.789ms    2.564%   luau-release.exe
mesh-normal-scalar       27.606ms     28.611ms    1.888%   luau.exe           11.108%                  ---
mesh-normal-vector       12.351ms     13.109ms    4.866%   luau-release.exe
mesh-normal-vector       11.253ms     12.060ms    4.206%   luau.exe            8.705%                  ---
pcmmix                    6.894ms      7.063ms    1.720%   luau-release.exe
pcmmix                    6.357ms      6.549ms    2.263%   luau.exe            7.849%                  ---
qsort                    67.254ms     68.097ms    0.827%   luau-release.exe
qsort                    58.912ms     60.077ms    0.990%   luau.exe           13.350%                  ---
sha256                   19.253ms     19.994ms    3.049%   luau-release.exe
sha256                   18.337ms     18.951ms    1.725%   luau.exe            5.505%                  ---
ack                      46.928ms     48.590ms    1.971%   luau-release.exe
ack                      43.711ms     45.121ms    1.844%   luau.exe            7.686%                  ---
binary-trees             25.809ms     26.343ms    1.289%   luau-release.exe
binary-trees             24.374ms     24.901ms    1.678%   luau.exe            5.791%                  ---
fannkuchen-redux         10.357ms     10.669ms    2.005%   luau-release.exe
fannkuchen-redux         10.111ms     10.440ms    2.285%   luau.exe            2.195%                  ---
fixpoint-fact            44.300ms     45.792ms    1.779%   luau-release.exe
fixpoint-fact            42.434ms     43.508ms    1.787%   luau.exe            5.250%                  ---
heapsort                 18.824ms     19.281ms    1.596%   luau-release.exe
heapsort                 18.650ms     19.068ms    1.195%   luau.exe            1.117%                  ---
mandel                   55.244ms     56.723ms    1.561%   luau-release.exe
mandel                   51.297ms     52.298ms    1.260%   luau.exe            8.460%                  ---
n-body                   26.489ms     27.260ms    2.002%   luau-release.exe
n-body                   23.398ms     24.207ms    2.098%   luau.exe           12.613%                  ---
qt                       59.413ms     61.168ms    1.246%   luau-release.exe
qt                       53.196ms     54.604ms    1.778%   luau.exe           12.022%                  ---
queen                     1.493ms      1.538ms    3.007%   luau-release.exe
queen                     1.420ms      1.536ms    4.627%   luau.exe            0.136%                  ---
scimark                  64.789ms     66.498ms    1.092%   luau-release.exe
scimark                  59.858ms     61.812ms    1.618%   luau.exe            7.581%                  ---
spectral-norm            13.689ms     14.147ms    2.235%   luau-release.exe
spectral-norm            12.426ms     12.895ms    2.421%   luau.exe            9.713%                  ---
sieve                   138.480ms    141.784ms    1.249%   luau-release.exe
sieve                   137.706ms    139.073ms    0.801%   luau.exe            1.949%                  ---
3d-cube                   6.441ms      6.699ms    2.320%   luau-release.exe
3d-cube                   6.129ms      6.278ms    1.617%   luau.exe            6.703%                  ---
3d-morph                  7.363ms      7.558ms    1.079%   luau-release.exe
3d-morph                  7.121ms      7.281ms    1.454%   luau.exe            3.795%                  ---
3d-raytrace               8.631ms      8.816ms    1.543%   luau-release.exe
3d-raytrace               7.917ms      8.147ms    2.175%   luau.exe            8.213%                  ---
controlflow-recursive     4.247ms      4.373ms    2.190%   luau-release.exe
controlflow-recursive     3.807ms      3.917ms    2.287%   luau.exe           11.650%                  ---
crypto-aes               12.362ms     12.716ms    1.415%   luau-release.exe
crypto-aes               11.389ms     11.704ms    1.783%   luau.exe            8.651%                  ---
fannkuch                 20.391ms     20.896ms    1.561%   luau-release.exe
fannkuch                 18.694ms     19.171ms    1.667%   luau.exe            9.001%                  ---
math-cordic              12.944ms     13.269ms    1.806%   luau-release.exe
math-cordic              11.711ms     12.057ms    2.308%   luau.exe           10.054%                  ---
math-partial-sums         4.034ms      4.261ms    2.476%   luau-release.exe
math-partial-sums         3.762ms      3.912ms    2.917%   luau.exe            8.923%                  ---
n-body-oop               42.055ms     43.046ms    1.530%   luau-release.exe
n-body-oop               35.987ms     36.760ms    1.505%   luau.exe           17.098%                  ---
tictactoe               113.948ms    115.974ms    1.482%   luau-release.exe
tictactoe               112.188ms    113.992ms    0.985%   luau.exe            1.739%                  ---
trig                     20.189ms     20.918ms    2.265%   luau-release.exe
trig                     17.406ms     18.051ms    1.857%   luau.exe           15.881%                  ---
vector-math               8.648ms      9.053ms    2.182%   luau-release.exe
vector-math               8.608ms      8.946ms    3.139%   luau.exe            1.195%                  ---
voxelgen                 64.295ms     67.134ms    2.499%   luau-release.exe
voxelgen                 59.469ms     60.236ms    0.803%   luau.exe           11.452%                  ---
Total                  1145.506ms   1174.685ms       ---   luau-release.exe
Total                  1072.158ms   1095.750ms       ---   luau.exe            7.204%
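The Speedup column is not defined on this page. Judging from the numbers themselves, it appears consistent with the baseline average divided by the candidate average, minus one. The helper below is a hypothetical sketch of that inferred formula, not the benchmark tool's actual code.

```cpp
#include <cassert>
#include <cmath>

// Inferred from the table values, not taken from the benchmark tool:
// speedup% = (baseline average / candidate average - 1) * 100.
double speedup_percent(double baseline_avg_ms, double candidate_avg_ms)
{
    assert(candidate_avg_ms > 0.0);
    return (baseline_avg_ms / candidate_avg_ms - 1.0) * 100.0;
}
```

For example, the base64 row has averages of 16.484ms (luau-release.exe) and 16.092ms (luau.exe), which gives roughly 2.436%, and the Total row gives roughly 7.204%, both matching the reported values.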

i7-1065G7

Compared against 0.666

'luau.exe' change is 4.434% positive on average

Table
Test                   Min         Average     StdDev%     Driver             Speedup     Significance   P(T<=t)
base64                   25.684ms     30.928ms     12.712%   luau-release.exe                                  
base64                   24.485ms     33.039ms     18.392%   luau.exe             -6.389%                     ---
chess                    120.466ms    140.741ms      8.024%   luau-release.exe                                  
chess                    108.704ms    120.477ms      7.217%   luau.exe             16.820%                     ---
life                     89.472ms    117.850ms     14.041%   luau-release.exe                                  
life                     84.031ms     90.128ms      4.073%   luau.exe             30.758%                     ---
matrixmult               22.973ms     26.456ms      7.131%   luau-release.exe                                  
matrixmult               23.127ms     25.711ms      6.120%   luau.exe              2.895%                     ---
mesh-normal-scalar       36.484ms     40.026ms      5.326%   luau-release.exe                                  
mesh-normal-scalar       36.697ms     41.451ms     10.985%   luau.exe             -3.440%                     ---
mesh-normal-vector       16.058ms     23.553ms     25.401%   luau-release.exe                                  
mesh-normal-vector       15.354ms     21.958ms     18.840%   luau.exe              7.262%                     ---
pcmmix                    9.692ms     12.948ms     17.032%   luau-release.exe                                  
pcmmix                    9.466ms     13.874ms     18.680%   luau.exe             -6.675%                     ---
qsort                     79.261ms     88.393ms      4.392%   luau-release.exe                                  
qsort                     76.836ms     84.015ms      5.305%   luau.exe              5.211%                     ---
sha256                   20.884ms     24.698ms     13.176%   luau-release.exe                                  
sha256                   20.330ms     23.174ms      6.187%   luau.exe              6.578%                     ---
ack                       61.167ms     67.941ms      6.893%   luau-release.exe                                  
ack                       59.050ms     63.793ms      4.398%   luau.exe              6.502%                     ---
binary-trees             31.414ms     42.079ms     12.149%   luau-release.exe                                  
binary-trees             33.291ms     41.206ms      9.915%   luau.exe              2.118%                     ---
fannkuchen-redux         14.332ms     18.834ms     14.776%   luau-release.exe                                  
fannkuchen-redux         13.994ms     27.551ms     47.901%   luau.exe            -31.637%                     ---
fixpoint-fact             66.334ms     85.004ms     12.688%   luau-release.exe                                  
fixpoint-fact             72.167ms     84.306ms     10.457%   luau.exe              0.828%                     ---
heapsort                 28.357ms     36.478ms     13.918%   luau-release.exe                                  
heapsort                 28.996ms     33.908ms     13.620%   luau.exe              7.578%                     ---
mandel                   62.071ms     67.995ms      5.661%   luau-release.exe                                  
mandel                   60.453ms     68.662ms      8.696%   luau.exe             -0.972%                     ---
n-body                   32.107ms     39.622ms     11.348%   luau-release.exe                                  
n-body                   29.082ms     33.097ms      5.875%   luau.exe             19.715%                     ---
qt                       77.750ms     92.622ms      7.432%   luau-release.exe                                  
qt                       74.206ms     93.012ms     17.902%   luau.exe             -0.419%                     ---
queen                      1.993ms      2.350ms     10.298%   luau-release.exe                                  
queen                      1.923ms      2.354ms     12.097%   luau.exe             -0.172%                     ---
scimark                   93.834ms    118.810ms     11.267%   luau-release.exe                                  
scimark                   85.269ms     98.401ms      8.170%   luau.exe             20.740%                     ---
spectral-norm             17.198ms     19.851ms     10.521%   luau-release.exe                                  
spectral-norm             16.261ms     17.909ms      6.064%   luau.exe             10.845%                     ---
sieve                    175.493ms    186.669ms      4.772%   luau-release.exe                                  
sieve                    167.688ms    176.877ms      3.409%   luau.exe              5.536%                     ---
3d-cube                    6.985ms      7.748ms      6.114%   luau-release.exe                                  
3d-cube                    6.811ms      7.552ms      7.879%   luau.exe              2.590%                     ---
3d-morph                  8.314ms      9.279ms      6.101%   luau-release.exe                                  
3d-morph                  7.587ms      8.410ms      6.936%   luau.exe             10.334%                     ---
3d-raytrace                8.939ms      9.871ms      6.667%   luau-release.exe                                  
3d-raytrace                8.923ms      9.856ms      6.075%   luau.exe              0.150%                     ---
controlflow-recursive      4.626ms      4.962ms      8.387%   luau-release.exe                                  
controlflow-recursive      4.240ms      4.522ms      5.667%   luau.exe              9.721%                     ---
crypto-aes               13.668ms     15.097ms      6.590%   luau-release.exe                                  
crypto-aes               13.144ms     13.853ms      2.960%   luau.exe              8.984%                     ---
fannkuch                 23.470ms     25.504ms      5.944%   luau-release.exe                                  
fannkuch                 22.516ms     23.366ms      2.966%   luau.exe              9.148%                     ---
math-cordic               13.414ms     14.635ms      4.089%   luau-release.exe                                  
math-cordic               13.886ms     14.728ms      4.084%   luau.exe             -0.635%                     ---
math-partial-sums          4.140ms      4.836ms     14.165%   luau-release.exe                                  
math-partial-sums          3.933ms      4.221ms      6.548%   luau.exe             14.569%                     ---
n-body-oop               43.327ms     46.371ms      4.145%   luau-release.exe                                  
n-body-oop               41.533ms     44.392ms      4.016%   luau.exe              4.459%                     ---
tictactoe                132.132ms    138.926ms      2.634%   luau-release.exe                                  
tictactoe                129.614ms    137.060ms      2.317%   luau.exe              1.361%                     ---
trig                     20.387ms     21.455ms      2.833%   luau-release.exe                                  
trig                     18.756ms     20.125ms      7.099%   luau.exe              6.606%                     ---
vector-math                8.702ms      9.506ms      6.781%   luau-release.exe                                  
vector-math                8.472ms      9.504ms      7.211%   luau.exe              0.024%                     ---
voxelgen                 70.312ms     76.452ms      5.600%   luau-release.exe                                  
voxelgen                 67.581ms     71.200ms      3.688%   luau.exe              7.376%                     ---
Total                   1441.444ms   1668.487ms         ---   luau-release.exe                                  
Total                   1388.405ms   1563.693ms         ---   luau.exe              6.702%                      

Mooshua added 4 commits March 26, 2025 17:56
This patch adds an __assume statement for passed assertions and switches the interpreter loop to using a while-switch instead of a goto-switch loop. This seems to prevent the performance regression experienced on MSVC2022.
@vegorov-rbx
Collaborator

This wouldn't be the first time the computed goto auto-generation in VS2022 got broken.
The last time is what led to that d2ssa-pre- option.
Thank you for bringing this regression to our attention; it happens occasionally and will surely happen again.

In these cases it's best to make a small update that gets the VS2022 heuristics to trigger again.

Experimenting a little bit, it really seems that no changes other than making LUAU_ASSERT(unsigned(pc - cl->l.p->code) < unsigned(cl->l.p->sizecode)); into an __assume are needed to make VS2022 generate computed goto again.
So there's no need to introduce a global __assume for all assertions or to make any changes to the structure of the interpreter loop.
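As a rough illustration of that narrower approach, the sketch below applies an assume hint to a single program-counter bounds check and leaves every other assertion alone. The SketchProto struct, the SKETCH_VM_ASSUME macro, and the fetch function are hypothetical stand-ins and not Luau's real types; keeping a debug-only assert next to the hint is also a choice made for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro: an optimizer hint on MSVC, a no-op elsewhere.
#if defined(_MSC_VER)
#define SKETCH_VM_ASSUME(expr) __assume(expr)
#else
#define SKETCH_VM_ASSUME(expr) ((void)sizeof(expr))
#endif

// Hypothetical stand-in for a function prototype holding bytecode.
struct SketchProto
{
    const uint32_t* code;
    int sizecode;
};

// Fetches the instruction at pc; the caller promises pc lies inside p.code.
uint32_t fetch(const SketchProto& p, const uint32_t* pc)
{
    // Debug builds still verify the bound at run time.
    assert(unsigned(pc - p.code) < unsigned(p.sizecode));
    // Only this one check becomes an optimizer hint.
    SKETCH_VM_ASSUME(unsigned(pc - p.code) < unsigned(p.sizecode));
    return *pc;
}
```

Compared with a global change to the assertion macro, this confines the undefined-behavior risk to one condition that the VM already guarantees.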

@vegorov-rbx vegorov-rbx changed the title Significantly improve MSVC interpreter code generation Restore 'computed goto' code generation for the VM loop on VS2022 Mar 31, 2025
@vegorov-rbx
Collaborator

Updated the title to reflect that this only restores an optimization that got lost with updates.
I will likely find some time to track down the point in history when this regressed.

@Mooshua
Author

Mooshua commented Mar 31, 2025

I'm sorry to say I just now figured out I was comparing singlestep luau_executes instead of the full versions. Oops :(

> Experimenting a little bit, it really seems that no changes other than making LUAU_ASSERT(unsigned(pc - cl->l.p->code) < unsigned(cl->l.p->sizecode)); into an __assume are needed to make VS2022 generate computed goto again.

None of the __assumes I tested had any noticeable impact on code generation, and that remains true looking at the new loop. MSVC's assumes are a lot less 'strict' overall; compared with e.g. __builtin_unreachable, you see a lot less control-flow restructuring. My understanding was that the change from a simple block to a while loop is what caused the improvements.

@vegorov-rbx
Collaborator

vegorov-rbx commented Mar 31, 2025

Interesting!

Maybe my testing setup was a bit different, but I did see changes in dispatch-related code generation in the non-step version when I applied only your __assume change, without the loop change.
I will have to experiment a bit more. Improving single-step can still be useful. It is a bit unfortunate how easily MSVC can change its mind, given that it doesn't support the goto-address (address-of-label) language extension.

edit: well, even clang was recently caught breaking computed goto dispatch optimization in CPython: llvm/llvm-project#106846 (comment).
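For context, the goto-address extension mentioned above is the GCC/Clang "labels as values" feature (`&&label` and `goto *ptr`). A minimal sketch of explicit computed-goto dispatch with it follows. The opcodes and function name are hypothetical, the code assumes every opcode in the program is valid, and compilers without the extension fall back to a plain switch loop.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy opcodes; these are not Luau's real instruction set.
enum SketchOp : uint8_t { SK_INC, SK_DOUBLE, SK_HALT };

int run_computed_goto(const std::vector<uint8_t>& code)
{
    assert(!code.empty());
    const uint8_t* pc = code.data();
    int acc = 0;

#if defined(__GNUC__) || defined(__clang__)
    // Labels-as-values: one jump target per opcode, indexed by opcode value.
    // No bounds check here, so every opcode must be a valid SketchOp.
    static void* const targets[] = {&&do_inc, &&do_double, &&do_halt};
#define SKETCH_DISPATCH() goto *targets[*pc++]

    SKETCH_DISPATCH();
do_inc:
    acc += 1;
    SKETCH_DISPATCH(); // each handler ends with its own indirect jump
do_double:
    acc *= 2;
    SKETCH_DISPATCH();
do_halt:
    return acc;
#undef SKETCH_DISPATCH
#else
    // Without the extension, the compiler decides how to lower the dispatch.
    for (;;)
    {
        switch (*pc++)
        {
        case SK_INC:
            acc += 1;
            break;
        case SK_DOUBLE:
            acc *= 2;
            break;
        case SK_HALT:
            return acc;
        default:
            return -1; // unknown opcode
        }
    }
#endif
}
```

With the extension, the per-handler indirect jump is written out by hand; without it, getting the same shape depends on the optimizer's heuristics, which is the fragility discussed in this thread.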

2 participants