Problem:
Certain conditions, such as a large number of SeAT notification jobs firing Discord webhooks in a short period of time, can trigger Discord's per-route rate limiting, causing POSTs to fail with 429 Too Many Requests, which can in turn lead to a MaxAttemptsExceededException in SeAT.
This can leave large numbers of notification jobs in a failed state, and their notifications are never sent, which can be risky depending on the severity of the notification (e.g. structure reinforcement, members leaving corp, etc.). The current rate limit in SeAT appears to be set to 45/minute to stay under the documented global rate limit of 50/minute, but Discord also applies an additional per-route rate limit whose value is mentioned but not specifically documented. This likely means that if enough notifications configured to use the same webhook fire at or near the same time, the resulting jobs can fail. When a job hits the limit enough times, no message is sent to the Discord channel, the job retries a few times until it fails permanently, and the notification is effectively lost.
Discord Rate Limiting Documentation:
https://discord.com/developers/docs/topics/rate-limits
Expected: Rate limits are handled dynamically via the response headers, as described in the Discord documentation, to avoid rate-limit conditions, or the existing static limits are adjusted further to avoid them.
In the interim, adjusting the rate-limiting settings to enforce a low per-second maximum (online discussion suggests somewhere around 3-5 per second), in addition to the existing 45/minute, may be enough to mitigate the issue.
Any limits hardcoded in SeAT would not apply to other software running on the same system, since global rate limiting is applied per IP. While this is not the case for my instance (no other software on this machine uses Discord webhooks), it may help determine whether rate-limit avoidance via headers is preferable to simply adjusting the static limits.
Code sections that appear to be configured to handle Discord rate limiting:
https://github.com/eveseat/notifications/blob/5.0.x/src/Notifications/AbstractDiscordNotification.php#L32
https://github.com/eveseat/notifications/blob/5.0.x/src/NotificationsServiceProvider.php#L226
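For reference, a rate-limited response from a webhook route looks roughly like the following per the Discord documentation linked above (values here are illustrative, not from my logs); the "global": false field in the body is what distinguishes a per-route/shared limit from the global one:

```
HTTP/1.1 429 Too Many Requests
Retry-After: 3
X-RateLimit-Limit: 5
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1722400000.424
X-RateLimit-Reset-After: 2.516
X-RateLimit-Scope: shared
Content-Type: application/json

{"message": "You are being rate limited.", "retry_after": 2.516, "global": false}
```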
Logs / Screenshots / Proof:
Note: This webhook is only being used for SeAT - it is not re-used for any other process.
The first image shows the API response after hitting the rate limit and indicates that it is not the global limit - hence why I believe this is related to a per-route limit:
The second image shows the behavior once the job fails enough times, leaving it permanently failed (no notification is sent):
The third image shows a number of failures around the same timeframe; two are pictured above. (Note: items with 3 attempts are 429 Too Many Requests failures, items with 4 attempts are MaxAttemptsExceededException.)
Version Info: SeAT v5, all modules up-to-date as of 2024-07-31.
One option to solve this would be storing rate limit info in the cache scoped to the specific integration and using that to release jobs back onto the queue if they would go over the rate limit. I'm thinking something along the lines of:
After each webhook response from Discord, store rate limit info from the response headers in the cache
In the Discord notification job, check whether sending the notification would violate the rate limit before actually sending it. If so, release the job back onto the queue, passing the Unix timestamp of when the rate limit resets to the release function - a rough sketch of this is below.
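A minimal sketch of that idea, assuming a Laravel queued job - the class name, cache key format, and payload shape are hypothetical, not existing SeAT code; the headers are the X-RateLimit-* family Discord documents:

```php
<?php

use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\Http;

class SendDiscordNotification implements ShouldQueue
{
    use InteractsWithQueue;

    public function __construct(
        private int $integrationId,
        private string $webhookUrl,
        private array $payload,
    ) {}

    public function handle(): void
    {
        // Rate-limit state is cached per integration so different webhooks
        // do not block each other.
        $key = "discord:ratelimit:{$this->integrationId}";
        $limit = Cache::get($key);

        // If the last response said the bucket is exhausted and the reset
        // time has not passed yet, put the job back on the queue until then.
        if ($limit !== null && $limit['remaining'] <= 0 && time() < $limit['reset']) {
            $this->release($limit['reset'] - time() + 1);

            return;
        }

        $response = Http::post($this->webhookUrl, $this->payload);

        // Store the rate-limit headers from this response for the next job.
        if ($response->header('X-RateLimit-Remaining') !== '') {
            Cache::put($key, [
                'remaining' => (int) $response->header('X-RateLimit-Remaining'),
                'reset'     => (int) ceil((float) $response->header('X-RateLimit-Reset')),
            ], 300);
        }

        // If we raced past the limit anyway, back off for Retry-After seconds.
        if ($response->status() === 429) {
            $this->release((int) ($response->header('Retry-After') ?: 5));
        }
    }
}
```

Whether this lives in the job itself or in the existing AbstractDiscordNotification driver is an implementation detail; the main point is that the cached state lets concurrent jobs for the same integration coordinate instead of all firing at once.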
We do make an attempt to rate limit the requests to the Discord API, as seen here. I think this is based on the wrong rate limit, as we do not send a bot token with the request.
From reading online and testing against the Discord API, the rate-limit headers returned by the API will not be accurate for us: we can still get a 429 while the headers report remaining capacity.
There appear to be multiple competing rate limits in play here:
50 rps / IP (what appears in the headers)
y rps / webhook?
30 rpm / channel (shared across all webhooks in that channel)
So we should probably have a better overall rate limit, as well as monitor for 429s and then shut off requests for a backoff period - roughly like the sketch below.
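As a rough illustration of that combination (the sendDiscordWebhook() function and the 25/minute figure are placeholders, not existing SeAT code; the 30 requests/minute/channel limit is the one listed above):

```php
<?php

use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Facades\RateLimiter;

function sendDiscordWebhook(string $webhookUrl, array $payload): bool
{
    $bucket = 'discord-webhook:' . md5($webhookUrl);

    // During a backoff period (after a 429), refuse to send at all; the
    // caller should release the job back onto the queue and retry later.
    if (Cache::has("{$bucket}:backoff")) {
        return false;
    }

    // Overall limit per webhook, kept below the 30/minute channel limit.
    if (RateLimiter::tooManyAttempts($bucket, 25)) {
        return false; // retry in RateLimiter::availableIn($bucket) seconds
    }

    RateLimiter::hit($bucket, 60);

    $response = Http::post($webhookUrl, $payload);

    if ($response->status() === 429) {
        // Shut off requests for this webhook for the advertised backoff period.
        $retryAfter = (int) ($response->header('Retry-After') ?: 30);
        Cache::put("{$bucket}:backoff", true, $retryAfter);

        return false;
    }

    return $response->successful();
}
```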