-
Notifications
You must be signed in to change notification settings - Fork 940
lxd-agent: Added RSS metrics + Simplified calculation for better scalability #15094
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
d22d255
to
dd0e072
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only took a quick/cursory look at the code (not the tests) but it looks good to me. Thanks for including tests BTW, I'm looking forward to reviewing them!
Would you mind slicing you changes into multiple commits (one per file changed should do) like you had in the earlier PR you closed. BTW, you can force push to an existing PR to bring it into shape, no need to create fresh ones.
Please can you sign your commits and the CLA. |
I went to the website and signed it. Thank you. |
95d1ec9
to
308b8ae
Compare
Sorry for all the Git confusion. I've been wrestling with my Git setup lately and clearly still need to work out some kinks. Thank you for your patience with the commit signing and DCO issues. |
50e0285
to
36dbcf0
Compare
Please can you sign your commits. |
ec33b0f
to
b25ee40
Compare
I am trying to sign off my commits using git push --force-with-lease origin lxd-agent-metrics-add-rss-clean but it still does not seem to pass the DCO. Should I squash my commits and try again? |
The DCO check if failing due to a mismatch in your declared name ("Martin" vs "MrMartyK"):
|
b25ee40
to
4b98e96
Compare
That fixed it, thank you so much. I appreciate it. Should I squash all this? |
Looks like this was forgotten about or undone in some rebase. I know, git isn't intuitive :/ |
defa921
to
3ef7782
Compare
Yea I apologize for that. At my job its a whole proprietary way of doing things + I am running on a new system here so git is being annoying. I am more of a GUI git user (a sin I know) I am trying to use git through terminal for the first time in ages and it has been a pain. I am getting setup wth gitkraken right now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit slicing is still not 100% there. Your last commit also has duplicated Signed-off-by
.
If that makes it easier for you, feel free to drop the Co-authored-by
tag. It's nice of you but that's the script is your own work ;)
I don't mind putting the Co-authored-by tag. If it wasn't for you catching the grep and awk filtering I wouldn't have seen it. I learned a new thing today which is great. |
3ef7782
to
62cd157
Compare
I don't know how it works with GUI but on the CLI, you can You will be left in your repo after the commit to |
62cd157
to
de7d956
Compare
Yea I just went through cherry picking and rebasing. This should be better now. Three separate commits for the initial implementation: Then Four refinement commits in sequence: |
Indeed, much better slicing now.
By convention, we require fixup commits like that be "folded" into their respective initial commit. So this PR should just have 3 commits, one per file. |
I could do that as well. I could put fixup/improvement commits into their initial commits. |
8f55098
to
d816542
Compare
@MrMartyK could you please rebase ( The |
d816542
to
059ffac
Compare
Sorry for the late reply. Busy work day, a lot of code reviews. I've rebased the branch onto the latest main and force-pushed the changes. This should fix the golangci-lint errors from the CI checks. |
I ran all the necessary formatting checks to ensure everything is clean and consistent. No issues detected:
Everything looks good, pushing it now! |
059ffac
to
c3428e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Co-authored-by
flag if you keep it should be put right before your Signed-off-by
line otherwise GitHub doesn't recognize it as such.
As stated previously, I'm fine with your dropping it ;)
No worries and thanks for your persistence, that's really appreciated! |
c3428e9
to
a126598
Compare
I just fixed that, thank you for pointing that out. I made sure you show up. |
lxd-agent/metrics.go
Outdated
@@ -358,10 +367,16 @@ func getMemoryMetrics() (metrics.MemoryMetrics, error) { | |||
case "MemAvailable": | |||
out.MemAvailableBytes = value | |||
case "MemFree": | |||
memFreeBytes = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need memFreeBytes
variable if we already have out.MemFreeBytes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They let me track which required fields are present before attempting the RSS calculation. These variables also keep the RSS calculation components separate from general metrics collection, making the code more maintainable. Using dedicated variables makes the RSS calculation formula directly match the standard memory equation, improving readability so when I or someone else revisit my code later, this structured approach helps me quickly understand things since everything has a clear purpose. This method ensures reliable calculations even when some /proc/meminfo fields might be missing in certain environments. I hope this makes sense.
|
||
# Test passes if difference is less than 5% | ||
# Use awk to compare float values since bash can only handle integers | ||
if (( $(awk "BEGIN {print ($ABS_DIFF_PCT < 5.0) ? 1 : 0}") )); then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.. I'm pretty sure that I misunderstand something but...
So, you expect that RSS is approximately equal to the value from a used
column in free
output, right?
Now, let's look into the source code of free
:
https://gitlab.com/procps-ng/procps/-/blob/master/library/meminfo.c#L736
mem_used = mHr(MemTotal) - mHr(MemAvailable);
Both MemTotal
and MemAvailable
are present in /proc/meminfo
file. Why we can't calculate RSS the same way as MemTotal - MemAvailable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, I deliberately chose MemTotal - (MemFree + Buffers + Cached - Shmem) for specific technical reasons. MemAvailable only exists in Linux 3.14 (2014) and newer, while my formula works across all kernel versions that LXD supports. The explicit formula clearly shows which memory components are being considered, making the code self-explanatory. This approach matches traditional memory accounting methods used before MemAvailable existed. My test script confirms both approaches give nearly identical results, proving it is accurate just not as precise or new as MemAvailable. It also helps a lot when dealing debugging memory issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The minimum kernel version we support is 5.15 and the oldest Ubuntu kernel still supported (by Canonical, not LXD) is 3.13 which is just too ancient to ever see a modern LXD.
While I appreciate the effort for supporting older kernels, I think we need to be concerned about RHEL/CentOS 7 ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize for overcomplicating things. You were absolutely right about using MemTotal - MemAvailable
- it's much simpler and accurate. I actually already implemented this simpler approach in my latest commit (both in the code and validation script). I was trying to be too clever instead of researching standard practices more thoroughly. I appreciate your clarity on this!
Hi @MrMartyK , thanks for working on this! |
Hi @mihalicyn Thank you for reviewing. Sorry for the late replies, I was on vacation the past week. Those are all great questions! |
|
||
# Test passes if difference is less than 5% | ||
# Use awk to compare float values since bash can only handle integers | ||
if (( $(awk "BEGIN {print ($ABS_DIFF_PCT < 5.0) ? 1 : 0}") )); then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The minimum kernel version we support is 5.15 and the oldest Ubuntu kernel still supported (by Canonical, not LXD) is 3.13 which is just too ancient to ever see a modern LXD.
While I appreciate the effort for supporting older kernels, I think we need to be concerned about RHEL/CentOS 7 ;)
lxd-agent/metrics.go
Outdated
if foundMemTotal && foundMemFree && foundBuffers && foundCached && foundShmem { | ||
// Formula: RSS = MemTotal - (MemFree + Buffers + Cached - Shmem) | ||
// This matches how tools like 'free' calculate used memory | ||
rssBytes := memTotalBytes - (memFreeBytes + buffersBytes + cachedBytes - shmemBytes) | ||
out.RSSBytes = rssBytes | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we ignore the backward compat for kernels <3.14, the simpler version proposed by Aleks would be a nice simplification.
Might also be good to do away with the variable duplication (memFreeBytes
vs out.MemFreeBytes
) as pointed out by Aleks. I don't know how smart is the Go compiler to detect the duplication and optimize it away so it's best to be on the safe side as that shouldn't really harm the readability at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the feedback on variable duplication and the simpler formula. I've already made these changes in my latest commit removing the duplicate variables and implementing the MemTotal - MemAvailable
formula. I apologize for not taking this approach initially, I was overthinking the problem instead of following established patterns. I should have done more research on my end. Your feedback helped me understand the right way to approach this. I appreciate it.
a126598
to
3cc68bf
Compare
This implements proper RSS memory metric calculation in lxd-agent using a simple and modern approach. The solution provides O(1) complexity with accuracy matching standard system tools. - Added efficient RSS calculation using the formula: RSS = MemTotal - MemAvailable - Added validation for required fields in /proc/meminfo Signed-off-by: MrMartyK <martykareem@outlook.com>
Add comprehensive unit tests for the simplified RSS memory calculation, including: - Test for correct formula implementation using MemTotal - MemAvailable - Test for handling missing MemAvailable field in /proc/meminfo - Validation of calculation against expected values Signed-off-by: MrMartyK <martykareem@outlook.com>
Add a validation script for RSS calculation that: - Compares RSS calculation with free command output - Uses the simplified formula: MemTotal - MemAvailable - Provides detailed calculation breakdown for debugging - Validates results are within 5% of standard tools Co-authored-by: Simon Deziel <simon.deziel@canonical.com> Signed-off-by: MrMartyK <martykareem@outlook.com>
3cc68bf
to
0ea03d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks much cleaner now, thanks! I left a tiny comment.
// Calculate RSS using the simpler and more modern approach | ||
if foundMemTotal && foundMemAvailable { | ||
// Formula: RSS = MemTotal - MemAvailable | ||
// This is how modern tools like 'free' calculate used memory | ||
out.RSSBytes = out.MemTotalBytes - out.MemAvailableBytes | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can leverage the default value of 0 and use it to detect if RSSByte
needs to be set to non-0:
// Calculate RSS using the simpler and more modern approach | |
if foundMemTotal && foundMemAvailable { | |
// Formula: RSS = MemTotal - MemAvailable | |
// This is how modern tools like 'free' calculate used memory | |
out.RSSBytes = out.MemTotalBytes - out.MemAvailableBytes | |
} | |
if out.MemTotalBytes > out.MemAvailableBytes { | |
// Modern tools like 'free' define RSS as being equal to: MemTotal - MemAvailable | |
out.RSSBytes = out.MemTotalBytes - out.MemAvailableBytes | |
} |
This way, we wouldn't need the 2 booleans.
Previous PR Notice: I initially submitted this fix in #15093 but discovered after submission that my Git commit signatures weren't properly configured, causing CLA validation failures. This new PR contains:
Apologies for the earlier misconfiguration - this version passes all contribution checks.
Summary
This PR resolves the "FIXME: Missing RSS" comment in
metrics.go
by implementing proper RSS memory metric calculation in lxd-agent using kernel memory accounting. The solution provides O(1) complexity with accuracy matching standard system tools.Context
The original FIXME existed because:
Key realization:
/proc/meminfo
contains all required fields for system-wide RSS calculation without needing process enumeration.Changes
/proc/meminfo
fieldsPerformance
Testing
free -m
within 1%Benefits
Impact
This is my first contribution to the LXD project, implementing enhanced RSS metrics. As this is my initial PR, I am very open to feedback and committed to making any necessary adjustments to align with project guidelines.
Please do not hesitate to let me know if you require any further information before considering this for merge. To keep this initial PR focused, I have not included all the detailed testing scripts and container configurations I used, but I have these readily available and can provide them if they would be helpful for review. I have provided the metrics_test.go and the test_rss_calculation.sh, let me know if you want more. Thank you again.