Performance Improvements #94
base: main
Performance numbers with an inode cache:
Performance numbers with:
Performance numbers with:
With block cache
Performance numbers with:
Summary so far
Conclusions:
@asomers What do you think about this? Do you have any suggestions for improving the performance?
I'll share what I found when optimizing the performance of https://github.com/KhaledEmaraDev/xfuse. The biggest problem I found was that even when the kernel cache was enabled, read amplification was still high. That was because the kernel only cached the contents of files, not metadata structures. For example, files' indirect blocks were not cached. So when reading a very large file, the daemon would be forced to reread some of those indirect blocks over and over. Fixing that problem required the daemon to cache that metadata itself. The most logical way I found to do it was to attach said metadata in memory to the inode and cache it until FUSE_FORGET dismissed the inode. That does have the potential disadvantage of high memory consumption if the kernel rarely or never forgets a vnode. But in practice I did not find it to be a problem.
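The idea of attaching metadata to the in-memory inode and keeping it until FUSE_FORGET can be sketched roughly as follows. This is a minimal illustration using only the standard library, not code from xfuse or from this PR: the names `InodeCache`, `CachedInode`, `indirect_block`, and the `read_disk` callback are all hypothetical, and real code would hook `lookup` and `forget` into whichever FUSE binding the daemon uses.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory inode holding the indirect blocks read so far.
#[derive(Default)]
struct CachedInode {
    /// Number of references the kernel holds (one per lookup reply).
    nlookup: u64,
    /// Indirect blocks keyed by on-disk block number.
    indirect_blocks: HashMap<u64, Vec<u8>>,
}

/// Hypothetical daemon-side cache, keyed by inode number.
#[derive(Default)]
struct InodeCache {
    inodes: HashMap<u64, CachedInode>,
    /// Counts reads that actually went to disk (for demonstration only).
    disk_reads: u64,
}

impl InodeCache {
    /// Call when replying to a lookup: the kernel gains one reference.
    fn lookup(&mut self, ino: u64) {
        self.inodes.entry(ino).or_default().nlookup += 1;
    }

    /// Return an indirect block of a known inode, calling `read_disk`
    /// only on a cache miss. Returns `None` if the inode is not cached.
    fn indirect_block<F>(&mut self, ino: u64, blkno: u64, read_disk: F) -> Option<&[u8]>
    where
        F: FnOnce(u64) -> Vec<u8>,
    {
        let inode = self.inodes.get_mut(&ino)?;
        let disk_reads = &mut self.disk_reads;
        let block = inode.indirect_blocks.entry(blkno).or_insert_with(|| {
            *disk_reads += 1;
            read_disk(blkno)
        });
        Some(block.as_slice())
    }

    /// Call on FUSE_FORGET: drop `nlookup` references and free the inode,
    /// together with its cached metadata, once none remain.
    fn forget(&mut self, ino: u64, nlookup: u64) {
        let remove = match self.inodes.get_mut(&ino) {
            Some(inode) => {
                inode.nlookup = inode.nlookup.saturating_sub(nlookup);
                inode.nlookup == 0
            }
            None => false,
        };
        if remove {
            self.inodes.remove(&ino);
        }
    }
}
```

With this shape, repeated reads of the same indirect block hit memory after the first miss, and memory is released only when the kernel forgets the inode, which is exactly the trade-off described above.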
Thanks for your insight. My approach was to implement a block cache for [...]. As for the indirect block cache, I could probably integrate it into the inode cache somehow. Right now I'm using a time-based benchmark for simplicity and flamegraphs for checking what's taking up most of the time. By using flamegraphs I was able to optimize [...]. The problem of the kernel never forgetting can be solved via [...]
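For illustration, a bounded block cache of the kind mentioned above might look like the sketch below. The comment does not say how the PR's cache is keyed or evicted, so everything here is an assumption: the `BlockCache` type, keying by block number, and least-recently-used eviction are all hypothetical choices, and the linear-time `touch` is kept for brevity rather than speed.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical fixed-capacity block cache with LRU eviction.
struct BlockCache {
    capacity: usize,
    blocks: HashMap<u64, Vec<u8>>,
    /// Block numbers ordered from least to most recently used.
    order: VecDeque<u64>,
}

impl BlockCache {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            capacity,
            blocks: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Move `blkno` to the most-recently-used end of the queue.
    fn touch(&mut self, blkno: u64) {
        self.order.retain(|&b| b != blkno);
        self.order.push_back(blkno);
    }

    /// Look up a block, marking it as recently used on a hit.
    fn get(&mut self, blkno: u64) -> Option<&[u8]> {
        if !self.blocks.contains_key(&blkno) {
            return None;
        }
        self.touch(blkno);
        self.blocks.get(&blkno).map(|b| b.as_slice())
    }

    /// Insert a block, evicting the least recently used one when full.
    fn insert(&mut self, blkno: u64, data: Vec<u8>) {
        if !self.blocks.contains_key(&blkno) && self.blocks.len() == self.capacity {
            if let Some(victim) = self.order.pop_front() {
                self.blocks.remove(&victim);
            }
        }
        self.blocks.insert(blkno, data);
        self.touch(blkno);
    }
}
```

Unlike metadata pinned to an inode until FUSE_FORGET, a cache like this bounds memory use by itself, at the cost of occasionally rereading an evicted block.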
Beware: a flamegraph will only tell you what's taking the most CPU cycles, not the most time. I/O-bound operations won't show up in a flamegraph.
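One way to see time that an on-CPU flamegraph misses is to measure wall-clock time around the suspect operation. The sketch below is only an illustration: `time_wall_clock` and `simulated_io` are hypothetical helpers, and the sleep stands in for a blocking disk read, during which the thread is off-CPU and so would contribute few or no samples to a CPU profile.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Run `op` and return its result together with the elapsed wall-clock time.
fn time_wall_clock<T>(op: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = op();
    (result, start.elapsed())
}

/// Stand-in for a blocking read: waits without burning CPU cycles,
/// then reports a made-up number of bytes read.
fn simulated_io() -> usize {
    thread::sleep(Duration::from_millis(20));
    4096
}
```

Calling `time_wall_clock(simulated_io)` reports at least 20 ms of elapsed time even though almost no CPU work was done, which is the gap between "most CPU cycles" and "most time".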
That's interesting, because right now
TODO:
inode_resolve_block()
Original performance numbers (on a Ryzen 9 7900):