
Wire O_DIRECT also to Uncached I/O #17218

Open
amotin wants to merge 1 commit into master from direct

Conversation

@amotin amotin (Member) commented Apr 4, 2025

Before Direct I/O was implemented, I had implemented a lighter version I called Uncached I/O. It uses the normal DMU/ARC data path with some optimizations, but evicts data from the caches as soon as possible and reasonable. Originally I wired it only to the primarycache property; this PR completes the integration all the way up to the VFS.

While Direct I/O has the lowest possible memory bandwidth usage, it also has a significant number of limitations: it requires I/Os to be page aligned, does not allow speculative prefetch, etc. Uncached I/O does not have those limitations, but instead requires an additional memory copy, though still one less than regular cached I/O. As such it should fill the gap between the two. Considering this, I've disabled the annoying EINVAL errors on misaligned requests, adding a tunable for those who want to test their applications.
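
For context, the classic alignment contract looks roughly like this from an application's point of view (a minimal sketch, not part of this PR; the file path and the use of one page as the I/O size are made up for the example):

#define _GNU_SOURCE	/* O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	void *buf;
	int fd;

	/* Direct I/O classically wants a page-aligned buffer ... */
	if (posix_memalign(&buf, pagesize, pagesize) != 0)
		return (1);

	fd = open("/tank/testfile", O_RDONLY | O_DIRECT);
	if (fd == -1) {
		perror("open");
		return (1);
	}

	/* ... and a page-aligned offset and length, or EINVAL was possible. */
	if (pread(fd, buf, pagesize, 0) == -1)
		perror("pread");

	close(fd);
	free(buf);
	return (0);
}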

To pass the information between the layers I had to change a number of APIs. But as a side effect, the upper layers can now control not only the caching, but also speculative prefetch. I haven't wired that to the VFS yet, since it requires looking at some OS specifics. But while there, I've implemented speculative prefetch of indirect blocks for Direct I/O, controllable via all the same mechanisms.

Fixes #17027

How Has This Been Tested?

Basic read/write tests with and without O_DIRECT, observing proper cache behavior.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@amotin amotin requested a review from bwatkinson April 4, 2025 19:41
@amotin amotin (Member Author) commented Apr 4, 2025

The change is quite invasive to the DMU/DBUF APIs. In some places it was not obvious what is better or where to stop, so I am open to comments on whether we'd like to change more while we're already here, or less, to keep some parts compatible, if that is even possible at all with such a big change.

@amotin amotin added the "Status: Code Review Needed" (Ready for review and testing) label Apr 4, 2025
@tonyhutter tonyhutter (Contributor) commented

Would it make sense to activate this when the user does an fcntl(fd, F_SET_RW_HINT, RWH_WRITE_LIFE_SHORT)? I believe we could key off the file's inode->i_write_hint value on the kernel side.

RWF_UNCACHED also sounds similar to this concept, but I don't think it's merged into the kernel yet.
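
For reference, setting that hint on Linux looks roughly like this (a sketch only; it assumes a kernel/libc combination where F_SET_RW_HINT and RWH_WRITE_LIFE_SHORT are available via <linux/fcntl.h> and can be included alongside <fcntl.h>):

#include <fcntl.h>
#include <stdint.h>
#include <linux/fcntl.h>	/* F_SET_RW_HINT, RWH_WRITE_LIFE_SHORT */

static int
set_short_write_lifetime(int fd)
{
	uint64_t hint = RWH_WRITE_LIFE_SHORT;

	/* The kernel keeps the hint in inode->i_write_hint. */
	return (fcntl(fd, F_SET_RW_HINT, &hint));
}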

@amotin amotin (Member Author) commented Apr 4, 2025

@tonyhutter I am not familiar with that fcntl, but it sounds possible as a next step. I was thinking about posix_fadvise() or something like it.

@tonyhutter tonyhutter (Contributor) commented

Ah yes, posix_fadvise() makes sense too.
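
For illustration, the application side of such a hint could look like this (a sketch only; whether ZFS would map POSIX_FADV_NOREUSE to Uncached I/O is an open question, not something this PR implements):

#include <fcntl.h>

static int
hint_read_once(int fd, off_t off, off_t len)
{
	/* Advise the kernel that this range will be accessed only once. */
	return (posix_fadvise(fd, off, len, POSIX_FADV_NOREUSE));
}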

@amotin amotin force-pushed the direct branch 3 times, most recently from 742a1cb to 7579173 on April 7, 2025 01:20
Before Direct I/O was implemented, I had implemented a lighter
version I called Uncached I/O.  It uses the normal DMU/ARC data
path with some optimizations, but evicts data from the caches as
soon as possible and reasonable.  Originally I wired it only to
the primarycache property; now I am completing the integration
all the way up to the VFS.

While Direct I/O has the lowest possible memory bandwidth usage,
it also has a significant number of limitations.  It requires
I/Os to be page aligned, does not allow speculative prefetch,
etc.  Uncached I/O does not have those limitations, but instead
requires an additional memory copy, though still one less than
regular cached I/O.  As such it should fill the gap between the
two.  Considering this, I've disabled the annoying EINVAL errors
on misaligned requests, adding a tunable for those who want to
test their applications.

To pass the information between the layers I had to change a
number of APIs.  But as a side effect, upper layers can now
control not only the caching, but also speculative prefetch.  I
haven't wired it to the VFS yet, since that requires looking at
some OS specifics.  But while there, I've implemented speculative
prefetch of indirect blocks for Direct I/O, controllable via all
the same mechanisms.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Fixes openzfs#17027
@amotin amotin (Member Author) commented Apr 7, 2025

F_SET_RW_HINT and inode->i_write_hint seem to be about on-disk lifetime, not saying anything about caching. Maybe they could be used for some allocation policies to reduce pool fragmentation, but not in this PR. RWF_UNCACHED indeed looks very similar.

*/
ioflag &= ~O_DIRECT;
if (os->os_direct == ZFS_DIRECT_ALWAYS) {
/* Force either direct of uncached I/O. */
Contributor commented on this hunk:

"direct or uncached"

Development

Successfully merging this pull request may close these issues.

systemd-sysext fails to install extensions with Direct I/O