Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add zpool status --lockless|--trylock #17193

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

tonyhutter
Copy link
Contributor

Motivation and Context

Allow zpool status to work even when the pool is locked up.

Description

Add the following flags to zpool status:

--lockless: Try for a short amount of time to get the spa_namespace lock. If that doesn't work, then do the zpool status locklessly. This is dangerous and can crash your system if the pools configs are being modified while zpool status is running! This flag requires the zfs_allow_lockless_zpool_status module param be set to 1 in in order to run. --lockless should only be used for emergency situations.

--trylock: Try for a short amount of time to get the spa_namespace lock. If that doesn't work then simply abort zpool status.

These commands allow users to view the zpool status output even when the pool gets stuck while holding the spa_namespace lock.

The amount of time --trylock and --lockless try to get the spa_namespace lock before moving on is controlled by the spa_namespace_trylock_ms module param. The default is to try for 100ms.

How Has This Been Tested?

Manually crashed the pool by calling BUG() while holding the spa_namespace lock . Saw the traditional zpool status command hanging as before. Saw zpool status --trylock successfully abort, and zpool status --lockless successfully return status output. I also added a test case.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Add the following flags to `zpool status`:

--lockless: Try for a short amount of time to get the spa_namespace
lock.  If that doesn't work, then do the zpool status locklessly.  This
is dangerous and can crash your system if the pools configs are being
modified while zpool status is running. This flag requires the
zfs_allow_lockless_zpool_status module param be set to 1 in in order to
run.

--trylock: Try for a short amount of time to get the spa_namespace
lock.  If that doesn't work then simply abort 'zpool status'.

These commands allow users to view the zpool status when the pool
gets stuck while holding the spa_namespace lock.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
@IvanVolosyuk
Copy link
Contributor

Do we have many command which can crash system? This is clearly developer-centric command line switch guarded by module parameters, but require documentation for users which can raise eyebrows. Sound like it also asks for future CVEs for elevating privileges to rootkits. I generally like this capability, but it also strange and scary.

@tonyhutter
Copy link
Contributor Author

Do we have many command which can crash system? This is clearly developer-centric command line switch guarded by module parameters, but require documentation for users which can raise eyebrows.

To be specific, the only way it will crash the system is if the pool configuration is changing while running the command. And 99% of the time you're going to be running --lockless when the pool is locked up, which also means that the pool configuration is not likely to be changing. Also keep in mind that --lockless first tries to get the lock for 100ms, and if it isn't successful, only then will it go lockless. So it sounds scary on the surface, but in practice I think it would actually be hard to crash the system with --lockless.

From a UI perspective, I would be open to removing the --lockless|--trylock flags from zpool status and using a envar instead, like ZPOOL_LOCK_BEHAVIOR=lockless|trylock. That would make it less prominent to users. An envar could also be desirable as there may be people who want --lockless or --trylock as the default behavior for zpool status. I'm curious if anyone has thoughts/preference on this?

@snajpa
Copy link
Contributor

snajpa commented Apr 2, 2025

To be specific, the only way it will crash the system is if the pool configuration is changing while running the command

I have a question, I don't pretend I understand it fully - there seem to be multiple locks involved, spa_namespace_lock and 8 spa_config_lock mtxes; are those 8 mutexes as hot as spa_namespace_lock itself? Wouldn't it be possible to take just a subset of those 8 to protect against the kind of changes which could lead to bad derefs?

Also one thing crossed my mind - if you're really going to use this only as a dev feature then it doesn't matter, but - could a drive failure also constitute a spa config change, which could lead to bad dereference?

@tonyhutter
Copy link
Contributor Author

Wouldn't it be possible to take just a subset of those 8 to protect against the kind of changes which could lead to bad derefs?

The problem is that we don't know exactly which locks are held when a catastrophic event crashes the pool. At the very least, it's going to be the spa_namespace lock. It may hold other per-spa locks as well.

Also one thing crossed my mind - if you're really going to use this only as a dev feature then it doesn't matter, but - could a drive failure also constitute a spa config change, which could lead to bad dereference?

I imagine --lockless will be used by both admins and devs.

It's possible a drive failure could cause a bad de-reference if zpool status --lockless was running at the same time. It's very unlikely, as I wouldn't think a faulted drive is going to change the spa_namespace AVL tree. However it could be that the faulted drive codepath is writing some spa_t value, like a string, while zpool status --lockless is reading from the same spa_t. The stars would have to be aligned for this to happen though. An create/export/import/destroy would be the more likely cause of a bad de-reference, as those modify the spa_namespace AVL tree.

@tonyhutter
Copy link
Contributor Author

From a UI perspective, I would be open to removing the --lockless|--trylock flags from zpool status and using a envar instead, like ZPOOL_LOCK_BEHAVIOR=lockless|trylock. That would make it less prominent to users. An envar could also be desirable as there may be people who want --lockless or --trylock as the default behavior for zpool status.

I've convinced myself that I like the ZPOOL_LOCK_BEHAVIOR=lockless|trylock envar idea better than the --lockless|--trylock flags. If I don't hear any opinions to the contrary in the next couple of days, I'm going to implement it and push the changes to this PR.

@IvanVolosyuk
Copy link
Contributor

IvanVolosyuk commented Apr 4, 2025 via email

@tonyhutter
Copy link
Contributor Author

Sounds like this flag should be restricted to root only because of extra risk

--lockless is effectively root only, since it requires the zfs_allow_lockless_zpool_status module param to be set to 1 in order to work.

@snajpa
Copy link
Contributor

snajpa commented Apr 5, 2025

+1 for the env var, if it counts :D

@robn
Copy link
Member

robn commented Apr 6, 2025

--lockless is effectively root only, since it requires the zfs_allow_lockless_zpool_status module param to be set to 1 in order to work.

@tonyhutter does it though? Seems like once the param is set, any zpool caller is allowed to invoke it? I wonder if zfs_get_zpool_lock_behavior_helper() should also call secpolicy_sys_config() to make sure the caller has enough permission to blow things up? Maybe even that's not enough, maybe it's closer to secpolicy_zinject() (which can also crash the pool if you hold it right)?

@tonyhutter
Copy link
Contributor Author

@robn thanks, I like your secpolicy_sys_config() idea. So much cleaner than zfs_allow_lockless_zpool_status. I'll add that in my next push.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants