-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add zpool status --lockless|--trylock
#17193
base: master
Are you sure you want to change the base?
Conversation
Add the following flags to `zpool status`: --lockless: Try for a short amount of time to get the spa_namespace lock. If that doesn't work, then do the zpool status locklessly. This is dangerous and can crash your system if the pools configs are being modified while zpool status is running. This flag requires the zfs_allow_lockless_zpool_status module param be set to 1 in in order to run. --trylock: Try for a short amount of time to get the spa_namespace lock. If that doesn't work then simply abort 'zpool status'. These commands allow users to view the zpool status when the pool gets stuck while holding the spa_namespace lock. Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Do we have many command which can crash system? This is clearly developer-centric command line switch guarded by module parameters, but require documentation for users which can raise eyebrows. Sound like it also asks for future CVEs for elevating privileges to rootkits. I generally like this capability, but it also strange and scary. |
To be specific, the only way it will crash the system is if the pool configuration is changing while running the command. And 99% of the time you're going to be running From a UI perspective, I would be open to removing the |
I have a question, I don't pretend I understand it fully - there seem to be multiple locks involved, Also one thing crossed my mind - if you're really going to use this only as a dev feature then it doesn't matter, but - could a drive failure also constitute a spa config change, which could lead to bad dereference? |
The problem is that we don't know exactly which locks are held when a catastrophic event crashes the pool. At the very least, it's going to be the spa_namespace lock. It may hold other per-spa locks as well.
I imagine It's possible a drive failure could cause a bad de-reference if |
I've convinced myself that I like the |
Zpool status can be run as an ordinary user. Sounds like this flag should
be restricted to root only because of extra risk. I wonder if it even be a
separate configure flag as people who build/use hardened kernels may not
want a potentially compromised kernel memory integrity.
…On Sat, Apr 5, 2025, 05:15 Tony Hutter ***@***.***> wrote:
From a UI perspective, I would be open to removing the
--lockless|--trylock flags from zpool status and using a envar instead,
like ZPOOL_LOCK_BEHAVIOR=lockless|trylock. That would make it less
prominent to users. An envar could also be desirable as there may be people
who want --lockless or --trylock as the default behavior for zpool status.
I've convinced myself that I like the ZPOOL_LOCK_BEHAVIOR=lockless|trylock
envar idea better than the --lockless|--trylock flags. If I don't hear
any opinions to the contrary in the next couple of days, I'm going to
implement it and push the changes to this PR.
—
Reply to this email directly, view it on GitHub
<#17193 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABXQ6HMEVMANE5Y5IKVGXW32X3D2ZAVCNFSM6AAAAABZ7747J2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDONZZGQ2TENRQG4>
.
You are receiving this because you commented.Message ID:
***@***.***>
[image: tonyhutter]*tonyhutter* left a comment (openzfs/zfs#17193)
<#17193 (comment)>
From a UI perspective, I would be open to removing the
--lockless|--trylock flags from zpool status and using a envar instead,
like ZPOOL_LOCK_BEHAVIOR=lockless|trylock. That would make it less
prominent to users. An envar could also be desirable as there may be people
who want --lockless or --trylock as the default behavior for zpool status.
I've convinced myself that I like the ZPOOL_LOCK_BEHAVIOR=lockless|trylock
envar idea better than the --lockless|--trylock flags. If I don't hear
any opinions to the contrary in the next couple of days, I'm going to
implement it and push the changes to this PR.
—
Reply to this email directly, view it on GitHub
<#17193 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABXQ6HMEVMANE5Y5IKVGXW32X3D2ZAVCNFSM6AAAAABZ7747J2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDONZZGQ2TENRQG4>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
|
+1 for the env var, if it counts :D |
@tonyhutter does it though? Seems like once the param is set, any |
@robn thanks, I like your |
Motivation and Context
Allow
zpool status
to work even when the pool is locked up.Description
Add the following flags to
zpool status
:--lockless
: Try for a short amount of time to get the spa_namespace lock. If that doesn't work, then do the zpool status locklessly. This is dangerous and can crash your system if the pools configs are being modified while zpool status is running! This flag requires thezfs_allow_lockless_zpool_status
module param be set to1
in in order to run.--lockless
should only be used for emergency situations.--trylock
: Try for a short amount of time to get the spa_namespace lock. If that doesn't work then simply abortzpool status
.These commands allow users to view the
zpool status
output even when the pool gets stuck while holding the spa_namespace lock.The amount of time
--trylock
and--lockless
try to get the spa_namespace lock before moving on is controlled by thespa_namespace_trylock_ms
module param. The default is to try for 100ms.How Has This Been Tested?
Manually crashed the pool by calling
BUG()
while holding the spa_namespace lock . Saw the traditionalzpool status
command hanging as before. Sawzpool status --trylock
successfully abort, andzpool status --lockless
successfully return status output. I also added a test case.Types of changes
Checklist:
Signed-off-by
.