Skip to content

svm: collect transaction balances #5588

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

2501babe
Copy link
Member

@2501babe 2501babe commented Mar 31, 2025

Problem

#4253 aka "simd83 locking" intends to allow multiple transactions in the same batch to write to the same account. presently, banking stage and blockstore processor collect pre- and post-execution balances by loading all accounts before and after commit. this does not work in a simd83 world because the intermediate balances of an account modified multiple times would be wrong (although the first pre- and final post-balance would be correct)

we have attempted to solve this in several different manners, as detailed in the notes below (not necessary to understand this pr). they dont work. what if there was a better way?

Summary of Changes

there is. well, a less bad way:

  • add a new option enable_transaction_balance_recording to ExecutionRecordingConfig
  • when this option is enabled, before and after execution in svm, we gather native lamport balances for all accounts, and token balances for all valid, initialized token accounts, along with mint address and decimals. we do this in a new struct BalanceCollector which isolates the logic from load_and_execute_sanitized_transactions(). we wrap this in Option<_> throughout, so when the recording config is false no extra work is done. we parse the token accounts and mints using the new crate spl-generic-token which provides the minimum necessary information common to both token programs (and can be extended if others are ever created, even if their buffer layouts are different)
  • Option<BalanceCollector> is added to LoadAndExecuteSanitizedTransactionsOutput and passed up to the ultimate caller. this is important because one of the main challenges has been that banking stage and blockstore processor must gather what is essentially intermediate state internal to svm, passing this through Bank itself without relying on Bank to store it or care about it
  • all balance collection code is removed from banking stage and blockstore processor. it simply receives the BalanceCollector. it no longer matters whether it handles it before or after commit because no more accounts need to be loaded
  • a simple function compile_collected_balances() turns the four vecs in BalanceCollector into the two structs containing the balances, transforming u64 token amounts into UiTokenAmount as required by the transaction status sender
  • ledger/src/token_balances.rs is deleted and solana-ledger no longer depends on the token programs directly (although it does indirectly via account-decoder; in the future we could move the UiTokenAmount builder somewhere and break this link)
  • balance collection timings are removed from ConsumeWorkerTimingMetrics and LeaderExecuteAndCommitTimings and added to ExecuteTimings. we expect balance collection to be much faster in this pr because we will already have all token accounts pre-loaded from accounts-db, and mints have to be loaded at most once per batch. the version in master has to load all accounts for all transactions

BalanceCollector has no special handling for pre- vs post-balances, ie it does not use TransactionLoadResult or TransactionProcessingResult for anything. in practice doing so makes the code more complex and brittle for uncertain speed gains, probably minimal since the accounts are already loaded

note that there are bank tests that expect to see full balances even for failed or discarded transactions, so that behavior has been left unchanged

this pr unblocks #4253

Notes (feel free to ignore)

this is too jargony to probably make sense to most and more for my reference in case we have to rehash past attempts in code review of this pr:

  • bals1, the svm three card monte: add a balance recording flag to the recording config. rather messy balance collection in txp, directly with no helpers. the idea was to use a callback from ledger to parse the token accounts so svm doesnt see spl-token but otherwise do the work ourselves in svm. tack the bals onto the processing result to get picked up upstream. this was bad because it very poorly separated concerns and did too much in txp itself, and we thought we had better approaches we could try

  • bals2, the mysterious dinner guest: i dont remember what this one was

  • bals3, the gigafunction: something in ledger which attempted to construct balances post-hoc from the transaction results. this was a clever machine which attempted to allow svm to have no hand in any of this. unfortunately the approach is inherently imperfect and has tons of gimmickry to emulate balance changes as one steps through the transactions. we crashed into the rocks because to get decimals right we need to either:

    • ignore mints that didnt exist before, but this produces bad balances for pathological mint usage. it also doesnt necessarily cover 100% of usecases, eg if you want to see balances of mints you created
    • load mint before and after commit and skip balances for any mints that changed. this avoids most mint hijinks but is still wrong for extreme hijinks. it lets us show more balances but forces a weird two-stage flow where we have to load all accounts up front
    • fully track mint state changes as we advance through the batch. this is 100% correct and gets ideal coverage. but it is a nightmare to code and maintain, the most rube goldberg situation of any option

    furthermore all of these have the problem of, for blockstore, we need to put the logic in the precommit callback. and that code has some unusual and possibly dangerous-to-modify interactions with bank locking to support the unified scheduler

  • bals4, the trait object hand grenade: we package up a machine that does the work and lob it into svm. this was an inspired option but it is fiendish. we have a collector trait in svm and impl it in ledger, but the collector trait depends on account loading, and passing in the collector trait object is very difficult because the balance collector is generic over concrete impls of either a new loader trait or txpcb, both of which are created below blockstore, so they cannot be unified by rustc. it also, even if it did work, would make function signatures truly ghastly

  • bals5, the present concern: this pr. we take the general approach of bals1, with a new recording option and balance recording in svm, passing the balances up in the processing results. but we package all the logic into a struct so txp is unperturbed, use a very light token parser directly in svm instead of passing any function pointers or trait objects down from ledger, and do the final transform of balance+decimals -> uiamount in ledger because this depends on account-decoder. we originally did the token work with agave inline-spl, but because svm is being broken out of agave, and because we want new features for tokens anyway, we made a new spl library that does exactly what we want

@2501babe 2501babe force-pushed the 20250331_bals_take5_ultimate_redux branch 3 times, most recently from cd80dce to 52ba648 Compare April 1, 2025 16:18
@codecov-commenter
Copy link

codecov-commenter commented Apr 1, 2025

Codecov Report

Attention: Patch coverage is 75.22523% with 55 lines in your changes missing coverage. Please review.

Project coverage is 83.0%. Comparing base (e7d3003) to head (fb7c42b).
Report is 6 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master    #5588     +/-   ##
=========================================
- Coverage    83.0%    83.0%   -0.1%     
=========================================
  Files         828      829      +1     
  Lines      375623   375370    -253     
=========================================
- Hits       311918   311621    -297     
- Misses      63705    63749     +44     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@2501babe 2501babe force-pushed the 20250331_bals_take5_ultimate_redux branch from 187f4ef to 5459890 Compare April 14, 2025 16:33
@2501babe 2501babe self-assigned this Apr 15, 2025
@2501babe 2501babe force-pushed the 20250331_bals_take5_ultimate_redux branch from c93d734 to 79ba85f Compare April 15, 2025 15:42
@2501babe 2501babe requested a review from apfitzge April 15, 2025 16:58
Comment on lines 45 to 46
solana-rent-debits = { workspace = true }
solana-sdk = { workspace = true }
solana-sdk-ids = { workspace = true }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a pleasant surprise

Comment on lines 367 to 371
.map(|_| Err(TransactionError::ProgramCacheHitMaxLimit))
.collect(),
balance_collector: None,
};
}
Copy link
Member Author

@2501babe 2501babe Apr 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setting None here is absolutely the right call because this branch discards all transactions and all pending state changes. what is less clear is whether it matters that compile_collected_balances() in ledger/src/transaction_balances.rs returns (vec![], vec![], vec![], vec![])

now that im going through this code with a reviewers eyes, maybe the function in ledger should return Option<(TransactionBalancesSet, TransactionTokenBalancesSet)> instead, and its callers can create four vec![vec![]; batch.len()]

in the worst case callers would have to load the accounts to fill the native balance vecs. i really dont know whether anything on the TransactionStatusReceiver side would need balances for an entirely discarded batch though. im leaning toward the "vec of empty vecs" option but curious to hear your thoughts

Comment on lines +396 to 398

let (mut load_us, mut execution_us): (u64, u64) = (0, 0);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its only tangentially related to this pr, but instead of adding all timings at the bottom, i switched to adding as many timings as possible immediately after they are generated, and keeping load_us and execution_us as incremental accumulators because theyre printed in a debug message

this is because with collect_balances_us we would have had four mut u64 here, and if the program cache fix (filter account owners) ever lands we would have seven. it doesnt make much sense to keep this pattern, since we are just saturating-adding to a stack variable that is then later saturating-added to the timing struct anyway

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants