Skip to content

subsitute vector<double> for map<PeakType, double> in SignalToNoiseEstimator #98

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

xchen98
Copy link

@xchen98 xchen98 commented May 27, 2020

It coud be found that the type pf data, map<PeakType, doule>, was used in SignalToNoiseEstimator of Tool OpenSwathWorkflow, in which the values corresponded to the iteration of map are storaged. The stored values are synchronously updated contiguously with the iteration, using the function named getSignalToNoise. However, vector could be more efficient when processing large amounts of data. Thus, vectors are applied here. From the diagram shown below, it could be easily found that the processing speed differs a lot while using vector and map.(left: map | right: vector)
runtime of OpenSwathWorkflow with map: 17:31 m
runtime of OpenSwathWorkflow with vector: 15:38m
Signal_Vergleiche


virtual double getSignalToNoise(const PeakType & data_point)
///Return to signal/noise estimate for date point @p index
///@note the first query to this function will taake longer, as
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taake ? :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ops, thanks.


if (write_debuginfo)
{
std::cerr << first_it->getPrecursor().getMZ() << " " << first_it->getProduct().getMZ() << " ";
}

PeakSpectrum::FloatDataArray signal_to_noise;
for (PeakSpectrum::Iterator sit = chromatogram.begin(); sit != chromatogram.end(); ++sit)
for (Size sit = 0; sit < chromatogram.size(); ++sit)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename sit to i, because its not an iterator anymore

PeakIterator last_;
std::vector<double> stn_estimates_;
/// raw data;
Container c;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems very wasteful to copy the whole underlying data!

Can you try and remove this member entirely?!
The user calls .init(container) and there we can compute the S/N values and store them in stn_estimates. There should be no need to store the container as well.
The only thing that does not work anymore is the is_result_valid_ idea. But that should not really be required and can be removed as well.

This should make the code quite a bit faster...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at the code, you can simply remove the is_result_valid_ member, and within SignalToNoiseEstimator::update() simply do stn_estimates_.clear();, this way, if someone sets new parameters and forgets to call .init() afterwards, we will see a segfault and thus the error is clear. You can also add a OPENMS_PRECONDITION to check the index is within the vectors size in getSignalToNoise(), so in debug builds we will get a proper error message and not just a segfault.

Copy link
Owner

@cbielow cbielow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. only the container member seems overkill.
Removing it should give a speedup.
Please remeasure when done :)

}

///Return to signal/noise estimate for date point @p index
///@note the first query to this function will taake longer, as
///@note the first query to this function will take longer, as
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not correct anymore, right? Since everything is computed during the call to init().

init(c);
}

OPENMS_POSTCONDITION(condition,message);
Copy link
Owner

@cbielow cbielow May 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should say OPENMS_POSTCONDITION(index < stn_estimates_.size(), "S/N estimate beyond container size was requested"); or similar

@cbielow
Copy link
Owner

cbielow commented Jun 5, 2020

ok. looks good.
Can you open a PR against OpenMS/develop please

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants