-
Notifications
You must be signed in to change notification settings - Fork 877
Use of safe_indexing in StackingCVClassifier #513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I have this part (commit |
Hi there, I really like the refactoring overall! I am just a bit worried about that regarding other refactoring plans I really liked, i.e.,
Not sure if this is being continued though. Regarding replacing the arrays via Python lists, I'd actually prefer arrays over python lists, but I think both solutions are not ideal. I think we should know the exact size of the output array, and it would be more elegant/efficient to initialize an array of that size upfront and then just update it with the respective values. Regarding the |
The CV method then returns a bunch of randomized indices (probably not consecutive), which we use to index the incoming data. The CV method assumes that indices are between Oh yes, you're right about initializing it upfront! I didn't think of that. In fact, that's awesome because we can assign by row index and won't have to do the reshuffling later. Should I wait for the other refactoring effort, or would you like me to do up the lists implementation? Actually, I think some of the methods I refactored could be lifted into parent classes, the other effort is a great idea! |
Good point, I am actually not sure whether this other PR is still "active" and I would say we probably shouldn't wait :). And like you said, these two things are not orthogonal and the refactoring into parent classes would probably also benefit from your refactoring.
I think this would be the way to go then, instead of using the list of lists you mentioned above, if that's not too much work/hassle |
Also remove need to reorder labels when building meta features
@rasbt let me know if this is ok! I didn't remove the I'd have to wait till the first iteration before being able to get the number of classes to build the final array for the meta features, so the advantage of declaring the final array upfront is reduced. |
Hi, the refactorization #445 currently faces failure in unit tests and I have been away for a while. |
I think that's fine :). The function
is now so nice and simple due to the refactoring that it is very easy to read what's going on, which is a benefit as well. Later, when/if we refactored all stacking classes based on a BaseStacker, we can think about parallelizing this step. But for now, I think it is already a big improvement over the previous approach. Unless you have sth else you'd like to add, I'd be happy to merge. |
Please merge! Thanks so much! Thanks for maintaining mlxtend too! |
Description
Updated
StackingCVClassifier
to usesafe_indexing
fromsklearn
.Also did some refactoring: instead of iteratively concatenating the model predictions from each fold, I saved all the predictions in a list and concatenated them all at once. Not sure if the iterative concatenation was meant to save memory, but I thought it shouldn't really make a difference since the overhead of storing them all in a list should be quite small.
Related issues or pull requests
None! Was using this for Kaggle and found that it was breaking on fitting, so I added this fix!
Did some refactoring while fixing the code
Pull Request Checklist
./docs/sources/CHANGELOG.md
file (if applicable)./mlxtend/*/tests
directories (if applicable)mlxtend/docs/sources/
(if applicable)nosetests ./mlxtend -sv
and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g.,nosetests ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv
)flake8 ./mlxtend