If you don't have a loss, the derivative of what would you get? #2

XinshaoAmosWang opened this issue Jun 16, 2020 · 0 comments

XinshaoAmosWang commented Jun 16, 2020

https://www.reddit.com/r/MachineLearning/comments/ha0k5u/r_source_codes_for_general_examplelevel_weighting/fv1av1l?utm_source=share&utm_medium=web2x

Deep learning models are optimised by SGD.
1. Is a loss function necessary for deriving the gradient used for back-propagation?
2. What is the right way to weight training data points?
3. When a training set contains a high label noise rate, we should focus on easier training examples for better generalisation!

Code releasing: https://xinshaoamoswang.github.io/blogs/2020-06-14-code-releasing/

For a defined optimisation objective, e.g., maximising p(y|x) towards one, there are many possible loss functions, e.g., the absolute error |1-p(y|x)| and the squared error (1-p(y|x))^2.

Here, the 1st item means that there is no need to care about the exact form of the loss function. Instead, all we need is to design the gradient directly, so that it optimises the defined ultimate objective.
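
As a rough illustration of designing the gradient directly (a minimal sketch, not the released code; the weighting choice below is a hypothetical example), one can compute p(y|x) with a softmax, build the per-example gradient by hand, and hand it to autograd without ever evaluating a scalar loss:

```python
import torch
import torch.nn.functional as F

# Minimal sketch (an assumption for illustration, not the repository's code):
# instead of computing a scalar loss, design the per-example gradient w.r.t.
# the logits directly and pass it to autograd via backward(gradient=...).

torch.manual_seed(0)
model = torch.nn.Linear(5, 3)            # toy classifier: 5 features, 3 classes
x = torch.randn(4, 5)                    # 4 training examples
targets = torch.tensor([0, 2, 1, 0])

logits = model(x)
probs = F.softmax(logits, dim=1)
onehot = F.one_hot(targets, num_classes=3).float()

# Hypothetical example-level weights: emphasise examples the model already
# fits well (large p(y|x)), i.e., "easier" examples.
p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1).detach()
weights = p_y

# For softmax + cross-entropy, the gradient w.r.t. the logits would be
# (probs - onehot); here we rescale it per example instead of changing the loss.
designed_grad = weights.unsqueeze(1) * (probs - onehot) / x.shape[0]

# No scalar loss is ever evaluated; backward() receives the designed gradient.
logits.backward(gradient=designed_grad.detach())
print(model.weight.grad.shape)           # gradients flow to the parameters
```

With this pattern, changing the weighting function changes the effective "loss" without ever writing one down.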

We discussed that this is more intuitive and can be easily interpreted from the angle of example weighting (the 2nd item).
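
To make the example-weighting reading concrete (again only a sketch, using the notation of the comment above), compare the gradients of the two losses w.r.t. p = p(y|x): they point in the same direction and differ only in magnitude, i.e., in the implicit weight each example receives:

```python
import numpy as np

# For p = p(y|x): d|1 - p|/dp = -1 (for p < 1), while d(1 - p)^2/dp = -2(1 - p).
p = np.linspace(0.05, 0.95, 5)
grad_abs = -np.ones_like(p)      # absolute error: every example weighted equally
grad_sq = -2.0 * (1.0 - p)       # squared error: well-fitted examples (large p) get smaller weight

print(np.stack([p, grad_abs, grad_sq], axis=1))
```

So choosing a loss is effectively choosing a weighting curve over p(y|x); designing the gradient directly lets one pick that curve without committing to a closed-form loss.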
