Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus (2014)
- 2 key properties of neural networks (NNs):
- Semantic meaning of individual units:
- Earlier work analyzed the learned semantics by finding images that maximally activated individual units
- Authors observe no meaningful difference in interpretability between individual units and random linear combinations of units: images that maximize a random direction in activation space look just as semantically coherent (see the sketch after this property)
- So: it is the entire space of activations that contains the bulk of semantic information, not some individual units!
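
A minimal sketch of that comparison, assuming a placeholder PyTorch `feature_extractor` that maps an image to one layer's activation vector and a toy `dataset` of `(image, label)` pairs (both stand-ins, not the paper's code): rank images by their projection onto a single-unit basis vector `e_i` versus onto a random direction `v`, then inspect the top-scoring images in each case.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for a trained network layer and an image dataset;
# substitute a real feature extractor and real images in practice.
feature_extractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
dataset = [(torch.rand(3, 32, 32), 0) for _ in range(100)]

def top_activating_images(feature_extractor, dataset, direction, k=8):
    """Return indices of the k images whose activations project most onto `direction`."""
    scores = []
    for idx, (image, _) in enumerate(dataset):
        with torch.no_grad():
            phi = feature_extractor(image.unsqueeze(0)).flatten()  # activation vector phi(x)
        scores.append((torch.dot(phi, direction).item(), idx))
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

dim = 256                                  # size of the chosen layer's activation vector
unit_direction = torch.zeros(dim)          # natural-basis direction e_i for one unit i
unit_direction[123] = 1.0
random_direction = torch.randn(dim)        # random direction v in the same activation space
random_direction /= random_direction.norm()

top_for_unit = top_activating_images(feature_extractor, dataset, unit_direction)
top_for_random = top_activating_images(feature_extractor, dataset, random_direction)
```

The paper's observation is that the image sets retrieved for random directions are about as semantically coherent as those retrieved for single units, which is what motivates the conclusion above.
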
- Stability of NNs to small perturbations in input space:
- Networks that generalize well are expected to be robust to small perturbations in the input (imperceptible noise in the input shouldn't change the predicted class)
- Authors find that networks can be made to misclassify an image by applying a certain imperceptible perturbation, found by maximizing the network's prediction error (a simplified version of this search is sketched after this property)
- These "adversarial examples" generalize across models: the same perturbed inputs are often misclassified by networks with different architectures and by networks trained on disjoint subsets of the data
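
The paper finds the perturbation with box-constrained L-BFGS, minimizing a small-perturbation penalty plus the classification loss toward an incorrect target class while keeping the perturbed image in the valid pixel range. The sketch below substitutes plain gradient steps with clamping for L-BFGS, purely for illustration; `model` is any classifier returning class logits, and `x` is a single image batched as `(1, C, H, W)` with pixel values in `[0, 1]`.

```python
import torch
import torch.nn.functional as F

def find_adversarial(model, x, target_class, c=0.1, steps=100, lr=0.01):
    """Search for a small perturbation r so that model(x + r) predicts `target_class`.

    Minimizes  c * ||r||^2 + cross_entropy(model(x + r), target)  while keeping
    x + r inside [0, 1] (a simplified stand-in for the paper's box-constrained L-BFGS).
    """
    r = torch.zeros_like(x, requires_grad=True)
    target = torch.tensor([target_class])
    for _ in range(steps):
        x_adv = (x + r).clamp(0.0, 1.0)                    # enforce the pixel-range box constraint
        loss = c * r.pow(2).sum() + F.cross_entropy(model(x_adv), target)
        loss.backward()
        with torch.no_grad():
            r -= lr * r.grad                               # plain gradient step on the perturbation
            r.grad.zero_()
    return (x + r).clamp(0.0, 1.0).detach()
```

The penalty weight `c` trades off how small the perturbation is against how strongly the target class is pushed; in the paper this trade-off is tuned so the perturbation stays imperceptible while still flipping the prediction.
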
- Adversarial training:
- Authors propose making networks more robust to such perturbations by training on adversarial examples in an adaptive manner, i.e., continuously regenerating the pool of adversarial examples against the current model as training proceeds (a training-loop sketch follows below)
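
A hedged sketch of that idea, reusing the `find_adversarial` helper from the previous sketch and assuming a standard PyTorch `DataLoader` named `loader`; the wrong target class is chosen arbitrarily here, and the exact refresh schedule in the paper may differ. The point is only that the adversarial pool is regenerated against the current model, so it adapts as the network changes.

```python
import torch
import torch.nn.functional as F

def train_with_adaptive_adversarial_pool(model, loader, num_classes, epochs=10, lr=1e-3):
    """Mix freshly generated adversarial examples into every training batch."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            # Regenerate adversarial inputs against the *current* model so the pool
            # adapts during training; target an arbitrary wrong class for each image.
            x_adv = torch.stack([
                find_adversarial(model, xi.unsqueeze(0),
                                 target_class=(int(yi) + 1) % num_classes).squeeze(0)
                for xi, yi in zip(x, y)
            ])
            inputs = torch.cat([x, x_adv])   # clean and adversarial examples together
            labels = torch.cat([y, y])       # adversarial inputs keep their true labels
            opt.zero_grad()                  # also clears grads accumulated while generating x_adv
            loss = F.cross_entropy(model(inputs), labels)
            loss.backward()
            opt.step()
```
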