Intriguing properties of neural networks

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus (2014)

Key points

The paper reports two counterintuitive properties of neural networks (NNs):
1. Semantic meaning of individual units:
  - Earlier work analyzed the semantics a network has learned by finding the images that maximally activate individual units
  - The authors observe no difference in semantic interpretability between individual units and random linear combinations of units: both select sets of images that look semantically related
  - So it is the entire space of activations, rather than the individual units, that contains the bulk of the semantic information
2. Stability of NNs to small perturbations in input space:
  - Networks that generalize well are expected to be robust to small perturbations of the input (imperceptible noise shouldn't change the predicted class)
  - The authors find that a network can be made to misclassify an image by applying a specific, imperceptible perturbation, which is found by maximizing the network's prediction error
  - These "adversarial examples" transfer: many of them are also misclassified by networks with different architectures or networks trained on different subsets of the data
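
The comparison behind the first property can be sketched as follows: rank a set of inputs by how strongly their activations project onto a single unit (a basis direction) and, separately, onto a random direction. This is a minimal sketch, not the authors' code. The `features` matrix is a hypothetical stand-in filled with synthetic values; in practice it would hold one row of hidden-layer activations per held-out image, and `top_activating` is an illustrative helper name.

```python
import numpy as np

def top_activating(features, direction, k=5):
    """Return indices of the k inputs whose activations project most
    strongly onto `direction` (largest inner product first)."""
    scores = features @ direction
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)

# Hypothetical stand-in for hidden activations: 1000 inputs, 64 units.
# Synthetic values, used only to show the procedure.
features = rng.normal(size=(1000, 64))

# Direction 1: a single unit, i.e. a natural basis vector.
unit = np.zeros(64)
unit[3] = 1.0

# Direction 2: a random linear combination of units, normalized to unit length.
random_dir = rng.normal(size=64)
random_dir /= np.linalg.norm(random_dir)

unit_top = top_activating(features, unit)
random_top = top_activating(features, random_dir)
```

With real activations, the images indexed by `unit_top` and by `random_top` would be inspected side by side; the observation summarized above is that both sets look comparably coherent.
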
The authors propose a method to make networks more robust to small perturbations: train them on adversarial examples in an adaptive manner, changing the pool of adversarial examples as training proceeds
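
The idea of training against a changing pool of adversarial examples can be sketched on a toy model. This is a minimal sketch under loud assumptions, not the paper's procedure: the model is a NumPy logistic regression rather than a deep network, the data is synthetic, and the perturbation is found with a few steps of signed gradient ascent on the loss under a fixed bound `eps`, which is a simpler technique swapped in for illustration. All function names and hyperparameters (`perturb`, `train`, `refresh_every`, `eps`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    """Mean binary cross-entropy of the logistic model on (x, y)."""
    p = np.clip(sigmoid(x @ w + b), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def input_gradient(w, b, x, y):
    """Gradient of the per-example loss with respect to the inputs x."""
    p = sigmoid(x @ w + b)
    return (p - y)[:, None] * w[None, :]

def perturb(w, b, x, y, eps=0.1, steps=5):
    """Find a small perturbation that increases the loss.
    Assumption: signed gradient ascent with an eps bound per coordinate."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + (eps / steps) * np.sign(input_gradient(w, b, x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # keep the perturbation small
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay in the valid input range
    return x_adv

def train(x, y, epochs=200, lr=0.5, refresh_every=10, eps=0.1):
    """Train on clean data plus an adversarial pool that is regenerated
    against the current weights every `refresh_every` epochs."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=x.shape[1])
    b = 0.0
    pool = x.copy()
    for epoch in range(epochs):
        if epoch % refresh_every == 0:
            pool = perturb(w, b, x, y, eps=eps)   # adaptive pool refresh
        xb = np.concatenate([x, pool])
        yb = np.concatenate([y, y])
        p = sigmoid(xb @ w + b)
        w = w - lr * xb.T @ (p - yb) / len(yb)
        b = b - lr * np.mean(p - yb)
    return w, b

# Synthetic data with inputs in [0, 1]; the label depends on the first feature.
rng = np.random.default_rng(1)
x = rng.uniform(size=(200, 20))
y = (x[:, 0] > 0.5).astype(float)

w, b = train(x, y)
x_adv = perturb(w, b, x, y, eps=0.1)
print("clean loss:", loss(w, b, x, y))
print("adversarial loss:", loss(w, b, x_adv, y))
```

Regenerating the pool inside the training loop is what makes the scheme adaptive: perturbations computed against early weights stop being adversarial once the model changes, so they are recomputed against the current model.
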