Capsule Networks: Basic Principles and Benefits

By Kevin Stephen, Intern at AlgoAnalytics

In 1998, Yann LeCun and Yoshua Bengio introduced what is now one of the most popular models in Deep Learning, “Convolutional Neural Networks” (CNNs) [1]. Because they operate on data with a grid-like topology, CNNs can tackle time-series problems with 1D convolutions and image data treated as a 2D grid. Fast forward to 2017, and Geoffrey Hinton and his collaborators proposed an improved alternative to convolutional networks: “Capsule Networks” [2]. A natural question arises: why replace something that already works so well? Well, maybe CNNs don’t really work as well as they could. Here are a few reasons why:

Reason 1: A CNN extracts features hierarchically: the initial convolutional layers detect simple, low-level patterns such as edges, and deeper layers combine them into progressively more complex features. However, the pooling operations between these layers throw away precise positional information, so the network learns which features are present but not how they are arranged relative to one another. As a result, CNNs do not encode spatial relationships between features.

Figure 1: CNNs would classify both as faces due to lack of spatial information
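To make this concrete, here is a minimal NumPy sketch (not code from the original post; the feature maps and the `global_max_pool` helper are illustrative) of how max pooling discards location: two feature maps with the same response at different positions pool to identical outputs.

```python
# Illustrative sketch, not the original post's code: max pooling keeps
# "a feature fired" but drops *where* it fired.
import numpy as np

def global_max_pool(feature_map: np.ndarray) -> float:
    """Global max pooling: keeps the strongest response, discards its position."""
    return feature_map.max()

# Feature map for a correctly arranged face: an "eye" detector firing near the top.
face = np.zeros((4, 4))
face[0, 1] = 0.9

# Feature map for a scrambled face: the same detector firing near the bottom.
scrambled = np.zeros((4, 4))
scrambled[3, 2] = 0.9

# Both pool to 0.9: downstream layers see "an eye is present",
# but not how it is positioned relative to the other parts.
print(global_max_pool(face), global_max_pool(scrambled))  # 0.9 0.9
```

Capsules address this by replacing scalar feature detectors with vectors whose orientation can encode where and how a feature appears.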

Figure 5: Dynamic Routing Algorithm [2]
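For reference, the routing-by-agreement procedure shown in the figure can be sketched in a few lines of NumPy. This is an illustrative reimplementation of the algorithm from [2], not the post’s code; the tensor shapes, iteration count, and function names are assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squash nonlinearity from [2]: shrinks short vectors toward zero
    and long vectors toward unit length, preserving direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement over prediction vectors.

    u_hat: (num_in, num_out, dim) predictions made by each lower-level
           capsule i for each higher-level capsule j.
    Returns v: (num_out, dim) output capsule vectors.
    """
    num_in, num_out, dim = u_hat.shape
    b = np.zeros((num_in, num_out))                           # routing logits
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over j
        s = np.einsum('ij,ijd->jd', c, u_hat)                 # weighted sum of predictions
        v = squash(s)                                         # candidate outputs
        b = b + np.einsum('ijd,jd->ij', u_hat, v)             # reward agreement
    return v

# Toy example: 8 lower-level capsules predicting 4 output capsules of dimension 16.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(8, 4, 16))
v = dynamic_routing(u_hat)
print(v.shape)  # (4, 16)
```

Each lower-level capsule distributes its output among the higher-level capsules via the coupling coefficients `c`; the logits `b` grow wherever a capsule’s prediction agrees with the consensus output `v`, so predictions that agree get routed together.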

Figure 6: SegCaps architecture [4]

Figure 7: Results comparison

Table 1: Comparison of results between SegCaps and U-Net

This post first appeared on Medium.