Transformers

Transformers have been a significant breakthrough in deep learning. Their defining feature is the self-attention mechanism, which lets the model learn which parts of the input matter most for each element it processes. They are now the dominant architecture in NLP and are expanding into other domains. The original transformer is made up of an encoder and a decoder: the encoder takes in the input sequence and processes it into a representation that the decoder can use. The decoder then attends to the relevant parts of that representation and generates the output. Each layer in the network contains a multi-head self-attention mechanism, which scores the relevance of the elements to one another, and a feedforward neural network that processes these elements further. Because self-attention handles all positions of a sequence in parallel rather than one step at a time, transformers can be trained efficiently on large data sets using GPUs. This technology underpins language processing applications such as translation, summarization, chatbots, and GPT models. Transformers are also increasingly used for computer vision tasks.
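
To make the idea of self-attention more concrete, here is a minimal sketch of a single attention head in plain Python. It is an illustration under simplifying assumptions, not a full transformer: the function names (softmax, self_attention) and the toy input are made up for this example, and the queries, keys, and values are simply the input vectors themselves, whereas a real transformer would first pass the input through learned projection matrices and run several such heads side by side.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention for one head.

    queries, keys, values: lists of vectors, one vector per position.
    Returns (outputs, weights), where weights[i][j] is how strongly
    position i attends to position j.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Relevance of every position to this query: dot product, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w[j] * values[j][d] for j in range(len(values)))
               for d in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Toy input (assumed for illustration): three positions, 2-dimensional embeddings.
# Assumption: no learned projections, so queries = keys = values = x.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(x, x, x)

for i, row in enumerate(weights):
    print(f"position {i} attends with weights {[round(v, 3) for v in row]}")
```

Each row of weights sums to 1, so every output vector is a weighted average of the inputs, with more weight given to the positions judged most relevant. The loop over queries is written out for readability; in practice the same computation is expressed as matrix multiplications, which is what allows all positions to be processed in parallel on a GPU.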
