Transformers Well Explained
Everyone is talking about ChatGPT and Transformers. Transformers have been a significant leap for Natural Language Processing (NLP) research, and it's crucial to understand them properly. However, many of the explanations available are not very effective: they skip over essential background, leaving knowledge gaps, in order to get to quick explanations and usage tips. Well, speed isn't the priority here.
To address this, I've decided to write a series of concise articles explaining Transformers and the attention mechanism in a way that truly makes sense. I'll be referring to the landmark paper "Attention Is All You Need" [1] and breaking down the main aspects of Transformers, focusing on one concept per article. Each concept will be supported by a notebook for a hands-on understanding.
Any neural network architecture boils down to three main components: input, structure, and output. If you don't understand each of these, the architecture can feel non-intuitive and exotic. Here's an overview of the articles:
1. Transformers Well Explained: Word Embeddings
2. Transformers Well Explained: Masking
3. Transformers Well Explained: Positional Encoding
4. Transformers Well Explained: Self Attention
For those looking to implement Transformers quickly: stop reading, import `torch.nn.MultiheadAttention` from PyTorch, set up a couple of building blocks, and enjoy yourself. These articles aim instead to provide a solid understanding, bridging the knowledge gaps and making Transformers more accessible.
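If you do just want the quick route, a minimal sketch of that shortcut might look like the following. The dimensions here (embedding size 64, 4 heads, a toy batch of 2 sequences of 10 tokens) are illustrative assumptions, not values from the paper:

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not from "Attention Is All You Need")
embed_dim, num_heads = 64, 4
attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# A toy batch: 2 sequences of 10 tokens, each already embedded into 64 dims
x = torch.randn(2, 10, embed_dim)

# Self-attention: query, key, and value are all the same tensor
output, weights = attention(x, x, x)

print(output.shape)   # torch.Size([2, 10, 64])
print(weights.shape)  # torch.Size([2, 10, 10]), averaged over heads by default
```

One layer call and you get attention outputs and weights. But if that code feels like a black box (why three copies of `x`? what are the weights?), that's exactly the gap this series is meant to fill.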