Understanding Rotary Positional Embeddings (RoPE)

Rotary Position Embedding (RoPE) is an advanced technique for encoding positional information within transformer-based language models. Unlike traditional positional embeddings that add or concatenate position vectors, RoPE encodes position by rotating the query and key vectors in multi-dimensional space. This geometric approach lets transformers capture both absolute and relative positions more effectively, especially for long sequences. In this article, we’ll cover the motivation behind RoPE, its mathematical foundation, key advantages, and practical implementation. ...
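
To make the "rotation" idea concrete, here is a minimal NumPy sketch (not the article's code) that rotates pairs of dimensions in a query or key vector by a position-dependent angle. It assumes an even head dimension and uses the split-in-half pairing convention found in many implementations; the frequencies follow the usual base^(-2i/d) schedule.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Apply a rotary position embedding to x of shape (seq_len, head_dim)."""
    seq_len, head_dim = x.shape
    half = head_dim // 2
    # Per-pair rotation frequencies: theta_i = base^(-2i / head_dim)
    freqs = base ** (-np.arange(half) * 2.0 / head_dim)      # (half,)
    angles = positions[:, None] * freqs[None, :]             # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    # Split each vector into pairs (x1_i, x2_i) and apply a 2-D rotation
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

# Usage: rotate queries and keys before the attention dot product
q = np.random.randn(8, 64)                 # (seq_len, head_dim)
positions = np.arange(8, dtype=np.float64)
q_rot = rope_rotate(q, positions)
```

Because the rotation angle grows linearly with position, the dot product between a rotated query and key depends only on their relative offset, which is what gives RoPE its relative-position behavior.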

August 17, 2025 · 7 min · 1349 words · Abhishek Kumar

Speculative Decoding for LLMs

Speculative decoding is an optimization technique designed to accelerate inference in large language models (LLMs) without compromising output quality. It generates multiple speculated tokens in parallel and incorporates a verification step that checks their correctness, guaranteeing that the overall output is identical to that of vanilla decoding. By reducing the cost of LLM inference, this approach makes generative AI significantly cheaper to run and easier to adopt. ...
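
As a rough illustration of the draft-then-verify loop, here is a minimal sketch of greedy speculative decoding. The callables `draft_next` and `target_next` are hypothetical stand-ins that return the argmax next token for a prefix; real systems score all drafted tokens with the target model in a single batched forward pass rather than one at a time.

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    # 1. The cheap draft model proposes k tokens autoregressively.
    draft, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2. The target model verifies the drafted tokens; under greedy decoding
    #    a token is accepted only if the target model would have chosen it.
    accepted, ctx = [], list(prefix)
    for t in draft:
        expected = target_next(ctx)
        if expected == t:
            accepted.append(t)
            ctx.append(t)
        else:
            # First mismatch: keep the target model's token and stop, so the
            # output is exactly what vanilla decoding would have produced.
            accepted.append(expected)
            break
    else:
        # All drafts accepted: append one bonus token from the target model.
        accepted.append(target_next(ctx))
    return list(prefix) + accepted
```

When the draft model agrees with the target model most of the time, each step emits several tokens for roughly the cost of one target-model pass, which is where the speed-up comes from.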

August 9, 2025 · 6 min · 1109 words · Abhishek Kumar