When using language models (LMs) for text generation, one common challenge is repetition. Without guidance, a model may repeat words, phrases, or concepts excessively, especially in long-form outputs. This reduces readability, coherence, and the overall usefulness of the generated text. To address this, most LLM APIs provide two complementary controls: the frequency penalty and the presence penalty. Both parameters adjust token scores during sampling, encouraging the model to generate more diverse, creative, and contextually rich outputs.

Understanding Frequency Penalty

The frequency penalty discourages the model from repeating tokens that have already appeared in the generated text. Importantly, this penalty is proportional to how many times a token has occurred.
  • If a token appears once, the penalty slightly reduces its probability.
  • If it appears multiple times, the probability reduction becomes stronger.

Mathematical Intuition

A simplified formula for frequency penalty is:

  adjusted_score(token) = score(token) - (frequency_penalty × count(token))

Where:
  • score(token) = the model's original score (logit) for the token, before scores are converted to probabilities
  • frequency_penalty = user-defined weight (usually 0–2)
  • count(token) = number of times the token has already appeared in the generated text
Think of frequency penalty as a progressively increasing deterrent against repetition. The more a token has been used, the less likely it is to appear again.
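To make the intuition concrete, here is a minimal Python sketch of that adjustment. It illustrates the simplified formula above, not any provider's actual implementation; the toy vocabulary, scores, and penalty value are invented for the example.

```python
from collections import Counter

def apply_frequency_penalty(logits, generated_tokens, frequency_penalty=0.5):
    """Subtract frequency_penalty * count(token) from each token's raw score.

    logits: dict mapping token -> raw score (pre-softmax logit)
    generated_tokens: list of tokens produced so far
    """
    counts = Counter(generated_tokens)
    return {
        token: score - frequency_penalty * counts[token]
        for token, score in logits.items()
    }

# Toy example: "cat" has already appeared twice, so it is penalized twice as hard.
logits = {"cat": 2.0, "dog": 1.8, "bird": 1.5}
generated = ["the", "cat", "sat", "cat"]
print(apply_frequency_penalty(logits, generated, frequency_penalty=0.5))
# {'cat': 1.0, 'dog': 1.8, 'bird': 1.5}
```

Because the deduction scales with the count, a token that has appeared twice is penalized twice as hard as one that has appeared once.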

Use Cases for Frequency Penalty

  • Long-form content generation: Articles, reports, or essays where repeated words reduce readability.
  • Dialogue generation: Multi-turn conversations where the model may redundantly repeat phrases or instructions.
  • Structured outputs: Preventing repeated labels, keys, or values in generated lists or tables.
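In practice, the penalty is set as a single request parameter. The sketch below uses the OpenAI Python client as one example; the model name and penalty value are illustrative, and other providers expose equivalent settings.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Write a 500-word article on renewable energy."}],
    frequency_penalty=0.7,  # discourage repeated wording in long-form output
)
print(response.choices[0].message.content)
```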

Understanding Presence Penalty

The presence penalty, in contrast, discourages the model from using tokens that have appeared at least once, regardless of frequency.
  • Even if a token has only appeared once, the presence penalty reduces its probability for subsequent selections.
  • It is ideal for encouraging the model to introduce new concepts and words.

Mathematical Intuition

A simplified formula for presence penalty is:

  adjusted_score(token) = score(token) - (presence_penalty × indicator(token_present))

Where:
  • score(token) = the model's original score (logit) for the token
  • indicator(token_present) = 1 if the token has already appeared, 0 otherwise
  • presence_penalty = user-defined weight (usually 0–2)
Presence penalty works as a binary gate — it discourages tokens that already exist in the text, making the model explore new vocabulary or ideas.
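Continuing the simplified sketch from the frequency-penalty section, the presence penalty replaces the occurrence count with a 0/1 indicator. Again, this is an illustration of the formula above, not a provider's exact implementation.

```python
def apply_presence_penalty(logits, generated_tokens, presence_penalty=0.5):
    """Subtract a flat presence_penalty from every token that has appeared at least once."""
    seen = set(generated_tokens)
    return {
        token: score - (presence_penalty if token in seen else 0.0)
        for token, score in logits.items()
    }

# "cat" appeared twice and "sat" once, but both receive the same flat penalty.
logits = {"cat": 2.0, "sat": 1.8, "bird": 1.5}
print(apply_presence_penalty(logits, ["the", "cat", "sat", "cat"], presence_penalty=0.5))
# {'cat': 1.5, 'sat': 1.3, 'bird': 1.5}
```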

Use Cases for Presence Penalty

  • Creative writing: Stories, poetry, slogans, and marketing copy.
  • Brainstorming outputs: Generating multiple unique ideas in a single prompt.
  • Reducing monotony: Ensuring repeated words do not dominate multi-turn outputs.

Frequency vs Presence: Key Differences

Frequency Penalty
  • Mechanism: penalizes tokens in proportion to how many times they have occurred
  • Effect on repetition: gradually reduces the likelihood of repeated tokens
  • Best for: long-form text, structured outputs, multi-turn dialogues

Presence Penalty
  • Mechanism: penalizes tokens once they have appeared at least once
  • Effect on repetition: discourages any reuse of tokens, regardless of frequency
  • Best for: creative writing, brainstorming, idea expansion, slogans
In practice, these penalties are often combined to control repetition while maintaining creativity.
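Building on the two sketches above, a combined adjustment under the same simplified model might look like this (the helper name and default values are illustrative):

```python
from collections import Counter

def apply_penalties(logits, generated_tokens,
                    frequency_penalty=0.3, presence_penalty=0.3):
    """Apply both penalties: a per-occurrence deduction plus a flat deduction for any reuse."""
    counts = Counter(generated_tokens)
    return {
        token: score
        - frequency_penalty * counts[token]                 # grows with each repetition
        - (presence_penalty if counts[token] > 0 else 0.0)  # one-time hit for appearing at all
        for token, score in logits.items()
    }
```

A token that has never appeared is untouched; a token that keeps recurring accumulates the frequency penalty on top of the one-time presence penalty.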

Guidelines:
  1. Start with small penalties and test outputs incrementally.
  2. Combine with temperature (controls randomness) and top-p (nucleus sampling, which limits the pool of candidate tokens) for fine-grained control.
  3. Monitor outputs in production to adjust penalties dynamically based on content quality.
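As a starting point, a single request that combines all four controls might look like the following. This uses the OpenAI Python client; the model name and the specific values are placeholders to tune for your own use case, not recommendations.

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",      # illustrative model name
    messages=[{"role": "user", "content": "Brainstorm ten distinct taglines for a cycling app."}],
    temperature=0.9,          # more randomness for creative output
    top_p=0.95,               # sample from the probability nucleus
    frequency_penalty=0.4,    # damp repeated wording
    presence_penalty=0.6,     # push toward new words and ideas
)
print(response.choices[0].message.content)
```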
