Tokenizer Playground

Visualize how OpenAI models tokenize text. See token counts, boundaries, and IDs for GPT-4o, GPT-4, GPT-3.5, and more.

[Interactive playground: pick an encoding (o200k_base by default), paste text into the Input Text box, and the Tokenized Output panel shows the token boundaries along with token, character, and word counts and the average characters per token.]

About Tokenization

What are tokens?

Tokens are the basic units that language models process. A token can be a word, part of a word, or even punctuation. OpenAI models use Byte Pair Encoding (BPE) to break text into tokens.
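
As an illustrative sketch, the same splitting can be reproduced with OpenAI's tiktoken library (assuming it is installed; the example sentence is arbitrary):

```python
import tiktoken

# Load the BPE encoding used by GPT-4o and the o-series models.
enc = tiktoken.get_encoding("o200k_base")

text = "Tokenization isn't always intuitive!"
token_ids = enc.encode(text)

print(f"{len(token_ids)} tokens for {len(text)} characters")
for tid in token_ids:
    # Each token ID decodes to a byte string; boundaries need not align with words.
    piece = enc.decode_single_token_bytes(tid)
    print(tid, repr(piece))
```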

Why does it matter?

Token count affects API pricing (you pay per token), context window limits, and model performance. Understanding tokenization helps optimize prompts and estimate costs.
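
For example, you can count tokens before sending a request and turn that into a rough cost estimate. The sketch below assumes tiktoken is installed; the per-million-token price is a placeholder parameter, not a current OpenAI rate:

```python
import tiktoken

def estimate_prompt_cost(text: str, model: str = "gpt-4o",
                         usd_per_million_tokens: float = 2.50) -> tuple[int, float]:
    """Return (token_count, estimated_input_cost_usd) for a single prompt."""
    enc = tiktoken.encoding_for_model(model)  # resolve the encoding the model uses
    n_tokens = len(enc.encode(text))
    # usd_per_million_tokens is a placeholder; check current pricing before relying on it.
    return n_tokens, n_tokens * usd_per_million_tokens / 1_000_000

tokens, cost = estimate_prompt_cost("Summarize the following report in three bullets: ...")
print(f"{tokens} tokens, ~${cost:.6f} of input")
```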

Encodings

  • o200k_base — GPT-4o, o1, o3 series (200k vocabulary)
  • cl100k_base — GPT-4, GPT-3.5, embeddings (100k vocabulary)
  • p50k_base — text-davinci-003/002 (50k vocabulary)
  • r50k_base — legacy davinci, curie, and ada models (50k vocabulary)
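
These names correspond to the encodings exposed by tiktoken. A short sketch, assuming tiktoken is installed, of loading one by name or resolving one from a model name:

```python
import tiktoken

by_name = tiktoken.get_encoding("cl100k_base")     # load an encoding directly
by_model = tiktoken.encoding_for_model("gpt-4o")   # resolve from a model name

sample = "hello world"
print(by_name.encode(sample))    # cl100k_base token IDs
print(by_model.encode(sample))   # o200k_base token IDs (generally different)
print(by_model.name)             # "o200k_base"
```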