Tokenizer Playground
Visualize how OpenAI models tokenize text. See token counts, boundaries, and IDs for GPT-4o, GPT-4, GPT-3.5, and more.
(Interactive playground: an encoder selector defaulting to o200k_base, live counters for tokens, characters, words, and average characters per token, an input text pane, and the tokenized output view.)
About Tokenization
What are tokens?
Tokens are the basic units that language models process. A token can be a word, part of a word, or even punctuation. OpenAI models use Byte Pair Encoding (BPE) to break text into tokens.
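As a rough sketch of what that means in practice, the snippet below uses the tiktoken library (an assumption here; the playground's own encoder implementation isn't shown) to print each token ID next to the text fragment it covers:

```python
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # encoding used by GPT-4o and the o-series
text = "Tokenization splits text into subword units."
token_ids = enc.encode(text)

print(f"{len(token_ids)} tokens")
# Show each token ID next to the text fragment it covers.
for tid in token_ids:
    fragment = enc.decode_single_token_bytes(tid).decode("utf-8", errors="replace")
    print(f"{tid:>8}  {fragment!r}")
```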
Why does it matter?
Token count affects API pricing (you pay per token), context window limits, and model performance. Understanding tokenization helps optimize prompts and estimate costs.
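For example, a token count translates directly into an estimated request cost. The sketch below assumes tiktoken and uses a placeholder price per million tokens; substitute the current published rate for your model:

```python
import tiktoken

# Placeholder rate in USD per one million input tokens (an assumption, not a quoted price).
USD_PER_MILLION_TOKENS = 2.50

def estimate_prompt_cost(prompt: str, model: str = "gpt-4o") -> float:
    """Count tokens for a prompt and convert the count to an estimated cost."""
    enc = tiktoken.encoding_for_model(model)
    n_tokens = len(enc.encode(prompt))
    return n_tokens / 1_000_000 * USD_PER_MILLION_TOKENS

prompt = "Summarize the following article in three bullet points: ..."
print(f"~${estimate_prompt_cost(prompt):.6f} for this prompt")
```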
Encodings
- o200k_base — GPT-4o, o1, o3 series (200k vocabulary)
- cl100k_base — GPT-4, GPT-3.5, embeddings (100k vocabulary)
- p50k_base — text-davinci-003/002 (50k vocabulary)
- r50k_base — Legacy davinci, curie, ada models (50k vocabulary)
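A quick way to compare these encodings is to tokenize the same sentence with each one. The sketch below assumes the tiktoken library, whose encoding_for_model helper also maps a model name to its encoder:

```python
import tiktoken

sample = "Byte Pair Encoding merges frequent byte pairs into larger tokens."

# Tokenize the same sentence with each encoding and compare vocabulary sizes.
for name in ["o200k_base", "cl100k_base", "p50k_base", "r50k_base"]:
    enc = tiktoken.get_encoding(name)
    print(f"{name:<12} vocab={enc.n_vocab:>7} tokens={len(enc.encode(sample))}")

# encoding_for_model resolves the right encoding from a model name.
print(tiktoken.encoding_for_model("gpt-4o").name)  # -> o200k_base
```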