API Reference
count_tokens
count_tokens_in_file(file_path, encoding_name='cl100k_base', approximate=None, tokens_per_word=TOKENS_PER_WORD, characters_per_token=CHARACTERS_PER_TOKEN)
Return the number of tokens in a text file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `file_path` | `str` | The path to the text file to count the tokens in. | required |
| `encoding_name` | `str` | The name of the encoding to use. Default: cl100k_base | `'cl100k_base'` |
| `approximate` | `str \| None` | If set, estimate the number of tokens without tokenizing: `w` bases the estimate on words, `c` on characters. | `None` |
| `tokens_per_word` | `float` | The number of tokens per word for word-based approximation. Default: 4/3 | `TOKENS_PER_WORD` |
| `characters_per_token` | `float` | The number of characters per token for character-based approximation. Default: 4 | `CHARACTERS_PER_TOKEN` |
Returns:
| Type | Description |
|---|---|
| `int` | The number of tokens in the text file. |
Source code in src/count_tokens/count.py
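The two approximation modes can be illustrated with a small standalone sketch. It does not call the library and is not its implementation: the helper names `approximate_tokens` and `approximate_tokens_in_file` are hypothetical, splitting on whitespace to count words is an assumption, and rounding to the nearest integer is an assumption (the library may round differently). Only the default ratios (4/3 tokens per word, 4 characters per token) come from the parameter table above.

```python
from pathlib import Path

# Default ratios as documented in the parameter table.
TOKENS_PER_WORD = 4 / 3
CHARACTERS_PER_TOKEN = 4.0


def approximate_tokens(
    text: str,
    approximate: str,
    tokens_per_word: float = TOKENS_PER_WORD,
    characters_per_token: float = CHARACTERS_PER_TOKEN,
) -> int:
    """Estimate a token count without running a tokenizer (hypothetical helper)."""
    if approximate == "w":
        # Word-based: words are taken to be whitespace-separated chunks (assumption).
        return round(len(text.split()) * tokens_per_word)
    if approximate == "c":
        # Character-based: every character counts, including whitespace (assumption).
        return round(len(text) / characters_per_token)
    raise ValueError("approximate must be 'w' or 'c'")


def approximate_tokens_in_file(file_path: str, approximate: str) -> int:
    """Read a UTF-8 text file and estimate its token count (hypothetical helper)."""
    text = Path(file_path).read_text(encoding="utf-8")
    return approximate_tokens(text, approximate)
```

For the string `"one two three"` (3 words, 13 characters), this sketch estimates 4 tokens in word mode and 3 tokens in character mode.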
count_tokens_in_string(string, encoding_name='cl100k_base')
Return the number of tokens in a text string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `string` | `str` | The text string to count the tokens in. | required |
| `encoding_name` | `str` | The name of the encoding to use. Default: cl100k_base | `'cl100k_base'` |
Returns:
| Type | Description |
|---|---|
| `int` | The number of tokens in the text string. |
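A common use of `count_tokens_in_string` is checking whether a piece of text fits a token budget. The sketch below is one possible way to do that, not part of the library: `load_string_counter` and `fits_budget` are hypothetical helpers, and the import path `count_tokens.count` is assumed from the source location listed in this reference. The counter is passed in as an argument so the check can also be exercised without the package installed.

```python
from typing import Callable, Optional


def load_string_counter() -> Optional[Callable[..., int]]:
    """Return count_tokens_in_string if the package is importable, else None."""
    try:
        # Import path assumed from the source location given in this reference.
        from count_tokens.count import count_tokens_in_string
    except ImportError:
        return None
    return count_tokens_in_string


def fits_budget(
    text: str,
    max_tokens: int,
    counter: Callable[..., int],
    encoding_name: str = "cl100k_base",
) -> bool:
    """Check whether text stays within max_tokens according to counter."""
    # The counter is called with the documented signature:
    # (string, encoding_name='cl100k_base') -> int
    return counter(text, encoding_name=encoding_name) <= max_tokens
```

With the package installed, `load_string_counter()` returns the real function, which can be passed as `counter`. If it returns `None`, the caller needs to supply another counter or skip the check.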