Segments the provided text into tokens, then normalizes and classifies each token.

Parameters: a string to be tokenized.

Returns: an array of tokens, where each token is a normalized and classified segment of the input text.
Example:

```js
import { tokenize, TokenKind } from 'charabia-js';
import assert from 'node:assert';

const tokens = tokenize(
  'The quick ("brown") fox can\'t jump 32.3 feet, right? Brr, it\'s 29.3°F'
);

// The first token is the word "The", normalized to lowercase.
let token = tokens[0];
assert.equal(token.lemma, 'the');
assert.equal(token.kind, TokenKind.Word);

// The whitespace between words is classified as a soft separator.
token = tokens[1];
assert.equal(token.lemma, ' ');
assert.equal(token.kind, TokenKind.SoftSeparator);

// Subsequent word tokens follow the same pattern.
token = tokens[2];
assert.equal(token.lemma, 'quick');
assert.equal(token.kind, TokenKind.Word);
```
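A common follow-up is to keep only the word tokens and discard separators, as a search-indexing pipeline might. Below is a minimal sketch of such a filter; the `wordLemmas` helper is hypothetical, and only `tokenize`, `TokenKind`, `token.kind`, and `token.lemma` come from the API shown above.

```js
import { tokenize, TokenKind } from 'charabia-js';

// Hypothetical helper: collect the normalized lemmas of word tokens,
// dropping separators and other non-word kinds.
function wordLemmas(text) {
  return tokenize(text)
    .filter((token) => token.kind === TokenKind.Word)
    .map((token) => token.lemma);
}

// Returns only the lemmas of word tokens; the exact segmentation
// depends on the tokenizer's language-specific rules.
console.log(wordLemmas('The quick ("brown") fox'));
```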