Word breaking is the process of splitting text into individual tokens, or words. Many languages, especially those written in the Latin alphabet, use word separators (such as white space and punctuation) to distinguish words, phrases, and sentences. Word breakers must rely on accurate, language-specific heuristics to produce reliable results.
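For text that does use separators, a minimal sketch of a word breaker can be written with a regular expression. The function name `break_words` and the rule that apostrophes and hyphens may join parts of one word are assumptions made for illustration, not a description of any particular word breaker.

```python
import re

# Assumption: a word is a run of letters, digits, or underscores, optionally
# joined by internal apostrophes or hyphens (e.g. "don't", "well-known").
WORD_PATTERN = re.compile(r"\w+(?:['\-]\w+)*")

def break_words(text):
    """Return the words in text, treating white space and punctuation as separators."""
    return WORD_PATTERN.findall(text)

print(break_words("Word breakers split text, don't they?"))
# ['Word', 'breakers', 'split', 'text', "don't", 'they']
```

A rule this simple already embeds a language heuristic (how to treat apostrophes and hyphens), which is why real word breakers tend to be tuned per language.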
Word breaking is more complex for character-based writing systems and script-based alphabets, where the meaning of individual characters is determined from context.
A word breaker is vital for properly indexing most Asian languages (for example, Japanese, Chinese, and Arabic), as well as other languages.
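When text has no separators between words, one simple heuristic is greedy longest-match segmentation against a word list. The sketch below is only an illustration of that heuristic: the function name `max_match` and the tiny lexicon are hypothetical, and production word breakers generally use richer, context-aware methods.

```python
def max_match(text, lexicon):
    """Greedy forward longest-match segmentation against a set of known words."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon entry matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                break
        else:
            # No entry matched: emit a single unknown character as its own token.
            j = i + 1
        tokens.append(text[i:j])
        i = j
    return tokens

# Hypothetical toy lexicon, for illustration only.
lexicon = {"我", "喜欢", "北京", "大学", "北京大学"}
print(max_match("我喜欢北京大学", lexicon))
# ['我', '喜欢', '北京大学']
```

Because the match is greedy, it always prefers the longest known word at each position and cannot use surrounding context to resolve ambiguous boundaries, which is one reason indexing quality depends so heavily on the word breaker.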