Searching for 2+ characters in Chinese does not return expected results

Versions / Builds Affected

2015 GOLD and older

Status

Resolved

Problem Summary

This is a word breaker issue affecting Chinese emails / data. The word breaker does not properly split characters / words, so searches for 2 or more characters do not return the expected results.

TT / JIRAID

14

How to Identify

In Chinese, words are not necessarily delimited by characters such as a space, a comma or a full stop, and some words consist of multiple characters. For example, a word could be formed out of: ChineseCharacterAChineseCharacterB

One would expect a search for this word to return an email which contains it. Unfortunately, this does not always work. For example:

Search for 'ChineseCharacterA' > returns all matching 'ChineseCharacterA' entries
Search for 'ChineseCharacterAChineseCharacterB' > returns no entries (but it should)
Search for '*ChineseCharacterAChineseCharacterB*' > returns all matching 'ChineseCharacterAChineseCharacterB' entries (workaround)
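The symptom pattern above can be reproduced with a small sketch. It is not the product's actual word breaker or index. It only assumes, for illustration, that the index stores each Chinese character as its own token while a plain query is looked up as one whole term, and that a wildcard query falls back to substring matching. All function names (`break_words`, `build_index`, `term_search`, `wildcard_search`) and the sample texts are hypothetical.

```python
import fnmatch


def break_words(text):
    """Hypothetical word breaker: each CJK character becomes its own token,
    other text is split on whitespace. This is an assumption for illustration."""
    tokens = []
    for chunk in text.split():
        buffer = ""
        for ch in chunk:
            if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block
                if buffer:
                    tokens.append(buffer)
                    buffer = ""
                tokens.append(ch)
            else:
                buffer += ch
        if buffer:
            tokens.append(buffer)
    return tokens


def build_index(docs):
    """Map every token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in break_words(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_search(index, query):
    """Look the query up as a single term, without breaking it first."""
    return sorted(index.get(query, set()))


def wildcard_search(docs, pattern):
    """Wildcard queries scan the raw text instead of the token index."""
    return sorted(d for d, text in docs.items()
                  if fnmatch.fnmatchcase(text, pattern))


# Hypothetical sample data: two short message bodies.
docs = {1: "请查看 电子邮件 附件", 2: "电话 会议"}
index = build_index(docs)

single_char = term_search(index, "电")        # one character: found
two_chars = term_search(index, "电子")        # two characters: not found
wildcard = wildcard_search(docs, "*电子*")    # wildcard workaround: found
```

In this sketch the single-character search matches both messages, the two-character search matches nothing because no single token equals the two-character term, and the wildcard search finds the message that contains the word. The real cause in the product may differ; the sketch only shows how a mismatch between index-time and query-time word breaking can produce this pattern.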

Workaround / Fix Details

1. Upgrade to 2015 SR1 build 20150218 or newer.
2. Go through article 9541, "Searching in Chinese, Japanese or Korean (CJK) languages returns inconsistent results": http://www.gfi.com/support/products/gfi-archiver/Searching-in-Chinese-Japanese-or-Korean-CJK-languages-returns-inconsistent-results

Required Actions

Apply the fix as outlined above.
  1. Priyanka Bhotika
