
02 Things you need to know
Server
Using the Verity Keyview filter for indexing binary attachments

Beginning with release 5.0.10, Domino for iSeries uses the Verity Keyview filter to retrieve text from binary attachments. Supported formats include Acrobat PDF, Word, WordPerfect, 1-2-3, Excel, Freelance, PowerPoint, HTML, and many others.
Note: WordPro attachments cannot be indexed.

To enable this feature, a database must be full-text indexed with the option to index binary attachments. Indexing attachments as raw text continues to operate as before and does not use the Keyview filter.

The Keyview filter can be enabled or disabled for the entire server with the notes.ini setting FT_BINARY_FILTER_OFF. A value of FT_BINARY_FILTER_OFF=1 indicates that the Keyview filter will not be used; this is the default for existing servers. A value of FT_BINARY_FILTER_OFF=0, or no setting present in the notes.ini file, indicates that the Keyview filter will be used; this is the default for new servers configured with release 5.0.10. This setting allows administrators to shut off the filter without having to change the indexing options on every database that has the binary attachment option turned on.
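
The decision rule for FT_BINARY_FILTER_OFF can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not how the server itself reads notes.ini. The function names read_notes_ini and keyview_filter_enabled are hypothetical, keys are compared without regard to case as an assumption, and any value other than 1 is assumed to behave like 0.

```python
def read_notes_ini(text):
    """Parse KEY=VALUE lines from notes.ini text into a dict (keys upper-cased)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, section headers such as [Notes], and lines without '='.
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().upper()] = value.strip()
    return settings


def keyview_filter_enabled(notes_ini_text):
    """Return True if the Keyview filter would be used under the stated rule."""
    value = read_notes_ini(notes_ini_text).get("FT_BINARY_FILTER_OFF")
    # Absent or 0 means the filter is used; 1 means it is not used.
    # Assumption: any value other than "1" is treated like 0.
    return value != "1"


print(keyview_filter_enabled("[Notes]\nFT_BINARY_FILTER_OFF=1\n"))  # False
print(keyview_filter_enabled("[Notes]\nFT_BINARY_FILTER_OFF=0\n"))  # True
print(keyview_filter_enabled("[Notes]\n"))                          # True
```

A check like this could be run against a copy of a server's notes.ini after an upgrade to confirm whether binary attachments will go through the filter.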

When indexing binary attachments, the source text is converted from the character set of the attachment to LMBCS. When the character set of the text in an attachment is not known, a default character set is assumed. The locale of the QNOTES user profile (USRPRF) is used to make this determination. The default character set can also be set explicitly with the notes.ini entry OS400_KEYVIEW_CSID=CharacterSetVal. The values allowed for CharacterSetVal are listed in the documents titled "Setting up Domino on Linux for non 'C' locales" and "Setting collation on the Domino server", located in the core Domino release notes, Chapter 2, Section 6.


Table 1 maps the QNOTES locale to the default character set that is used if no notes.ini entry exists.

Example of a notes.ini entry that assumes untagged documents are 1252:
OS400_KEYVIEW_CSID=0052

Table 1 QNOTES locale mapping to a default document character set.
QNOTES Locale   Region / Language                       Default Character Set
AR_AA           Arabic-speaking countries/Arabic        IBMCP1256
BG_BG           Bulgaria/Bulgarian                      IBMCP1251
CS_CZ           Czech Republic/Czech                    IBMCP1250
DA_DK           Denmark/Danish                          IBMCP1252
DE_CH           Switzerland/German                      IBMCP1252
DE_DE           Germany/German                          IBMCP1252
EL_GR           Greece/Greek                            IBMCP1253
EN_BE           Belgium/English                         IBMCP1252
EN_GB           Great Britain/English                   IBMCP1252
EN_US           USA/English                             IBMCP1252
ES_ES           Spain/Spanish                           IBMCP1252
ET_EE           Estonia/Estonian                        IBMCP1257
FI_FI           Finland/Finnish                         IBMCP1252
FR_BE           Belgium/French                          IBMCP1252
FR_CA           Canada/French                           IBMCP1252
FR_CH           Switzerland/French                      IBMCP1252
FR_FR           France/French                           IBMCP1252
HR_HR           Croatia/Croatian                        IBMCP1250
HU_HU           Hungary/Hungarian                       IBMCP1250
IS_IS           Iceland/Icelandic                       IBMCP1252
IT_IT           Italy/Italian                           IBMCP1252
IW_IL           Israel/Hebrew                           IBMCP1255
JA_JP5035       Japan/Japanese                          IBMCP932
KO_KR           South Korea/Korean                      IBMCP949
LT_LT           Lithuania/Lithuanian                    IBMCP1257
LV_LV           Latvia/Latvian                          IBMCP1257
MK_MK           Macedonia/Macedonian                    IBMCP1250
NL_BE           Belgium/Dutch                           IBMCP1252
NL_NL           Netherlands/Dutch                       IBMCP1252
NO_NO           Norway/Norwegian                        IBMCP1252
PL_PL           Poland/Polish                           IBMCP1250
PT_BR           Brazil/Brazilian Portuguese             IBMCP1252
PT_PT           Portugal/Portuguese                     IBMCP1252
RO_RO           Romania/Romanian                        IBMCP1250
RU_RU           Russia/Russian                          IBMCP1251
SH_SP           Serbia/Serbian, Latin                   IBMCP1250
SK_SK           Slovakia/Slovakian                      IBMCP1250
SL_SI           Slovenia/Slovenian                      IBMCP1250
SQ_AL           Albania/Albanian                        IBMCP1250
SR_SP           Serbia/Serbian, Cyrillic                IBMCP1251
SV_SE           Sweden/Swedish                          IBMCP1252
TH_TH           Thailand/Thai                           IBMCP874
TR_TR           Turkey/Turkish                          IBMCP1254
ZH_CN           China/Simplified Chinese                IBMCP936
ZH_TW           Taiwan/Mandarin, Traditional Chinese    IBMCP950
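
The way the default character set is chosen can be sketched as a small lookup, given here in Python purely as an illustration. It assumes that an OS400_KEYVIEW_CSID entry, when present, takes precedence over the QNOTES locale, which is how the text above reads. The names DEFAULT_CHARSET and default_charset are hypothetical, and returning None for a locale that is not in Table 1 is an assumption, since the release note does not say what happens in that case.

```python
# Locales from Table 1, grouped by their default character set.
_LOCALES_BY_CHARSET = {
    "IBMCP1250": "CS_CZ HR_HR HU_HU MK_MK PL_PL RO_RO SH_SP SK_SK SL_SI SQ_AL",
    "IBMCP1251": "BG_BG RU_RU SR_SP",
    "IBMCP1252": ("DA_DK DE_CH DE_DE EN_BE EN_GB EN_US ES_ES FI_FI FR_BE FR_CA "
                  "FR_CH FR_FR IS_IS IT_IT NL_BE NL_NL NO_NO PT_BR PT_PT SV_SE"),
    "IBMCP1253": "EL_GR",
    "IBMCP1254": "TR_TR",
    "IBMCP1255": "IW_IL",
    "IBMCP1256": "AR_AA",
    "IBMCP1257": "ET_EE LT_LT LV_LV",
    "IBMCP874": "TH_TH",
    "IBMCP932": "JA_JP5035",
    "IBMCP936": "ZH_CN",
    "IBMCP949": "KO_KR",
    "IBMCP950": "ZH_TW",
}

# Flatten into a locale -> character set mapping.
DEFAULT_CHARSET = {
    locale: charset
    for charset, locales in _LOCALES_BY_CHARSET.items()
    for locale in locales.split()
}


def default_charset(qnotes_locale, settings):
    """Return (source, value) for the assumed character set of untagged text.

    settings is a dict of notes.ini entries (hypothetical input shape).
    """
    override = settings.get("OS400_KEYVIEW_CSID")
    if override is not None:
        # The notes.ini value is a CharacterSetVal code, not a code page name.
        return ("notes.ini", override)
    charset = DEFAULT_CHARSET.get(qnotes_locale.upper())
    if charset is None:
        # Assumption: behavior for unlisted locales is not documented here.
        return None
    return ("locale", charset)


print(default_charset("EL_GR", {}))                              # ('locale', 'IBMCP1253')
print(default_charset("EL_GR", {"OS400_KEYVIEW_CSID": "0052"}))  # ('notes.ini', '0052')
```

Grouping the locales by code page keeps the table short and makes it easy to see that most Western European locales share IBMCP1252.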