Document Processing Service custom parameters
You can optimize how your application runs the Document Processing Service (DPS)
component by specifying profiles for the optical character recognition (OCR) and
highlighting services. You can also edit custom DPS parameters to tune the OCR and
highlighting services to your requirements.
For example, you can modify custom DPS parameters so that the system automatically
corrects layout orientation during document processing and recognizes text in
image-based documents in English, French, and Spanish.
Main parameters
The following table lists the main DPS parameters that you modify in the configureDPSABBYY data transform:
Parameter name | Description | Values |
---|---|---|
exactMatch | Specifies whether to perform exact matching of text during document processing. | true, false |
highlightFileExportFormat | Specifies the file format for the highlighting service. | FEF_RTF, FEF_HTMLVersion10Defaults, FEF_HTMLUnicodeDefaults, FEF_PDF, FEF_TextVersion10Defaults, FEF_TextUnicodeDefaults, FEF_XML, FEF_DOCX, FEF_XLSX, FEF_PPTX, FEF_ALTO, FEF_EPUB, FEF_FB2, FEF_ODT |
ocrFileExportFormat | Specifies the file format for the OCR service. | FEF_RTF, FEF_HTMLVersion10Defaults, FEF_HTMLUnicodeDefaults, FEF_PDF, FEF_TextVersion10Defaults, FEF_TextUnicodeDefaults, FEF_XML, FEF_DOCX, FEF_XLSX, FEF_PPTX, FEF_ALTO, FEF_EPUB, FEF_FB2, FEF_ODT |
textLanguage | Specifies the languages of the text to be recognized, including programming languages, as a comma-separated list. Define up to three languages for this property, as more language definitions may impact system performance. | Abkhaz, Adyghe, Afrikaans, Agul, Albanian, Altaic, Arabic, ArmenianEast, ArmenianGrabar, ArmenianWestern, Anwar, Aymara, AzeriCyrillic, AzericLatin, Bashkir, Basic, Basque, Belarusian, Bemba, Blackfoot, Breton, Bugotu, Bulgarian, Burmese, Buryat, C++, Catalan, Chamorro, Chechen, Chemistry, ChinesePRC, ChineseTaiwan, Chukcha, Chuvash, CMC7, Cobol, Corsican, CrimeanTatar, Croatian, Crow, Czech, Danish, Dargwa, Digits, Dungan, Dutch, DutchBelgian, E13B, English, EskimoCyrillic, EskimoLatin, Esperanto, Estonian, Even, Evenki, Faeroese, Farsi, Fijian, Finnish, Fortan, French, Frisian, Friulian, GaelicScottish, Gagauz, Galician, Ganda, German, GermanLuxembourg, GermanNewSpelling, Greek, Guarani, Hani, Hausa, Hawaiian, Hebrew, Hungarian, Icelandic, Ido, Indonesian, Ingush, Interlingua, Irish, Italian, Japanese, JapaneseModern, Java, Kabardian, Kalmyk, KarachayBalkar, Karakalpak, Kasub, Kawa, Kazakh, Khakas, Khanty, Kikuyu, Kirghiz, Kongo, Korean, KoreanHangul, Koryak, Kpelle, Kumyk, Kurdish, Lak, Lappish, Latin, Latvian, LatvianGothic, Lezgin, Lithuanian, Luba, Macedonian, Malagasy, Malay, Malinke, Maltese, Mansi, Maori, Mari, Maya, Miao, Minankabaw, Mohawk, Mongol, Mordvin, Nahuatl, Nenets, Nivkh, Nogay, Norwegian, NorwegianBokmal, NorwegianNynorsk, Nyanja, Occidental, OcrA, OcrB, Ojibway, OldEnglish, OldFrench, OldGerman, OldItalian, OldSlavonic, OldSpanish, Ossetic, Papiamento, Pascal, Pashto, PidginEnglish, Polish, PortugueseBrazilian, PortugueseStandard, Provencal, Quechua, RhaetoRomanic, Romanian, RomanianMoldavia, Romany, Ruanda, Rundi, RussianOldSpelling, Russian, RussianWithAccent, Samoan, Selkup, SerbianCyrillic, SerbianLatin, Shona, Sioux, Slovak, Slovenian, Somali, Sorbian, Sotho, Spanish, Sunda, Swahili, Swazi, Swedish, Tabassaran, Tagalog, Tahitian, Tajik, Tatar, Thai, Tinpo, Tongan, Tswana, Tun, Turkish, Turkmen, TurkmenLatin, Tuvin, Udmurt, UighurCyrillic, UighurLatin, Ukrainian, Urdu, UzbekCyrillic, UzbekLatin, Vietnamese, Visayan, Welsh, Wolof, Xhosa, Yakut, Yiddish, Zapotec, Zulu |
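The textLanguage value is a comma-separated list with a recommended maximum of three languages. The following Python sketch shows one way to assemble and check such a value before you enter it in the data transform. The helper name build_text_language is hypothetical and is not part of DPS. The sketch also assumes that the list uses no spaces after commas and treats the three-language recommendation as a hard limit.

```python
MAX_LANGUAGES = 3  # recommended upper bound for textLanguage


def build_text_language(languages):
    """Join language names into a comma-separated textLanguage value.

    Hypothetical helper for illustration only; not a DPS API.
    """
    names = [name.strip() for name in languages if name.strip()]
    if not names:
        raise ValueError("textLanguage needs at least one language")
    if len(set(names)) != len(names):
        raise ValueError("textLanguage contains duplicate languages")
    if len(names) > MAX_LANGUAGES:
        raise ValueError(
            f"{len(names)} languages given; define at most {MAX_LANGUAGES}"
        )
    # Assumption: no space after each comma.
    return ",".join(names)


text_language = build_text_language(["English", "French", "Spanish"])
print(text_language)
```

This helper does not check the names against the supported values in the table.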
Page preprocessing parameters
The table below lists the DPS parameters that the system uses during page preprocessing, which you can modify in the configureDPSABBYY data transform:
Parameter name | Description | Values |
---|---|---|
applySigmaFilter | Specifies whether to apply the noise reduction filter to the image during page preprocessing. If you specify TSPV_Auto, the system determines automatically whether to use the noise reduction filter. | TSPV_Yes, TSPV_No, TSPV_Auto |
correctOrientation | Specifies whether to automatically rotate the image during page preprocessing, if the detected page orientation is different from normal. | VARIANT_TRUE, VARIANT_FALSE |
correctInvertedImage | Specifies whether to automatically invert the image if the system detects that the image is inverted, that is, white text on a black background. | VARIANT_TRUE, VARIANT_FALSE |
correctResolution | Specifies whether to correct the resolution of the image during page preprocessing. If you specify TSPV_Auto, the system corrects the image resolution only if it finds the resolution to be insufficient. | TSPV_Yes, TSPV_No, TSPV_Auto |
correctShadowsAndHighlights | Specifies whether to improve the recognition quality by correcting excessive shadows and highlights in the image during page preprocessing. Use this property with photo images. If you specify TSPV_Auto, the system automatically determines whether to correct excessive shadows and highlights. | TSPV_Yes, TSPV_No, TSPV_Auto |
correctSkew | Specifies whether image skew is corrected during page preprocessing. If you specify TSPV_Auto, the system automatically determines whether to perform image skew correction. | TSPV_Yes, TSPV_No, TSPV_Auto |
correctGeometry | Specifies whether geometrical distortions in photo images are removed during page preprocessing. If you specify TSPV_Auto, the system automatically determines whether to remove geometrical distortions in photo images. | TSPV_Yes, TSPV_No, TSPV_Auto |
cropImage | Specifies whether document edges are detected in the image, and whether the image is cropped during page preprocessing. If you specify TSPV_Auto, the system automatically determines whether to detect edges and crop the image. | TSPV_Yes, TSPV_No, TSPV_Auto |
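The preprocessing parameters use two different value sets: correctOrientation and correctInvertedImage take VARIANT_TRUE or VARIANT_FALSE, while the other parameters take TSPV_Yes, TSPV_No, or TSPV_Auto. Mixing the two sets is an easy mistake, so the following Python sketch checks a set of planned values against the table. The function find_invalid_settings and the dictionary layout are hypothetical and are not part of DPS.

```python
TRI_STATE = {"TSPV_Yes", "TSPV_No", "TSPV_Auto"}
BOOLEAN = {"VARIANT_TRUE", "VARIANT_FALSE"}

# Allowed values per parameter, copied from the table above.
PREPROCESSING_VALUES = {
    "applySigmaFilter": TRI_STATE,
    "correctOrientation": BOOLEAN,
    "correctInvertedImage": BOOLEAN,
    "correctResolution": TRI_STATE,
    "correctShadowsAndHighlights": TRI_STATE,
    "correctSkew": TRI_STATE,
    "correctGeometry": TRI_STATE,
    "cropImage": TRI_STATE,
}


def find_invalid_settings(settings):
    """Return a dict of parameter name -> problem description.

    Hypothetical helper for illustration only; not a DPS API.
    """
    problems = {}
    for name, value in settings.items():
        allowed = PREPROCESSING_VALUES.get(name)
        if allowed is None:
            problems[name] = "unknown parameter"
        elif value not in allowed:
            problems[name] = "expected one of " + ", ".join(sorted(allowed))
    return problems


planned = {"correctOrientation": "VARIANT_TRUE", "correctSkew": "TSPV_Auto"}
print(find_invalid_settings(planned))
```

An empty result means that every planned value matches the table.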
PDF export parameters
The table below lists the DPS parameters that the system uses for PDF exporting, which you can modify in the configureDPSABBYY data transform:
Parameter name | Description | Values |
---|---|---|
jpgQuality | Specifies the image quality for saving in PDF files, as a percentage. | A number between 1 and 100. |
pdfExportScenario | Specifies the scenario for exporting images to PDF, which determines the trade-off between output quality, file size, and processing speed. | PES_MaxQuality, PES_Balanced, PES_MinSize, PES_MaxSpeed |
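As a minimal sketch, the following Python function checks a jpgQuality and pdfExportScenario pair against the table above. The function name check_pdf_export is hypothetical and is not part of DPS. The sketch assumes that jpgQuality is a whole number and that the range 1 to 100 is inclusive, which the table does not state explicitly.

```python
PDF_EXPORT_SCENARIOS = (
    "PES_MaxQuality",
    "PES_Balanced",
    "PES_MinSize",
    "PES_MaxSpeed",
)


def check_pdf_export(jpg_quality, scenario):
    """Validate PDF export values and return them as parameter strings.

    Hypothetical helper for illustration only; not a DPS API.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(jpg_quality, bool) or not isinstance(jpg_quality, int):
        raise TypeError("jpgQuality must be a whole number")
    # Assumption: both ends of the 1 to 100 range are allowed.
    if not 1 <= jpg_quality <= 100:
        raise ValueError("jpgQuality must be between 1 and 100")
    if scenario not in PDF_EXPORT_SCENARIOS:
        raise ValueError(f"unknown pdfExportScenario: {scenario}")
    return {"jpgQuality": str(jpg_quality), "pdfExportScenario": scenario}


print(check_pdf_export(85, "PES_Balanced"))
```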
Object extraction parameters
The table below lists the DPS parameters that the system uses during object extraction, which you can modify in the configureDPSABBYY data transform:
Parameter name | Description | Values |
---|---|---|
sourceContentReuseMode | Specifies how the system reuses the text and image layers of the source PDF file: automatically, content only, or not at all. | CRM_Auto, CRM_DoNotReuse, CRM_ContentOnly |