Configuring language detection preferences

You can control how a text analyzer detects the language of the documents that it analyzes. For example, you can enable a fallback language for cases in which the text analyzer cannot detect the language, such as content that is written in multiple languages.
  1. In the Records panel, click Decision > Text Analyzer.
  2. Click the Advanced tab.
  3. Perform any of the following actions:
    • To automatically detect the language of a piece of text, go to step 4.

      This is the default option.

    • To manually specify the language of a piece of text, skip to step 5.

      You can use this option when analyzing documents that are written in multiple languages or that contain noise that could interfere with language detection, such as emoticons or URLs.

    • For Twitter data sets only: To use the language that the source data set reports for a piece of text, skip to step 6.

      Important: If the source data set is of type Facebook or YouTube, and you select the Language detected by publisher option, your application falls back to the Automatically detect language setting.
  4. To enable automatic language detection, perform the following actions:
    1. In the Language settings section, select Automatically detect language.
    2. Optional: Select Enable fallback language if language undetected and specify the language that the system falls back to if no language is detected.
    3. Use the language metadata tag (lang:) of the incoming records for language detection by selecting Language detected by publisher.
    4. Go to step 7.
  5. To always assign a specific language to the analyzed text, perform the following actions:
    1. Select Specify language.
    2. Select a language from the drop-down list.
    3. Go to step 7.
  6. To use the language metadata tag (lang:) of the incoming records for language detection, perform the following actions:
    1. Select the Language detected by publisher radio button.
    2. Go to step 7.
  7. Click Save.
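
The way the three options interact can be sketched in a few lines of Python. This is an illustration of the behavior described in the steps above, not the product's implementation: `LanguageSettings`, `resolve_language`, the mode names, and the `detect` callback are all hypothetical names introduced for this example. It also assumes that the fallback language still applies when a Facebook or YouTube data set reverts to automatic detection.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LanguageSettings:
    """Hypothetical model of the Language settings section."""
    mode: str = "auto"  # "auto", "specify", or "publisher"
    specified_language: Optional[str] = None  # used in "specify" mode
    fallback_language: Optional[str] = None   # optional, used in "auto" mode


def resolve_language(
    settings: LanguageSettings,
    text: str,
    source_type: str,
    publisher_lang: Optional[str],
    detect: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the language assigned to a piece of text, or None if unknown."""
    mode = settings.mode

    # Publisher-detected language is for Twitter data sets only; Facebook
    # and YouTube sources revert to automatic detection.
    if mode == "publisher" and source_type in {"facebook", "youtube"}:
        mode = "auto"

    if mode == "specify":
        # Always assign the language chosen from the drop-down list.
        return settings.specified_language
    if mode == "publisher":
        # Trust the lang: metadata tag of the incoming record.
        return publisher_lang

    # Automatic detection, with the optional fallback language.
    detected = detect(text)
    return detected if detected is not None else settings.fallback_language
```

For example, with `LanguageSettings(mode="auto", fallback_language="en")` and a detector that returns `None` for a string consisting only of emoticons, the sketch returns `"en"`.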