Configuring language detection preferences
You can control how a text analyzer detects the language of the analyzed document. For
example, you can enable a fallback language for cases in which the text analyzer cannot detect
the language, such as when the content is written in multiple languages.
- In the Records panel, click .
- Click the Advanced tab.
- Perform any of the following actions:
  - To automatically detect the language of a piece of text, go to step 4.
    This is the default option.
  - To manually specify the language of a piece of text, skip to step 5.
    You can use this option when analyzing documents that are written in multiple languages or that contain a lot of noise that could interfere with language detection, such as emoticons and URLs.
  - For Twitter data sets only: To use the language that the source data set reports for a piece of text, skip to step 6.
    Important: If the source data set is of type Facebook or YouTube, and you select the Language detected by publisher option, your application falls back to the Automatically detect language setting.
- To enable automatic language detection:
  - In the Language settings section, select Automatically detect language.
  - Optional: Select Enable fallback language if language undetected and specify the language that the system falls back to if no language is detected.
- To always assign a specific language to the analyzed text, perform the following actions:
- To use the language metadata tag (lang:) of the incoming records for language detection, perform the following actions:
  - Select the Language detected by publisher radio button.
  - Go to step 7.
- Click Save.
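
The way the three options interact can be easier to follow as a small decision function. The following Python sketch is illustrative only and is not the product's implementation or API: the names `Mode`, `LanguageSettings`, `resolve_language`, and the `detect` callback are hypothetical, and the only behavior modeled is what the steps above describe (automatic detection as the default, an optional fallback language, and the Facebook and YouTube fallback to automatic detection).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Mode(Enum):
    """Hypothetical names for the three options described in the procedure."""
    AUTO = "Automatically detect language"
    MANUAL = "Always assign a specific language"
    PUBLISHER = "Language detected by publisher"


@dataclass
class LanguageSettings:
    mode: Mode = Mode.AUTO                    # automatic detection is the default
    fallback_language: Optional[str] = None   # set only if the fallback option is enabled
    manual_language: Optional[str] = None     # used only in MANUAL mode


def resolve_language(
    settings: LanguageSettings,
    source_type: str,
    text: str,
    publisher_lang: Optional[str],
    detect: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the language to assign to one record (illustrative model only)."""
    mode = settings.mode

    # Per the Important note: for Facebook or YouTube sources, the publisher
    # option falls back to automatic detection.
    if mode is Mode.PUBLISHER and source_type in {"Facebook", "YouTube"}:
        mode = Mode.AUTO

    if mode is Mode.MANUAL:
        return settings.manual_language
    if mode is Mode.PUBLISHER:
        # Assumption: the record's lang: metadata value is used as-is.
        return publisher_lang

    # Automatic detection, with the optional fallback when nothing is detected.
    detected = detect(text)
    return detected if detected is not None else settings.fallback_language
```

For example, with `LanguageSettings(mode=Mode.AUTO, fallback_language="en")` and a detector that returns `None` for a noisy record, the function returns the fallback language. With `Mode.PUBLISHER` and a `"Facebook"` source, it ignores the publisher value and uses the detector result instead.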