Best practices when building rule-based entities in an IVA
To help Pega Intelligent Virtual Assistant (IVA) extract the correct information from a conversation, create rule-based entities by using Apache Ruta (Rule-based Text Annotation) scripts. Ruta is a rule-based scripting language for detecting keywords and phrases that follow certain patterns. For example, when you build rule-based entities by using Ruta scripts, the IVA can detect information such as email addresses, so that the system can later populate case properties with this data.
A Ruta script defines entities through rules that combine annotation patterns, optional quantifiers, conditions for matching, and actions to perform. Ruta scripts are similar to regular expressions in that you define a pattern to compare against, so that you can obtain matching keywords and phrases. Examples of patterns that you can search for as rule-based entities include country names, postal codes, email addresses, phone numbers, and street names. To detect more abstract keywords and phrases in a conversation, train model-based entities instead. For more information, see Best practices when training model-based entities in an IVA.
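Because the comparison above is to regular expressions, the pattern-matching idea can be sketched with plain regular expressions in Python. This is an analogy only, not Ruta syntax and not how Pega implements entity extraction; the patterns, the entity names email and postal_code, and the function extract_entities are assumptions made for illustration.

```python
import re

# Illustrative patterns only. Real Ruta rules operate on annotation types,
# not on raw regular expressions; these are simplified assumptions.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_ZIP = re.compile(r"\b\d{5}(?:-\d{4})?\b")  # assumes US-style postal codes


def extract_entities(text):
    """Return (entity_type, matched_text) pairs found in the text."""
    found = []
    for entity_type, pattern in (("email", EMAIL), ("postal_code", US_ZIP)):
        for match in pattern.finditer(text):
            found.append((entity_type, match.group()))
    return found


if __name__ == "__main__":
    message = "Write to jane.doe@example.com, I live in 60601."
    for entity_type, value in extract_entities(message):
        print(entity_type, value)
```

Each pattern plays the role of one rule-based entity type: text that fits the pattern is extracted and labeled, and text that does not fit is ignored.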
When building a rule-based entity, remember that a Ruta script can detect only a single entity type. To store annotation results, mark them in the Ruta script. You can use variables, for example, VarA or VarB, to store intermediate annotation results. Keep in mind that you can also define wordlists as keywords and refer to them in the Ruta script. For more information, see Best practices for pattern extraction in text analytics.
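The idea of building one entity from intermediate results and a wordlist can be sketched in Python as follows. This is a loose analogy and not Ruta syntax: VAR_A and VAR_B only mirror the variable names mentioned above, and the STREET_TYPES wordlist, the address pattern, and the function find_street_addresses are hypothetical.

```python
import re

# Hypothetical wordlist of street-type keywords, standing in for a wordlist
# that a script would reference.
STREET_TYPES = ["Street", "Avenue", "Road", "Boulevard"]

# "VarA": intermediate pattern for a house number (assumed to be 1-5 digits).
VAR_A = r"\d{1,5}"
# "VarB": intermediate pattern built from the wordlist.
VAR_B = "(?:" + "|".join(re.escape(word) for word in STREET_TYPES) + ")"

# Final entity: a house number, one to three capitalized words, a street type.
STREET_ADDRESS = re.compile(
    rf"\b{VAR_A}\s+(?:[A-Z][a-z]+\s+){{1,3}}{VAR_B}\b"
)


def find_street_addresses(text):
    """Return every substring that matches the single street-address entity."""
    return [match.group() for match in STREET_ADDRESS.finditer(text)]


if __name__ == "__main__":
    print(find_street_addresses("Ship it to 221 Baker Street please."))
```

The intermediate pieces are never reported on their own. Only the combined pattern produces a result, which mirrors the guidance that a script detects a single entity type.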
You define rule-based entity types in Prediction Studio when you create a new extraction model, by writing Ruta scripts that match a certain pattern. To use this text analytics model in an IVA, you later select the model in the text analyzer rule for your channel. You can also create simple keyword-based entities when you create a new model. Use the keyword detection method when the entity type that you want to extract is an umbrella term for a finite number of associated terms or phrases that do not follow any specific pattern. For example, you can associate the city entity type with the keyword Chicago, and associate that keyword with synonyms such as Chi, The Loop, and The Windy City. For more information about creating rule-based and keyword-based entities in Prediction Studio, see Creating entity types.
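The keyword detection method can be pictured as a lookup from synonyms to a keyword, as in the following Python sketch. It uses the Chicago example from the text; the data structure, the case-insensitive matching, and the function names build_lookup and detect_city are assumptions for illustration and do not describe how Prediction Studio stores or matches keywords.

```python
import re

# Keyword -> synonyms for the city entity type (values from the example above).
CITY_KEYWORDS = {"Chicago": ["Chi", "The Loop", "The Windy City"]}


def build_lookup(keywords):
    """Map each lowercased surface form to the keyword it stands for."""
    lookup = {}
    for keyword, synonyms in keywords.items():
        for form in [keyword, *synonyms]:
            lookup[form.lower()] = keyword
    return lookup


def detect_city(text, keywords=CITY_KEYWORDS):
    """Return the keyword for every keyword or synonym found in the text."""
    lookup = build_lookup(keywords)
    # Try longer forms first so that multiword synonyms win over short ones.
    forms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(form) for form in forms) + r")\b",
        re.IGNORECASE,  # assumption: matching ignores case
    )
    return [lookup[match.group().lower()] for match in pattern.finditer(text)]


if __name__ == "__main__":
    print(detect_city("I'm flying to the Windy City next week"))
```

Unlike a pattern-based rule, the list of accepted forms here is finite and explicit, which is why this method suits terms that share a meaning but not a shape.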