
PdfConnector properties, methods, and events

Updated on October 19, 2022

Use the PdfConnector component in an automation to search for text, extract text and images, and annotate PDF files.

You cannot use the PdfConnector component to view, create, or modify PDF files, other than to annotate them. You can use this component to process PDF files with or without user interaction. For instance, you can use this component to process a PDF file before presenting it to the Robot Runtime user.

Use this component with the PdfViewer component to bring annotations and highlights to the attention of the Runtime user.

Note: In PDF files, left and right coordinates are offsets from the left of the page. Bottom and top coordinates are offsets from the bottom of the page.
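As a rough illustration of the note above, the following Python sketch converts a rectangle measured from the top-left corner of a page (the convention in most image and screen coordinate systems) into the left, right, bottom, and top offsets described here. The function name, the PdfRect tuple, and the 612 x 792 point US Letter page size are assumptions made for this example; they are not part of the PdfConnector API.

```python
from typing import NamedTuple


class PdfRect(NamedTuple):
    """Offsets in points: left/right from the left edge, bottom/top from the bottom edge."""
    left: float
    right: float
    bottom: float
    top: float


def from_top_left(x: float, y: float, width: float, height: float,
                  page_height: float) -> PdfRect:
    """Convert a top-left-origin rectangle to PDF-style offsets.

    x and y locate the rectangle's top-left corner, measured from the
    page's top-left corner. All values are in points.
    """
    left = x
    right = x + width
    top = page_height - y                # page bottom to the rectangle's top edge
    bottom = page_height - (y + height)  # page bottom to the rectangle's bottom edge
    return PdfRect(left, right, bottom, top)


# Hypothetical example: a 100 x 20 pt box placed 72 pt from the left and
# 72 pt from the top of a US Letter page (612 x 792 pt).
rect = from_top_left(72, 72, 100, 20, page_height=792)
print(rect)
```

Values computed this way correspond to the left, right, top, and bottom arguments of the position-based Annotate overload, assuming that overload follows the coordinate convention in the note.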

The following tables list the descriptions for the PdfConnector properties, methods, and events:

Properties

Property Description
AnnotationCount (Read-only) Displays the number of annotations in the document.
AutoSaveEnabled Enter True if changes to a PDF file should automatically be saved when the PDF file is closed; otherwise, enter False.
DetectedDocumentType (Read-only) Displays the DocumentType object for the currently loaded document.
DetectedDocumentTypeName (Read-only) Displays the name of the currently loaded document type.
FileName Specifies the name of the PDF file.
LineCount (Read-only) Displays the number of lines in the PDF file.
LineThreshold

When comparing two pieces of text to determine whether they are on the same line, the system compares the amount of white space at these points:

  • The amount of white space above the top of the line
  • The amount of white space below the bottom of the line

Your entry in this property sets the threshold. If the white space is less than or equal to your entry, the system considers the text to be on the same line. If it is more than your entry, it considers the text to be on different lines.

The default is 2.0 points, with a point being equal to 1/72 of an inch.

HasFormFields Indicates whether the open PDF file has form fields that can be written to.
HasSaved (Read-only) Indicates that the PDF file has been saved.
IsDocXfaFormat (Read-only) Indicates if the opened PDF file is in XFA (XML Forms Architecture) format. Support for XFA format PDF files is limited. You can use the PdfViewer component to display XFA-formatted files, but you cannot edit them.
ImageCount (Read-only) Indicates the number of images you can extract from the document.
IsOpen (Read-only) Indicates if the PDF file has been successfully opened.
OutputName Specifies the file name to assign to the output PDF file. Be sure to specify an output file name to avoid overwriting the original PDF during the design phase.
Pages (Read-only) Provides a list of the PdfPage objects. These objects represent the pages in the document.
PageCount (Read-only) Indicates the number of pages in the document.
SegmentCount (Read-only) Displays the number of segments in the PDF file.
SegmentThreshold

When comparing two pieces of text to determine whether they are part of the same segment of text, the system compares the amount of white space at these points:

  • The amount of white space above the top of the segment
  • The amount of white space below the bottom of the segment

Your entry in this property sets the threshold. If the white space is less than or equal to your entry, the system considers the text to be part of the same segment. If it is more than your entry, it considers the text to be in different segments.

The default is 10 points, with a point being equal to 1/72 of an inch.

TableCount (Read-only) Displays the number of tables in the PDF file.
Text (Read-only) Returns all of the text in the document as a single value. The system omits comments and annotation text.
WordCount (Read-only) Displays the number of words in the PDF file.
WordThreshold

The system looks at the amount of white space between pieces of text to determine if the text comprises a single word or if the white space indicates there are two words.

Your entry in this property sets the threshold. If the space is less than or equal to your entry, the system considers the text to be part of the same word. If it is more than your entry, it considers the text to be different words.

The default is 2.2 points, with a point being equal to 1/72 of an inch.
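The LineThreshold, SegmentThreshold, and WordThreshold properties all apply the same rule: white space less than or equal to the threshold keeps text together, and anything larger separates it. The Python sketch below models that rule for words only. It is an illustration under stated assumptions, not the component's actual algorithm: the (text, left, right) tuple layout, the function names, and the sample measurements are all hypothetical.

```python
POINTS_PER_INCH = 72  # a point is 1/72 of an inch

# Default values stated in the property descriptions.
LINE_THRESHOLD = 2.0
SEGMENT_THRESHOLD = 10.0
WORD_THRESHOLD = 2.2


def points_to_inches(points: float) -> float:
    """Convert a measurement in points to inches."""
    return points / POINTS_PER_INCH


def within_threshold(gap: float, threshold: float) -> bool:
    """True when the white space is less than or equal to the threshold."""
    return gap <= threshold


def group_into_words(pieces, threshold=WORD_THRESHOLD):
    """Merge (text, left, right) pieces from one line into words.

    Assumes the pieces are sorted left to right. The gap is the white
    space between one piece's right edge and the next piece's left edge.
    """
    words = []
    prev_right = None
    for text, left, right in pieces:
        if prev_right is not None and within_threshold(left - prev_right, threshold):
            words[-1] += text   # small gap: same word
        else:
            words.append(text)  # first piece or large gap: new word
        prev_right = right
    return words


# Hypothetical measurements in points.
pieces = [("Inv", 10.0, 25.0), ("oice", 25.5, 45.0), ("Total", 50.0, 75.0)]
print(group_into_words(pieces))
print(points_to_inches(SEGMENT_THRESHOLD))
```

In this model, raising the threshold merges more text into one unit and lowering it splits text more aggressively, which is the trade-off to keep in mind when tuning any of the three properties.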

Methods

Method Description Return Type
Annotate(AnnotationType typ, int pg, string tx, float lf, float rt, float tp, float bt, Color clr) Adds an annotation at the position you specify, using the left, right, top, and bottom coordinates. Boolean
Annotate(PdfLine line, AnnotationType type, string annotationText, Color color) Adds an annotation based on the line you specify. Boolean
Annotate(PdfSegment segment, AnnotationType type, string annotationText, Color color) Adds an annotation based on the segment you specify. Boolean
Annotate(PdfWord word, AnnotationType type, string annotationText, Color color) Adds an annotation based on the word you specify. Boolean
Annotate(PdfPhrase phrase, AnnotationType type, string annotationText, Color color) Adds an annotation based on the phrase you specify. Boolean
AppendPages[fileToAppend] Appends the PDF file that you specify to the currently loaded PDF file. Boolean
AppendPages[fileToAppend, inputFileName] Appends the PDF file that you specify using the fileToAppend parameter to the target PDF file, specified with the inputFileName parameter. Boolean
Close() Closes a PDF file. Boolean
ConcatFiles[outputFileName, files]

Combines the PDF files that you specify into a single PDF file.

You can identify the files with a string array, by typing the file names into the method block, or with a comma-separated list of the individual file names.

Boolean
CombineTables(DataTable inTable1, DataTable inTable2, out DataTable outTable) Combines two data tables into a single table. The tables are not required to have the same schema. Column names are not retained. Boolean
DeleteAnnotation(PdfAnnotation annotation) Deletes the annotation you specify. Boolean
ExtractPages[outputFileName, singleFile, pageList]

Extracts pages from the currently loaded PDF file and saves those pages as one or more PDF files, using the name that you specify in the outputFileName parameter.

singleFile - Enter True to combine all extracted pages into a single PDF file, specified in the outputFileName parameter. Enter False to create a separate PDF file for each extracted page. The name of each PDF file is appended with -Page{pagenumber}.

pageList - List the pages that you want to extract, separated by commas. You can also specify a range of pages. The following is an example: 1,23,6-8,10.

Boolean
ExtractPages[inputFileName, outputFileName, singleFile, pageList]

Extracts pages from the PDF file that you specify in the inputFileName parameter and saves those pages as one or more PDF files, using the name that you specify in the outputFileName parameter.

singleFile - Enter True to combine all extracted pages into a single PDF file, specified in the outputFileName parameter. Enter False to create a separate PDF file for each extracted page. The name of each PDF file is appended with -Page{pagenumber}.

pageList - List the pages that you want to extract, separated by commas. You can also specify a range of pages. The following is an example: 1,23,6-8,10.

Boolean
ExtractPagesWithText[outputFileName, singleFile, textToFind, adjacent, pagesBefore, pagesAfter]

Extracts pages from the currently loaded PDF file that contain the text that you specify. This method then saves those pages as one or more PDF files, using the name that you specify in the outputFileName parameter.

singleFile - Enter True to combine all extracted pages into a single PDF file, specified in the outputFileName parameter. Enter False to create a separate PDF file for each extracted page. The name of each PDF file is appended with -Page{pagenumber}.

textToFind - Enter the text that you want the system to find as it identifies the pages to extract. Case does not matter.

pagesBefore - Specify the number of pages before the page that contains the textToFind that you also want to extract. For example, if the system finds the textToFind on page 5 and you enter 3, the system extracts pages starting with page 2.

pagesAfter - Specify the number of pages after the page that contains the textToFind that you also want to extract. For example, if the system finds the textToFind on page 5 and you enter 3, the system extracts pages 6-8.

Boolean
ExtractPagesWithText[inputFileName, outputFileName, singleFile, textToFind, adjacent, pagesBefore, pagesAfter]

Extracts pages from the PDF file that you specify in the inputFileName parameter that contain the text that you specify. This method then saves those pages as one or more PDF files, using the name that you specify in the outputFileName parameter.

singleFile - Enter True to combine all extracted pages into a single PDF file, specified in the outputFileName parameter. Enter False to create a separate PDF file for each extracted page. The name of each PDF file is appended with -Page{pagenumber}.

textToFind - Enter the text that you want the system to find as it identifies the pages to extract. Case does not matter.

pagesBefore - Specify the number of pages before the page that contains the textToFind that you also want to extract. For example, if the system finds the textToFind on page 5 and you enter 3, the system extracts pages starting with page 2.

pagesAfter - Specify the number of pages after the page that contains the textToFind that you also want to extract. For example, if the system finds the textToFind on page 5 and you enter 3, the system extracts pages 6-8.

Boolean
FindPage(string searchFor, out int pageNumber) Finds the first page that contains the text you specify. Boolean
FindPage(string searchFor, int startPage, out int pageNumber) Finds the first page that contains the text you specify. The system starts the search on the page number you specify. Boolean
FindPage(string searchFor, int startPage, int endPage, out int pageNumber) Finds the first page that contains the text you specify. You must also specify the page numbers on which you want the search to start and end. Boolean
FindPage(string searchFor, PdfLine searchAfter, out int pageNumber) Finds the first page that contains the text you specify. The system starts the search after the line you specify. Boolean
FindPage(string searchFor, PdfSegment searchAfter, out int pageNumber) Finds the first page that contains the text you specify. The system starts the search after the segment you specify. Boolean
FindPage(string searchFor, PdfWord searchAfter, out int pageNumber) Finds the first page that contains the text you specify. The system starts the search after the word you specify. Boolean
FindPages(string searchFor, out int[] pageNumbers) Finds all pages that contain the text you specify. Boolean
FindPages(string searchFor, int startPage, out int[] pageNumbers) Finds all pages that contain the text you specify. The system starts the search at the page number you specify. Boolean
FindPages(string searchFor, int startPage, int endPage, out int[] pageNumbers) Finds all pages that contain the text you specify. You must also specify the page numbers on which you want the search to start and end. Boolean
FindPages(string searchFor, PdfLine searchAfter, out int[] pageNumbers) Finds all pages that contain the text you specify. The system starts the search after the line you specify. Boolean
FindPages(string searchFor, PdfSegment searchAfter, out int[] pageNumbers) Finds all pages that contain the text you specify. The system starts the search after the segment you specify. Boolean
FindPages(string searchFor, PdfWord searchAfter, out int[] pageNumbers) Finds all pages that contain the text you specify. The system starts the search after the word you specify. Boolean
FindLine(string searchFor, out PdfLine line) Finds the first line that contains the text you specify. Boolean
FindLine(string searchFor, int startPage, out PdfLine line) Finds the first line that contains the text you specify, starting the search at the page number you specify. Boolean
FindLine(string searchFor, int startPage, int endPage, out PdfLine line) Finds the first line that contains the text you specify. You must also specify the page numbers on which you want the search to start and end. Boolean
FindLine(string searchFor, PdfLine searchAfter, out PdfLine line) Finds the first line that contains the text you specify. The system starts the search after the line you specify. Boolean
FindLine(string searchFor, PdfSegment searchAfter, out PdfLine line) Finds the first line that contains the text you specify. The system starts the search after the segment you specify. Boolean
FindLine(string searchFor, PdfWord searchAfter, out PdfLine line) Finds the first line that contains the text you specify. The system starts the search after the word you specify. Boolean
FindLines(string searchFor, out PdfLine[] lines) Finds all of the lines that contain the text you specify. Boolean
FindLines(string searchFor, int startPage, out PdfLine[] lines) Finds all of the lines that contain the text you specify, starting the search at the page number you specify. Boolean
FindLines(string searchFor, int startPage, int endPage, out PdfLine[] lines) Finds all of the lines that contain the text you specify. You must also specify the page numbers on which you want the search to start and end. Boolean
FindLines(string searchFor, PdfLine searchAfter, out PdfLine[] lines) Finds all of the lines that contain the text you specify. The system starts the search after the line you specify. Boolean
FindLines(string searchFor, PdfSegment searchAfter, out PdfLine[] lines) Finds all of the lines that contain the text you specify. The system starts the search after the segment you specify. Boolean
FindLines(string searchFor, PdfWord searchAfter, out PdfLine[] lines) Finds all of the lines that contain the text you specify. The system starts the search after the word you specify. Boolean
FindPhrase(string searchFor, out PdfPhrase phrase) Finds the first occurrence of the text you specify. Boolean
FindPhrase(string searchFor, int startPage, out PdfPhrase phrase) Finds the first occurrence of the text you specify, starting the search at the page number you specify. Boolean
FindPhrase(string searchFor, int startPage, int endPage, out PdfPhrase phrase) Finds the first occurrence of the text you specify. You must also specify the page numbers on which you want the search to start and end. Boolean
FindPhrases(string searchFor, out PdfPhrase[] phrases) Finds all occurrences of the text you specify. Boolean
FindPhrases(string searchFor, int startPage, out PdfPhrase[] phrases) Finds all occurrences of the text you specify, starting the search at the page number you specify. Boolean
FindPhrases(string searchFor, int startPage, int endPage, out PdfPhrase[] phrases) Finds all occurrences of the text you specify. You must also specify the page numbers on which you want the search to start and end. Boolean
FindRelativeLine(string searchFor, int occurrence, int relativeLineOffset, out PdfLine line) Finds a specific occurrence of a line. The system returns a line relative to the line it finds. Boolean
FindRelativeSegment(string searchFor, int occur, int relSegOffset, out PdfSegment seg) Finds a specific occurrence of a segment. The system returns a segment relative to the segment it found. Boolean
FindSegment(string searchFor, out PdfSegment segment) Finds the first segment that contains the text you specify. Boolean
FindSegment(string searchFor, int startPage, out PdfSegment segment) Finds the first segment that contains the text you specify, starting the search at the page number you specify. Boolean
FindSegment(string searchFor, int startPage, int endPage, out PdfSegment segment) Finds the first segment that contains the text you specify. You must also specify the page numbers on which you want the search to start and end. Boolean
FindSegment(string searchFor, PdfLine searchAfter, out PdfSegment segment) Finds the first segment that contains the text you specify. The system starts the search after the line you specify. Boolean
FindSegment(string searchFor, PdfSegment searchAfter, out PdfSegment segment) Finds the first segment that contains the text you specify. The system starts the search after the segment you specify. Boolean
FindSegment(string searchFor, PdfWord searchAfter, out PdfSegment segment) Finds the first segment that contains the text you specify. The system starts the search after the word you specify. Boolean
FindSegments(string searchFor, out PdfSegment[] segments) Finds all segments that contain the text you specify. Boolean
FindSegments(string searchFor, int startPage, out PdfSegment[] segments) Finds all segments that contain the text you specify, starting the search at the page number you specify. Boolean
FindSegments(string searchFor, int startPage, int endPage, out PdfSegment[] segments) Finds all segments that contain the text you specify. You must also specify the page numbers on which you want the search to start and end. Boolean
FindSegments(string searchFor, PdfLine searchAfter, out PdfSegment[] segments) Finds all segments that contain the text you specify. The system starts the search after the line you specify. Boolean
FindSegments(string searchFor, PdfSegment searchAfter, out PdfSegment[] segments) Finds all segments that contain the text you specify. The system starts the search after the segment you specify. Boolean
FindSegments(string searchFor, PdfWord searchAfter, out PdfSegment[] segments) Finds all segments that contain the text you specify. The system starts the search after the word you specify. Boolean
FindWord(string searchFor, out PdfWord word) Finds the first word that contains the text you specify. Boolean
FindWord(string searchFor, int startPage, out PdfWord word) Finds the first word that contains the text you specify, starting the search at the page number you specify. Boolean
FindWord(string searchFor, int startPage, int endPage, out PdfWord word) Finds the first word that contains the text you specify. You must also specify the page numbers on which you want the search to start and end. Boolean
FindWord(string searchFor, PdfLine searchAfter, out PdfWord word) Finds the first word that contains the text you specify. The system starts the search after the line you specify. Boolean
FindWord(string searchFor, PdfSegment searchAfter, out PdfWord word) Finds the first word that contains the text you specify. The system starts the search after the segment you specify. Boolean
FindWord(string searchFor, PdfWord searchAfter, out PdfWord word) Finds the first word that contains the text you specify. The system starts the search after the word you specify. Boolean
FindWords(string searchFor, out PdfWord[] words) Finds all words that contain the text you specify. Boolean
FindWords(string searchFor, int startPage, out PdfWord[] words) Finds all words that contain the text you specify, starting the search at the page number you specify. Boolean
FindWords(string searchFor, int startPage, int endPage, out PdfWord[] words) Finds all words that contain the text you specify. You must also specify the page numbers on which you want the search to start and end. Boolean
FindWords(string searchFor, PdfLine searchAfter, out PdfWord[] words) Finds all words that contain the text you specify. The system starts the search after the line you specify. Boolean
FindWords(string searchFor, PdfSegment searchAfter, out PdfWord[] words) Finds all words that contain the text you specify. The system starts the search after the segment you specify. Boolean
FindWords(string searchFor, PdfWord searchAfter, out PdfWord[] words) Finds all words that contain the text you specify. The system starts the search after the word you specify. Boolean
FindRelativeWord(string searchFor, int occur, int relativeWordOffset, out PdfWord word) Finds a specific occurrence of a word. The system returns a word relative to the word the system finds. Boolean
GetAnnotation(out PdfAnnotation annotation) Gets the first annotation the system finds. Boolean
GetAnnotation(int startPage, out PdfAnnotation annotation) Gets the first annotation the system finds, beginning with the page number you specify. Boolean
GetAnnotation(int startPage, int endPage, out PdfAnnotation annotation) Gets the first annotation the system finds within a range of pages. You must specify the page numbers on which you want the search to start and end. Boolean
GetAnnotation(AnnotationType type, out PdfAnnotation annotation) Gets the first annotation the system finds of the annotation type you specified. Boolean
GetAnnotation(AnnotationType type, int startPage, out PdfAnnotation annotation) Gets the first annotation the system finds of the annotation type you specified. The system starts the search at the page number you specify. Boolean
GetAnnotation(AnnotationType type, int startPage, int endPage, out PdfAnnotation annot) Gets the first annotation the system finds of the annotation type you specified within a range of pages. You must specify the page numbers on which you want the search to start and end. Boolean
GetAnnotations(out PdfAnnotation[] annotations) Gets all of the annotations in the PDF file. Boolean
GetAnnotations(int startPage, out PdfAnnotation[] annotations) Gets all of the annotations, starting at the page number you specify. Boolean
GetAnnotations(int startPage, int endPage, out PdfAnnotation[] annotations) Gets all of the annotations within a range of pages. You must specify the page numbers on which you want the search to start and end. Boolean
GetAnnotations(AnnotationType type, out PdfAnnotation[] annotations) Gets all of the annotations of the type you specify. Boolean
GetAnnotations(AnnotationType type, int startPage, out PdfAnnotation[] annotations) Gets all of the annotations of the type you specify, starting at the page number you specify. Boolean
GetAnnotations(AnnotationType type, int start, int end, out PdfAnnotation[] annots) Gets all of the annotations of the type you specify within a range of pages. You must specify the page numbers on which you want the search to start and end. Boolean
GetImage(out Image image) Extracts the first image the system finds. Boolean
GetImage(int startPage, out Image image) Extracts the first image the system finds, starting the search at the page number you specify. Boolean
GetImage(int startPage, int endPage, out Image image) Extracts the first image the system finds within a range of pages. You must specify the page numbers on which you want the search to start and end. Boolean
GetImages(out Image[] images) Extracts all images. Boolean
GetImages(int startPage, out Image[] images) Extracts all images, starting at the page number you specify. Boolean
GetImages(int startPage, int endPage, out Image[] images) Extracts all images within a range of pages. You must specify the page numbers on which you want the search to start and end. Boolean
GetTable(out DataTable table, TableFill tableFill) Gets the first table found in the document. Boolean
GetTable(out DataTable table, TableFill tableFill, Int32 startPage) Gets the first table found, starting with the startPage. Boolean
GetTable(out DataTable table, TableFill tableFill, Int32 startPage, Int32 endPage) Gets the first table found between the startPage and the endPage. Boolean
GetTable(out DataTable table, TableFill tableFill, string startText, Boolean canSpanPages, Int32 distanceFromBottom, string[] endText)

Gets the first table found after the startText is located in the document and before one of the endText items is found. Set canSpanPages to True if the table spans pages.

Tables that span pages must be located consecutively in the document and be the last table on a page and the first table on the succeeding page.

The distanceFromBottom is the number of points (1/72 of an inch) where the table stops before continuing to the next page. Any data below this point is ignored. Entering zero (0) tells the system to ignore this setting and continue processing to the end of the page.

Boolean
GetTable(out DataTable table, TableFill tableFill, Int32 startPage, string startText, Boolean canSpanPages, Int32 distanceFromBottom, string[] endText)

Gets the first table found starting from the startPage and searching after the location where the startText is located in the document and before one of the endText items is found.

Set canSpanPages to True if the table spans pages. Tables that span pages must be located consecutively in the document and be the last table on a page and the first table on the succeeding page.

The distanceFromBottom is the number of points (1/72 of an inch) where the table stops before continuing to the next page. Any data below this point is ignored. Entering zero (0) tells the system to ignore this setting and continue processing to the end of the page.

Boolean
GetTable(out DataTable table, TableFill tableFill, Int32 startPage, Int32 endPage, string startText, Boolean canSpanPages, Int32 distanceFromBottom, string[] endText)

Gets the first table found searching between the startPage and the endPage after the location where the startText is located in the document and before one of the endText items is found.

Set canSpanPages to True if the table spans pages. Tables that span pages must be located consecutively in the document and be the last table on a page and the first table on the succeeding page.

The distanceFromBottom is the number of points (1/72 of an inch) where the table stops before continuing to the next page. Any data below this point is ignored. Entering zero (0) tells the system to ignore this setting and continue processing to the end of the page.

Boolean
GetTables(out DataTable[] tables, TableFill tableFill) Gets all tables found in the document. Boolean
GetTables(out DataTable[] tables, TableFill tableFill, Int32 startPage) Gets all the tables found starting on the startPage to the end of the document. Boolean
GetTables(out DataTable[] tables, TableFill tableFill, Int32 startPage, Int32 endPage) Gets all the tables found between the startPage and the endPage in the document. Boolean
GetValues(out DataTable resultTable, out string documentType) Gets a table with all Text and Optical Mark values in the current document type and returns the detected document type name. Boolean
InsertPages[insertBeforePage, fileToInsert]

Inserts pages in the currently loaded PDF file before the page that you specify.

insertBeforePage - Enter the page number. For example, if you enter 5, the system inserts the page that you specify using the fileToInsert parameter before page 5.

fileToInsert - Enter the name of the PDF file that you want to insert.

Boolean
InsertPages[inputFileName, insertBeforePage, fileToInsert]

Inserts pages in the PDF file that you specify using the inputFileName parameter before the page that you specify.

insertBeforePage - Enter the page number. For example, if you enter 5, the system inserts the page that you specify using the fileToInsert parameter before page 5.

fileToInsert - Enter the name of the PDF file that you want to insert.

Boolean
GetPage(int pageNumber) Gets the PdfPage object that corresponds to the page number you specify. PdfPage
Reconcile(out DataTable reconciledTable)

Displays a user interface with the current PDF file and the values from the detected document type side by side. This interface allows a user to confirm that the data extracted from the PDF is correct and make corrections to it if necessary.

This method outputs a data table with the original data and the corrected data.

Boolean
Reconcile(Double zoomFactorIn, out DataTable reconciledTable, out Double zoomFactorOut)

Displays a user interface with the current PDF file and the values from the detected document type side by side. This interface allows a user to confirm that the data extracted from the PDF is correct and make corrections to it if necessary.

This method accepts a zoomFactorIn value, which sets the zoom on the embedded PDF viewer. This method outputs a data table with the original data and the corrected data, plus a zoomFactorOut value, which is the zoom factor in effect when the dialog was dismissed. Save this value and use it as the zoomFactorIn on subsequent method calls.

If the user cancels the dialog, the result is False.

Boolean
Save() Saves a PDF file. Boolean
SplitByText[inputFileName, outputFileName, textToFind]

Splits the PDF file that you specify in the inputFileName parameter into one or more PDF files. Each time the system finds the text that you specify in the textToFind parameter, it creates a new PDF file.

outputFileName - Enter the file name that you want to assign to the newly created PDF files. The name of each PDF file is then appended with -{filenumber}.

textToFind - Enter the text that you want the system to find as it identifies where to split the PDF file. Case does not matter.

Boolean
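The page-selection parameters of ExtractPages and ExtractPagesWithText can be modeled in a few lines. The Python sketch below expands a pageList string such as the documented example into individual page numbers, and computes the pages covered by pagesBefore and pagesAfter around a matching page. Both function names are hypothetical. The sketch also assumes that the matching page itself is extracted and that the window is clipped to the document's first and last pages; the descriptions above do not confirm either detail.

```python
from typing import List


def parse_page_list(page_list: str) -> List[int]:
    """Expand a string such as "1,23,6-8,10" into individual page numbers.

    Pages are returned in the order written. How the component treats
    duplicates or out-of-order entries is not documented, so this
    sketch leaves them untouched.
    """
    pages: List[int] = []
    for part in page_list.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


def pages_around_match(match_page: int, pages_before: int,
                       pages_after: int, page_count: int) -> List[int]:
    """Pages from match_page - pages_before through match_page + pages_after.

    Assumption: the matching page is included and the window is clipped
    to pages 1..page_count.
    """
    first = max(1, match_page - pages_before)
    last = min(page_count, match_page + pages_after)
    return list(range(first, last + 1))


print(parse_page_list("1,23,6-8,10"))
print(pages_around_match(5, 3, 3, page_count=20))
```

With the documented example values (text found on page 5, pagesBefore and pagesAfter both set to 3), the window in this model starts at page 2 and ends at page 8, which matches the two examples in the parameter descriptions.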

Events

Event Description
FileOpened Occurs when a file is opened.
OutputSaved Occurs when the PDF file is saved.
