Text Extractor

Extracts text from documents.

Supported file types:

  • .pdf - Attempts to extract text in reading order. Note that results may not be perfect, especially for documents with complex layouts.
  • .docx - Extracts text by iterating through all text elements in the document.
  • .xlsx - Extracts text by iterating through rows and retrieving cell values. Values are separated by tabs. Supports multiple sheets.
  • .xlsm - Same behavior as .xlsx.
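
As a rough illustration of the tab-separated spreadsheet output described above, the sketch below joins the cell values of each row with tabs. It assumes that rows are separated by newlines and that empty cells become empty strings; neither detail is stated here. `rows_to_text` is a hypothetical helper, not part of the component.

```python
def rows_to_text(rows):
    """Join each row's cell values with tabs, one row per line (assumed)."""
    lines = []
    for row in rows:
        # None stands in for an empty cell and is rendered as an empty string.
        lines.append("\t".join("" if cell is None else str(cell) for cell in row))
    return "\n".join(lines)


# Hypothetical sheet content: a header row and two data rows.
sheet = [["Name", "Qty"], ["Bolt", 12], ["Nut", None]]
print(rows_to_text(sheet))
```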

Component modes: Single | Batch

Component mode: Batch

Batch mode extracts text from multiple files. The Init call returns the list of files to process, and a Get file call then retrieves the data for each file in that list.
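
The order of the Init and Get file calls can be sketched as follows. This is only an illustration of the sequence: `call` is a hypothetical function that sends a parameter dict to the SQL procedure and returns the result sets as a dict, and the result set keys (`"BatchId"`, `"FileInformation"`, `"FileData"`) are names chosen for this example.

```python
def run_batch_reads(call):
    """Issue Init, then one GetFile call per row of the file list."""
    init = call({"Action": "Init"})
    batch_rows = init.get("BatchId", [])   # optional result set
    files = init["FileInformation"]        # hypothetical key for the file list

    file_data = {}
    for row in files:
        params = {"Action": "GetFile", "FileId": row["FileId"]}
        if batch_rows:
            # @BatchId is only passed when Init returned one.
            params["BatchId"] = batch_rows[0]["BatchId"]
        result = call(params)
        file_data[row["FileId"]] = result["FileData"][0]["FileData"]
    return file_data


def fake_call(params):
    """Stand-in for the SQL procedure; returns canned result sets."""
    if params["Action"] == "Init":
        return {
            "BatchId": [{"BatchId": "batch-1"}],
            "FileInformation": [
                {"FileId": "a", "FileContentType": "application/pdf"},
                {"FileId": "b", "FileContentType": "application/pdf"},
            ],
        }
    if params["Action"] == "GetFile":
        return {"FileData": [{"FileData": b"data-" + params["FileId"].encode()}]}
    raise ValueError("unexpected action")


print(run_batch_reads(fake_call))
```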

SQL

SQL Call: Init (mandatory)

Retrieves the information of the files to extract text from. A Get file call will be made for each file.

May modify database: No

Parameters

@Action string
Will be set to "Init".

Resultset: BatchId (optional)

The batch ID. Will be passed as @BatchId to all following calls.

Table count: repeated zero or one time
Row count: exactly one row
Columns
BatchId mandatory string

The batch ID.

Resultset: File information

The files to extract text from.

Table count: repeated exactly once
Row count: one or more rows
Columns
FileId mandatory string

The ID for the file. Will be passed as @FileId to the Store text and Store error calls.

FileContentType mandatory string

MIME type of the file.

Possible value                                                           Description
application/pdf                                                          .pdf
application/vnd.ms-excel.sheet.macroEnabled.12                           .xlsm
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet        .xlsx
application/vnd.openxmlformats-officedocument.wordprocessingml.document  .docx
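
When populating FileContentType from stored file names, a lookup along these lines may help. It is a minimal sketch: the MIME types are taken from the table above, while `content_type_for` and the idea of deriving the type from the file extension are assumptions made for this example.

```python
import os

# Extension to MIME type, as listed in the table above.
SUPPORTED_CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".xlsm": "application/vnd.ms-excel.sheet.macroEnabled.12",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def content_type_for(filename):
    """Return the MIME type for a supported file name, or None otherwise."""
    extension = os.path.splitext(filename)[1].lower()
    return SUPPORTED_CONTENT_TYPES.get(extension)


print(content_type_for("Report.PDF"))
print(content_type_for("notes.txt"))
```

Files for which the lookup returns None have no supported content type and could be left out of the File information result set.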

SQL Call: Get file (mandatory)

Retrieves the binary data of the file to extract text from.

May modify database: No

Parameters

@Action string
Will be set to "GetFile".
@BatchId string
Passed only if a BatchId was returned by the Init call.
@FileId string
The ID of the file.

Resultset: File data

Table count: repeated exactly once
Row count: exactly one row
Columns
FileData mandatory binary

The binary data of the file.

SQL Call: Store text (mandatory)

Called with the text extracted from a file.

May modify database: Yes

Parameters

@Action string
Will be set to "StoreText".
@BatchId string
Passed only if a BatchId was returned by the Init call.
@FileId string
The ID of the file.
@Text string
The text extracted from the file.

SQL Call: Store error (mandatory)

Called when an error occurs while processing a file. The file is skipped and processing continues with the next file.

May modify database: Yes

Parameters

@Action string
Will be set to "StoreError".
@BatchId string
Passed only if a BatchId was returned by the Init call.
@ErrorMessage string
The cause of the error.
@FileId string
The ID of the file.
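
The skip-on-error behavior can be pictured with the sketch below, which also shows the Finished call that follows once every file has been handled. It is an assumption-laden illustration, not the component's implementation: `process_files`, `extract`, and `call` are hypothetical names, and only errors raised during extraction are covered.

```python
def process_files(file_ids, extract, call, batch_id=None):
    """Extract text per file; report failures and continue with the next file."""
    for file_id in file_ids:
        params = {"FileId": file_id}
        if batch_id is not None:
            params["BatchId"] = batch_id
        try:
            text = extract(file_id)
        except Exception as exc:
            # The failed file is reported and skipped; the loop goes on.
            call({"Action": "StoreError", "ErrorMessage": str(exc), **params})
            continue
        call({"Action": "StoreText", "Text": text, **params})

    finished = {"Action": "Finished"}
    if batch_id is not None:
        finished["BatchId"] = batch_id
    call(finished)


def fake_extract(file_id):
    """Stand-in extractor that fails for one hypothetical file ID."""
    if file_id == "bad":
        raise ValueError("unreadable file")
    return "text of " + file_id


calls = []
process_files(["a", "bad", "b"], fake_extract, calls.append)
print([c["Action"] for c in calls])
```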

SQL Call: Finished (mandatory)

This call will be made when all the files have been processed.

May modify database: No

Parameters

@Action string
Will be set to "Finished".
@BatchId string
Passed only if a BatchId was returned by the Init call.

Resultset: Forwarding definitions (optional)

Table count: repeated zero or one time
Row count: exactly one row
Columns
ADMIN_ErrorMessage optional string

Displays a user-friendly error message to the user. This blocks any forwarding for the user.

ADMIN_InfoMessage optional string

Displays a user-friendly info message to the user. When the user clicks OK, the user is forwarded.

ADMIN_SuccessMessage optional string

Displays a user-friendly success message to the user. When the user clicks OK, the user is forwarded.

ADMIN_Dialog optional string

The dialog alias of a predefined dialog to show the user. Must be the first column in the result set table. Use multiple result set tables to combine with other forwarding.

Use the menu item "Admin > Dialogs" to register new dialogs or find aliases for existing ones.

<xxx> (for ADMIN_Dialog) optional any

In a result set whose first column is ADMIN_Dialog, any column without special meaning is used to replace placeholders in the message and title text.

ADMIN_DebugInfo optional string

Additional information to show the developer when using ADMIN_Dialog.

<passing_field> optional string

Any column with no other specific meaning will be passed along to the menu item or link you are forwarding to.

ADMIN_CidStepsBack optional int

Number of steps in the page history to jump back after execution (the default being one step back). This value overrides any destination specified by the query string.

ADMIN_ReturnToMenuItem optional string

Jumps back to the menu item with this alias after execution. This value overrides any destination specified by the query string. If no prior menu item is found with the given alias, then an error is thrown.

ADMIN_Forward optional string

Displays a user-friendly message and then forwards to the next menu item.

ADMIN_ForwardLink optional string

Alias of the link to forward to.

ADMIN_ForwardMenuGroup optional string

Alias of the menu group to show after execution (instead of the former menu item). This value overrides any destination specified by the query string.

ADMIN_ForwardMenuItem optional string

Alias of the menu item to run after execution (instead of the former menu item). This value overrides any destination specified by the query string.

ADMIN_Message optional string

Displays a user-friendly error message to the user.

ADMIN_PasteHtmlFromPopup optional string

Pastes HTML into an HTML editor. See ADMIN_SetFieldValueFromPopup.

ADMIN_SetFieldValueFromPopup optional string

Sets the value of the field specified in the menuitempopup call. Only select this column if the menu item is opened in a popup.

ADMIN_ClosePopup optional bit

If this column is anything but NULL the popup will be closed. Only select this column if the menu item is opened in a popup.

Default: The default behavior is to step back inside the popup window and close it if there is nothing to step back to.

ADMIN_ClosePopupAndReloadOpener optional bit

If this column is anything but NULL the popup will be closed and the parent will be reloaded. Only select this column if the menu item is opened in a popup. Avoid using this feature if the opener is a newEdit as that may interrupt the user's ongoing input.

ADMIN_ClearHistory optional any

When the value is not NULL all navigation history is cleared and the user can't navigate back. This is only supported when forwarding to another menu item.

ADMIN_RefreshMenu optional bit

Will trigger a reload of the sidebar if the column is anything but NULL.

Cache optional string

Cache key to be cleared. Supports wildcards.

CacheUserId optional string

Either a user id or '%'.

Clears all caches (e.g. access permissions) related to the specified user id.

Use '%' to clear caches for all users.

OkButtonText optional string

Changes the text of the OK button when used with ADMIN_ErrorMessage, ADMIN_ConfirmWarning, ADMIN_ConfirmQuestion, ADMIN_ConfirmDelete, ADMIN_InfoMessage, ADMIN_SuccessMessage, ADMIN_Message, ADMIN_Force, or ADMIN_Forward. ADMIN_Force,