How to parse Comma-Separated-Value (CSV) files
Summary
Text files containing a series of values separated by commas are known as CSV files or Comma-Separated Values files.
You can create CSV files from a Microsoft Excel spreadsheet..
If none of the values contains a comma, you can parse CSV lines with a Parse Delimited rule, identifying the comma character as the delimiter. (For an example of this approach, see the PDN Exchange article Bulk Operator Load.)
However, if the CSV file contains a value that itself contains a comma, the value when saved from Excel will be surrounded by double quote characters. Double quote characters are are optional and often omitted for the values that do not contain a comma character. For such CSV files, you can't use Parse Delimited rules to parse the file, because neither quote characters nor comma characters are true delimiters that surround each field.
(Other files use a backslash \ or another character between every pair of values, including values containing commas. You can parse such files using a Parse Delimited rule.)
This article shows how to use two standard function rules parseCSVHeader() and parseCSVDetail() from the MapTo library to convert lines of a CSV file into property values.
Update: Parse Delimited rules in V5.5 can parse CSV files directly. See How to parse a Comma-Separated-Values (CSV) file using a file service.
Suggested Approach
In this example, each line of the CSV file (after the first line) contains 4 values that are to become the pyLabel, pyDescription, pyPrioritySample and pyUrgencyWorkAdjust values of a work object, separated by commas.
The first line (column headers in Excel) of the CSV file contains the property names.
Because the last pyDescription value contains a comma, this value is between double quote characters. For all other values, the quotes are optional and omitted. Because spaces, commas, and quotes are significant, take care to follow the CSV "format" or use Excel to create the CSV file.
The example uses a file listener to monitor a specified directory and process any CSV file it finds, creating a work object from each line of the file (except for the first line).
Step 1: Identify the work type and initial flow rule
A new work object ordinarily starts a flow execution. By default, services do not have a "current operator", and such services, when executing, lack some of the properties associated with a interactive requestor. Make sure the initial tasks in the flow do not depend on such properties as "the current operator" or the operator's work group or organization.
In this example, the TaskSample flow routes all new work objects to a fixed workbasket ([email protected]
) identified explicitly in the flow.
Step 2. Create a service package
Every service requires a service package. In this example, all values in tabs other than the Context tab contain default values.
Step 2. Create the service activity
The service activity performs the processing for the service. In this case, the Loop
service activity calls the built-in standard activity Work-.svcAddWorkObject, passing as a parameter the name of a flow (TaskSample) to start. (When the Loop
activity starts, the four properties extracted from the line of the CSV file are already on its primary page.) The standard activity acquires a work object ID, saves and commits the work object, and starts the flow execution.
Step 3. Create the File Service rule
The file service rule processes each CSV file found by the listener. (For a more general description of file services, see How to set up a file service and file listener.) The Service tab identifies a page that is constructed each time the file service rule is started.
The Method tab determines how the contents of the file are parsed. For Windows text files, the line terminator is \r\n. Since this text file is a CSV file, each line is a "record" for parsing purposes.
The Request tab calls two standard functions to parse the CSV file. At runtime, the parseCsvHeader() function — executed only once, for the first line of the file— extracts property names and saves them in an array named pyTagList
.
The parseCSVDetail() function, executed for each row after the first row, creates a temporary primary page for the service activity containing values for each property, using property names from the the pyTagList
array, and calls the service activity.
The flow execution routes the first assignment to a fixed workbasket.
Step 4: Create the File Listener.
This file listener monitors the directory E:\temp for any Windows file with a CSV file type. When a file is found, it calls the file service rule.
This file listener creates a RPT file and an internal record (instance of Log-Service-File) for each file processed. These are optional, but helpful in debugging and reporting.
Step 5. Test
1. Use the System Management Application (Listener Management panel) to start the file listener. Alternatively, you can stop and restart the Process Commander server node to start the file listener.
2. To monitor service processing in full detail, open the service file rule and select Run > Trace Open Rule. The Tracer starts. Adjust Tracer settings to determine the desired amount of detail. For example:
3. Create a test file and copy it to the directory that the listener monitors. Wait.
4. After the listener finds and processes the CSV file, examine the corresponding RPT file.
5. Locate and examine the work objects created by the file service rule. In this example, the open assignments are in the [email protected]
workbasket:
Work object W-199 was created third but is listed first, because the pyUrgencyWorkAdjust value in the that row of input is +5, leading to an overall urgency value of 15. Assignments in this list view report are sorted by decreasing urgency.
In this example, the created work objects contain only four property values obtained from the input, and all end up in a single workbasket. In more sophisticated cases, the CSV file could provide many properties and the service activity could create more varied work objects with varying initial flows.