COWS Web Processing Service (COWS WPS)

«  3. Main features   ::   Contents   ::   5. Service deployment  »

4. WPS processes

4.1. Thinking in terms of “processes”

Unlike other widely adopted OGC Web Services (such as WMS and WCS), the WPS is not a single web service. It provides a standardised interface and a framework within which any number of web services can be deployed. This means that it can potentially act as a generic processing container for any useful functionality that you might want to expose over the network.

It is useful to equate a single web service with a “process” within the WPS. Each “process” has a set of defined inputs and outputs and is configured to interact with the core WPS. Typically, the configuration options will tell the WPS whether to execute a job synchronously or via a scheduler, whether to zip output files, and whether to e-mail the user on completion of the job.

4.2. How does the WPS know about its “processes”?

4.2.1. The process-WPS interface

A simple interface controls the interactions between the main WPS application and the processes that are deployed within it. This consists of:

  • a process-specific configuration file
  • a process-specific Python module - containing a process-specific class

4.2.1.1. The process configuration file

The process configuration file has the following sections:

wps_interface
Provides information on the Python module and class that the WPS will need to call when executing the process, the process type and whether a dry-run option is available.
globals
Defines metadata such as the identifier (name) of the process, the title and abstract as well as a (currently not implemented) place to define an output definition XML schema.
DataInputs
A set of data inputs defined with various attributes such as data type, item/array, default and optional values.
ProcessOutputs
A set of outputs returned by the process. These will be encoded in the Execute Response document (not yet properly implemented).

The following example configures the SimplePlot process that allows the user to modify the title and bounding box:

[wps_interface]
process_callable = process_modules.simple_plot#SimplePlot
process_type = sync
dry_run_enabled = False
internal = False
store = True
status = False
visibility = public
caching_enabled = False
cache_exclude_params = Username

[globals]
Identifier = SimplePlot
Title = Simple Plot
Abstract = Creates a plot - to show we can do it.
Metadata = none
ProcessVersion = none
OutputDefinitions = text/xml http://kona.badc.rl.ac.uk/ddp/schemas/no_schema_yet.xsd
RequestType = image sync

[DataInputs]
PlotTitle = string
PlotTitle.default = The Name of The Plot
PlotTitle.title = Plot Title
BoundingBox = bbox
BoundingBox.extent = -30|-30|30|30
BoundingBox.title = Bounding Box

[ProcessOutputs]
output = xml_complex_value
output.mime_type =  text/xml
output.schema = schema_url
output.template = complex_output.xml

4.2.1.2. The process python interface module (and class)

The process configuration file points to the process_callable Python module and class in which the actual process code is embedded. This class conforms to a standard API and makes use of a range of COWS WPS utilities to perform its dedicated function. The basic signature of the process interface class is as follows:

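class MyProcess:

    def __call__(self, context):
        # Run the process using the supplied ProcessContext
        pass

    def dryRun(self, context):
        # Estimate the duration and size of the outputs without running
        pass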

The “context” is a ProcessContext object that packages up information about the job such as the inputs, outputs (such as data files), metadata, and interactions with the wider WPS application. The ProcessContext object also crucially provides the hooks for logging the job status during offline processing.

A ProcessBase class is provided to further simplify the interactions with the WPS. Creating a new process module and inheriting from the ProcessBase class (used by create_process.sh - see below) provides the following class signature:

class MyProcess(process_modules.internal.process_base.ProcessBase):

    # Define arguments that we need to set from inputs
    args_to_set = ["MyArg1", "MyArg2"]

    # Define defaults for arguments that might not be set
    input_arg_defaults = {"MyArg1": ["Cleese", "Cheese shop"],
                          "MyArg2": None,
                          }

    # Define a dictionary for arguments that need to be processed
    # before they are set (with values as the function doing the processing).
    arg_processers = {"MyArg2": some_utils.preProcessMyArg}

    def _executeProc(self, context, dry_run):
        """
        This is called to step through the various parts of the process
        executing the actual process if ``dry_run`` is False and just
        returning information on the volume and duration of the outputs
        if ``dry_run`` is True.
        """
        # Where the main programme runs:
        #   ...your actual code here...
        #   ...can call any piece of Python...
        #   ...or call out via Web or system calls to retrieve outputs...

    def _validateInputs(self):
        """
        Runs specific checking of arguments and their compatibility.
        """
        if self.args["MyArg1"] == "Arthur Putey" and self.args["MyArg2"] == "Mouse":
            raise Exception("Invalid arguments provided. If 'MyArg1' is 'Arthur Putey' then 'MyArg2' cannot be 'Mouse'.")

In the example above the class inherits from ProcessBase and then some preliminary work is done with input arguments. The _executeProc method takes the ProcessContext object, which is provided by the main WPS app, and a boolean dry_run argument. Within _executeProc the code will typically perform real functions if dry_run is False or will estimate the duration and size of the outputs if dry_run is True.

The _validateInputs method is used for very specific validation of the inputs. The main WPS application provides some argument checking but this layer is intended where the rules are too complex to automate.
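
The source of ProcessBase is not reproduced in this guide, so the following is only a guess at how declarative attributes of this kind might be consumed. SketchBase and its prepare_args method are hypothetical stand-ins, and str.strip takes the place of some_utils.preProcessMyArg.

```python
class SketchBase:
    """Hypothetical stand-in for ProcessBase (not the real implementation)."""

    args_to_set = []
    input_arg_defaults = {}
    arg_processers = {}

    def prepare_args(self, inputs):
        """Fill self.args from the inputs, then run the custom validation."""
        self.args = {}
        for name in self.args_to_set:
            # Fall back to the declared default when the input is missing.
            value = inputs.get(name, self.input_arg_defaults.get(name))
            processer = self.arg_processers.get(name)
            if processer is not None and value is not None:
                value = processer(value)
            self.args[name] = value
        self._validateInputs()

    def _validateInputs(self):
        pass


class MyProcess(SketchBase):
    args_to_set = ["MyArg1", "MyArg2"]
    input_arg_defaults = {"MyArg1": ["Cleese", "Cheese shop"], "MyArg2": None}
    # str.strip stands in for a real pre-processing function.
    arg_processers = {"MyArg2": str.strip}

    def _validateInputs(self):
        if self.args["MyArg1"] == "Arthur Putey" and self.args["MyArg2"] == "Mouse":
            raise Exception("Invalid arguments provided.")


proc = MyProcess()
proc.prepare_args({"MyArg2": "  Cat  "})
print(proc.args)
```

The point of the sketch is the ordering: defaults are applied first, pre-processing second, and the process-specific validation runs last on the final values.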

4.2.2. Process categories

The COWS WPS manages a range of processes as well as providing simple hooks for the administrator to add new processes. The following classification is used:

Process Category   Usage                                      Location in the code base
Internal           Used internally by the WPS                 process_modules/internal
Supported          Provided with the WPS distribution         process_modules/supported
Local              Developed within a local WPS deployment    process_modules/local
Templates          Templates for automated process creation   process_modules/templates

The following sections explain which supported processes are distributed with the COWS WPS and how to create your own local processes.

4.2.3. Rules for defining inputs in process configuration files

WPS processes can be invoked through a direct connection to the WPS server or via the COWS WPS User Interface (CWUI). The Data Inputs defined in the process configuration file are used to validate the input parameters sent to the WPS. They are also used to generate and validate the process input form when using the CWUI. Some of the features of the input parameter definitions are only relevant to these input forms.

4.2.3.1. Data Input types

The DataInputs section of the process configuration file is used to specify the inputs that are available for a given process. The simplest definition of an input informs the WPS of the data type and whether a single value or list of values is expected. For example:

Input1 = string
Input2 = string.list

In the above example there are two inputs defined: Input1 must be of type “string” and Input2 must be a list of “string” types.

The allowed input types are:

string
The input is a string of characters.
int
The input is an integer.
float
The input is a decimal number.
bool
The input is either true or false.
filepath
The input is a file path. This is a string, but if viewed through the CWUI an auto-complete feature can be used in the input form.
bbox
The input is a list of 4 decimal numbers specified as: west|south|east|north such as 0|-90|360|90.
datetime
The input is a date-time object specified as: YYYY-MM-DDThh:mm:ss such as 2011-01-01T00:00:00.
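
The type rules above can be sketched as a small conversion function. This is an illustration rather than the WPS's own validation code: parse_input is a hypothetical name, and the comma used to separate list items is an assumption that the text above does not confirm.

```python
from datetime import datetime


def parse_input(type_spec, raw):
    """Convert a raw string according to a type such as 'int' or 'string.list'."""
    base, _, modifier = type_spec.partition(".")
    if modifier == "list":
        # Assumption: list items arrive as one comma-separated string.
        return [parse_input(base, item) for item in raw.split(",")]
    if base in ("string", "filepath"):
        return raw
    if base == "int":
        return int(raw)
    if base == "float":
        return float(raw)
    if base == "bool":
        if raw.lower() not in ("true", "false"):
            raise ValueError("not a boolean: %r" % raw)
        return raw.lower() == "true"
    if base == "bbox":
        # west|south|east|north
        values = [float(v) for v in raw.split("|")]
        if len(values) != 4:
            raise ValueError("bbox needs four numbers: %r" % raw)
        return values
    if base == "datetime":
        return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S")
    raise ValueError("unknown input type: %r" % type_spec)


print(parse_input("bbox", "0|-90|360|90"))
print(parse_input("datetime", "2011-01-01T00:00:00"))
```

A value that cannot be converted raises ValueError, which is the point at which a server would reject the request.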

4.2.3.2. Rejection of requests with invalid inputs

The WPS server will validate each request by checking that each of the inputs can be interpreted as the correct type. If an input is identified as an invalid type an OGC Invalid Parameter Value exception is returned in an appropriate XML response.

If the CWUI input form is being used then this will usually detect any invalid parameters before they are submitted to the server.

4.2.3.3. Specifying additional attributes of input parameters

A further set of attributes can be associated with the input parameters in the DataInputs section of the process configuration file. This section describes the attributes available for all parameter types and then those available only to specific parameter types. These attributes are listed in lines directly beneath the line specifying the input name and data type, for example:

Input1 = string
Input1.title = The First Input
Input1.possible_values = baa,moo,woof
Input1.possible_values_labels = The sheep sound,The noise of a cow,What a dog says

Variable = string.list
Variable.title = Climate Variable
Variable.optional = True

All input parameters can have the following additional attributes associated with them:

title
A short title for the input that will typically be displayed as the parameter name in client software. In the COWS WPS the title is normally the same as the parameter name with spaces inserted between words (spaces are not allowed in parameter names).
abstract
More detailed information about the input, this might be displayed to users as a full description of the input parameter.
default
Default value for the parameter (this will be automatically populated in the CWUI input form).
length
This attribute can only be used for input parameters that have been specified as a list and not a single value.
optional
Boolean (True or False) value specifying whether this input is optional.
possible_values
A comma-separated list of possible values for the input parameter.
possible_values_labels
A comma-separated list of labels to be displayed to the user in place of the possible_values list of input values. These are typically more human-readable versions of the possible values.

The following attributes only relate to the data types provided below:

bbox.extent
An argument of type bbox can be given the extent attribute to specify the limits of the geographical bounding box. For example, SpatialExtent.extent = -20|40|20|60.
filepath.basedir
A space-delimited list of allowed base directories for the filepath parameter provided. If a file path input parameter is provided with a value that does not start with any of these base directories then the request will be rejected.
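
The two type-specific checks might look like the sketch below. Both function names are hypothetical. Normalising the path before comparing it is an extra precaution assumed here, not something the text above describes, and treating extent as a hard limit on requested boxes is likewise an assumption.

```python
import posixpath


def check_basedir(path, basedirs):
    """Return True if path lies under one of the space-delimited base directories."""
    # Normalise first so that '/badc/../etc' cannot slip past a prefix test.
    normalised = posixpath.normpath(path) + "/"
    for base in basedirs.split():
        prefix = posixpath.normpath(base).rstrip("/") + "/"
        if normalised.startswith(prefix):
            return True
    return False


def check_bbox_extent(bbox, extent):
    """Return True if bbox (west|south|east|north) fits inside extent."""
    west, south, east, north = (float(v) for v in bbox.split("|"))
    min_w, min_s, max_e, max_n = (float(v) for v in extent.split("|"))
    return min_w <= west <= east <= max_e and min_s <= south <= north <= max_n


print(check_basedir("/badc/cru/data/file.nc", "/badc /neodc"))
print(check_bbox_extent("-10|45|10|55", "-20|40|20|60"))
```

Comparing against the base directory plus a trailing slash avoids accepting a sibling directory such as /badcx when only /badc is allowed.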

4.2.3.4. Defining dynamic and dependent input parameters for the CWUI input forms

The COWS WPS User Interface (CWUI) includes a feature that allows the input parameters, and their possible values, to be defined dynamically. This feature is not part of the standard WPS specification but we believe it is an important component of a flexible client interface. For example, you might have a process which allows extraction of variables from datasets, which defines two input parameters called Dataset (the name of a dataset) and Variable (the name of the variable). The list of possible values for Dataset might be known but the possible values for Variable might depend on what the user has selected for the Dataset input parameter.

The CWUI implementation of “dynamic” variables allows the input form to provide the functionality that enables the user to make a selection, update the form (without submitting the request) to load values for dependent parameters, and then select appropriate values from them.

Dynamic variables are specified in the configuration files by setting the dynamic attribute to True and defining a possible_values_template attribute which provides a URL template for extracting the possible values. Using the example given above, the following details would define the Dataset input and the dynamic Variable input whose values depend on the user selection for Dataset:

Dataset = string
Dataset.possible_values = Fish Study 1,Fish Study 2,River Project,Ocean Model

Variable = string
Variable.dynamic = True
Variable.possible_values_template = http://foobaa.baafoo.org/some/service/variables?dataset=${Dataset}

The value for the possible_values_template is an incomplete URL that is used by the CWUI input form page as a template from which it generates a valid URL. That URL is called when the user clicks the “Update Form” button on the input form and the response is parsed and the options displayed to the user.

The possible_values_template uses the format ${INPUT_PARAMETER} to insert a selected value for an input parameter into the URL being requested. In the example above, if the user had selected Fish Study 1 for the value of the Dataset input then the complete URL requested to find out the possible values for the Variable input would be http://foobaa.baafoo.org/some/service/variables?dataset=Fish%20Study%201.
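
The substitution described above can be reproduced with Python's standard library, since string.Template happens to use the same ${NAME} placeholder syntax. This is a sketch of the idea only. The CWUI performs this step in the browser, and fill_template is a hypothetical name.

```python
from string import Template
from urllib.parse import quote


def fill_template(template, selections):
    """Substitute URL-quoted selections into a possible_values_template."""
    quoted = {name: quote(value) for name, value in selections.items()}
    return Template(template).substitute(quoted)


url = fill_template(
    "http://foobaa.baafoo.org/some/service/variables?dataset=${Dataset}",
    {"Dataset": "Fish Study 1"},
)
print(url)
# http://foobaa.baafoo.org/some/service/variables?dataset=Fish%20Study%201
```

Template.substitute raises KeyError if the template names a parameter that has no selected value, which is a reasonable signal that the form is not yet ready to be updated.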

Additionally, the possible_values_template attribute can include the following values that will be replaced when the WPS reads the process configuration file:

__LOCALHOST__
This will be replaced with the URL to the local host which is defined in the main WPS configuration file (using the baseURI parameter). If that parameter does not exist then it will use a concatenated string of “http://” and the server name (detected locally).
__URL_QUOTED_LOCALHOST__
This will be replaced with the local host (as extracted for __LOCALHOST__) which is then quoted using the Python urllib.quote() function.

4.2.3.4.1. Using rules for displaying default values of dynamic input parameters

When the CWUI input form retrieves a list of possible values for an input parameter you can use the default attribute in the process configuration to specify which value should be displayed to the user. Use one of the following WPS rules to specify which item in the possible values list should be displayed:

datetime.default = WPS-RULE:OLDEST
Display the oldest date-time value
datetime.default = WPS-RULE:NEWEST
Display the newest date-time value
<any type>.default = WPS-RULE:FIRST
Display the first value in the list.
<any type>.default = WPS-RULE:LAST
Display the last value in the list.
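
A minimal sketch of how these rules could be applied to a list of possible values is shown below. The pick_default function is hypothetical, and the date-time values are assumed to use the YYYY-MM-DDThh:mm:ss format given earlier for the datetime type.

```python
from datetime import datetime

RULE_PREFIX = "WPS-RULE:"


def _as_datetime(value):
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")


def pick_default(default, values):
    """Return the value to display, applying a WPS rule if one is given."""
    if not default.startswith(RULE_PREFIX):
        # A plain default is used as it is.
        return default
    if not values:
        return None
    rule = default[len(RULE_PREFIX):]
    if rule == "FIRST":
        return values[0]
    if rule == "LAST":
        return values[-1]
    if rule == "OLDEST":
        return min(values, key=_as_datetime)
    if rule == "NEWEST":
        return max(values, key=_as_datetime)
    raise ValueError("unknown rule: %r" % default)


times = ["2011-01-01T00:00:00", "2009-06-15T12:00:00", "2010-03-01T00:00:00"]
print(pick_default("WPS-RULE:OLDEST", times))
print(pick_default("WPS-RULE:LAST", times))
```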

4.2.3.5. Using the Process Helpers Controller to support dynamic input parameters

The COWS WPS includes a Pylons Controller specifically designed to support the interactive calls that the CWUI makes when building a request involving dynamic inputs. The controller class, ProcHelpersController (in the module cows_wps/controllers/proc_helpers.py), is accessible through URLs routed as follows:

$BASE_URL/proc_helpers/inputs/<process_id>/<method_name>?<query_string>

For example:

http://ceda-wps2.badc.rl.ac.uk/proc_helpers/inputs/CDMSSubsetVariable/variables?Dataset=/badc/ukmo-hadisst/metadata/hadisst.xml&parameter_name=Variable

This will interpret the URL as follows:

  • The <process_id> is CDMSSubsetVariable so locate the process directory and find the python module called <process_id>Helpers.py (in this case CDMSSubsetVariableHelpers.py). Within that module find the class with the same name as the module.
  • Create an instance of that class and then call the method named by <method_name> in the URL (in this case variables).
  • Send the contents of the <query_string> to the method as a dictionary of parameters.
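
The interpretation steps above can be sketched with the standard urllib.parse module. In the real service Pylons routing does this work, so parse_helper_url is purely illustrative and its name and return structure are assumptions.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_helper_url(url):
    """Split a proc_helpers URL into process id, method name and parameters."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expect .../proc_helpers/inputs/<process_id>/<method_name>
    start = segments.index("proc_helpers")
    _, action, process_id, method_name = segments[start:start + 4]
    return {
        "action": action,
        "process_id": process_id,
        # The helper module and its class share this name.
        "helper_name": process_id + "Helpers",
        "method_name": method_name,
        "params": dict(parse_qsl(parts.query)),
    }


request = parse_helper_url(
    "http://ceda-wps2.badc.rl.ac.uk/proc_helpers/inputs/CDMSSubsetVariable/"
    "variables?Dataset=/badc/ukmo-hadisst/metadata/hadisst.xml"
    "&parameter_name=Variable"
)
print(request["helper_name"], request["method_name"])
print(request["params"])
```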

When the method is called it attempts to return a JSON response structured as:

response = {"response":
               {"Dataset":
                  {"defined": True,
                   "labels": [list of labels],
                   "values": [list of values],
                   "information": "Please select a dataset from the list."}
               },
            "requested_url": "/proc_helpers/inputs/Subsetter/method?&parameter_name=Dataset"
           }

If the response is defined then the CWUI form will interpret the [list of labels] and [list of values] to populate the form field for the input parameter specified.
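
To make the response structure concrete, here is a sketch of a helper class whose method builds a dictionary in that shape. Everything in it is hypothetical: the class name, the lookup table and the label formatting are invented for the example, and whether real helper methods return a dictionary or an already-encoded JSON string is not stated in this guide.

```python
import json


class ExampleHelpers:
    """Hypothetical helper class; real helpers are named <process_id>Helpers."""

    # Made-up lookup table standing in for a real metadata query.
    VARIABLES = {"Fish Study 1": ["length", "weight"]}

    def variables(self, params):
        """Build the response for the parameter named in the query string."""
        name = params.get("parameter_name", "Variable")
        values = self.VARIABLES.get(params.get("Dataset"), [])
        return {
            "response": {
                name: {
                    "defined": bool(values),
                    "labels": [value.title() for value in values],
                    "values": values,
                    "information": "Please select a variable from the list.",
                }
            }
        }


result = ExampleHelpers().variables(
    {"Dataset": "Fish Study 1", "parameter_name": "Variable"}
)
print(json.dumps(result, indent=2))
```

Setting defined to False when no values are found gives the form a way to tell an empty answer apart from a usable list.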

The <process_id>Helpers.py class can be used for other supporting code to add specific features to a process. For example, the following call renders an HTML table of available datasets for the Subsetter process:

$BASE_URL/proc_helpers/help_page/Subsetter/help_page?describe_inputs=Dataset

4.3. Supported processes

4.3.1. Processes supported by the current COWS WPS

Supported processes are those that have been developed to perform functions that are likely to be useful to groups deploying the WPS in many different environments. Because of this they are distributed with the code. This section provides a description of the processes currently supported.

4.3.1.1. Climate Data Operators (CDO) Processes

Climate Data Operators (CDO) is a very useful command-line tool for manipulating climate data files. Some processes have been developed within the COWS WPS that wrap some of the CDO functionality. See the Climate Data Operators (CDO) WPS Processes section for more details.

4.3.1.2. Subsetter Process

The “Subsetter” is a tool that allows the extraction of variable subsets from a range of datasets. The user can select a dataset, a single variable, time range and bounding box. The output format can also be selected (NetCDF or CSV) along with instructions on how to divide output files into sensible time chunks. The tool uses CDAT’s CDMS (Climate Data Management System) libraries to interact with the datasets in the archives. The extraction jobs run on the batch processing servers and the user is e-mailed when the job has completed. Datasets are typically described in Climate Data Markup Language (CDML). See the Subsetter data extraction Process section for more details.

4.3.1.3. Plotting tools

Plot data from a NetCDF file.

4.3.1.4. TestDap

4.4. Local Processes

4.4.1. Adding a new process

The WPS makes it straightforward to add new local processes without having to understand the detailed workings of the code. The bin/create_process.sh script generates the files that a new process needs.

4.4.1.1. Running the create_process.sh script

To create a new local process simply decide on the name of your process module file and the process class name and use these as the two arguments:

$ bin/create_process.sh my_brand_new_process MyBrandNewProcess

The convention is to use all lower-case for the process module and CamelCase for the class name (although this is not enforced in the code).

The above will create the following files:

process_modules/local/my_brand_new_process.py
process_configs/local/MyBrandNewProcess.ini
process_tests/local/test_my_brand_new_process.py

These files will be identified next time the service is re-started.

4.5. Testing your process

The COWS WPS provides a test mode in which the server can be run inside a single python process to which requests can be made as if the system was operationally deployed. The test.ini configuration file is used to configure the test service. Typically you will need to modify a few settings inside the test.ini file in order to run the test server.

4.5.1. Process test modules

When you ran the create_process.sh script a test module was created along with the main python process module and the configuration file. For a process module called process_modules/local/my_brand_new_process.py the relevant test module will be under the path process_tests/local/test_my_brand_new_process.py.

The test process will not do anything meaningful until you begin to populate it with some tests. For each test you will define a set of arguments within a python dictionary and run them using the ProcessTester object. Since the process test for your new process has been copied from a template it already contains the main structure that calls the python code. All you need to do is to populate an inputs dictionary and run the test. A simple test module (without all the comments) could be:

from cows_wps.tests.process_tester import ProcessTester

# Arguments to send, outputs to expect and options for the tester
inputs_dict = {"MyArg": "hello"}
outputs_dict = {}
options = {"verbose": True}

tester = ProcessTester("MyBrandNewProcess")
tester.addTest(inputs_dict, outputs_dict, options)
tester.runAllTests()

This code imports the process_tester.py module which contains the class ProcessTester. This class constructs a request URL based on the inputs_dict you provide and then emulates a COWS WPS server and sends it the request. The outputs_dict represents outputs that you expect to be returned in the XML response from the request.

The ProcessTester instance in the test module can have any number of tests added to it, enabling you to set up a collection of different input dictionaries that you would like to test. If any of the tests fail then the test module raises an exception and exits. You can then diagnose the output and debug the process module to fix the problem.

To run the test module type:

$ python process_tests/local/test_my_brand_new_process.py

If the script ends without raising an exception then all your tests ran successfully.

4.5.2. Running tests individually

The example code above shows how you can add a group of tests to the ProcessTester before running them all with a single command. An alternative approach is to run single tests as follows:

tester.runTest("MyBrandNewProcess", inputs_dict, outputs_dict, options)

4.5.3. Testing for failures using bad arguments

Proper testing requires that you test for both success and failure. When testing for failures you would typically define an input dictionary of arguments that you intend to be incompatible with your process. Since the test will raise an exception, you can test for the exception using:

try:
    tester.runTest("MyBrandNewProcess", inputs_dict, outputs_dict, options)
except Exception as err:
    pass # failure expected - good!

This approach could be improved by defining specific Exceptions or error strings that are being tested for.
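
One way to make that improvement is to check the error message as well as the fact that an exception was raised. The expect_failure helper below is hypothetical and is not part of the ProcessTester API. The failing function stands in for a call to tester.runTest so that the sketch runs on its own.

```python
def expect_failure(run, expected_text):
    """Return True only if run() raises an exception mentioning expected_text."""
    try:
        run()
    except Exception as err:
        return expected_text in str(err)
    # No exception at all means the failure test itself has failed.
    return False


def failing_call():
    # Stand-in for tester.runTest(...) with deliberately bad arguments.
    raise Exception("Invalid arguments provided.")


print(expect_failure(failing_call, "Invalid arguments"))
```

In a real test module the stand-in could be replaced with a lambda that wraps the tester.runTest call shown above.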

4.5.4. What does the options dictionary do?

The options dictionary sent to the ProcessTester when a test is run can contain two possible settings:

verbose
This is either set to True or False and it instructs the ProcessTester to show or hide the logging output when running the test process. The default value is False.
similarity_threshold
This is a value between 0 and 1 that specifies how similar your actual XML response must be to the expected XML response built from the inputs and outputs dictionaries provided as inputs. The default value is 1 which implies that the actual XML must match the expected XML response exactly.
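
This guide does not say how the ProcessTester measures similarity, so the sketch below only illustrates the idea of a threshold. It uses difflib.SequenceMatcher from the standard library as a stand-in measure, and both function names are hypothetical.

```python
from difflib import SequenceMatcher


def xml_similarity(actual, expected):
    """Return a similarity score between 0 and 1 for two XML strings."""
    return SequenceMatcher(None, actual, expected).ratio()


def response_matches(actual, expected, similarity_threshold=1.0):
    """Accept the response if it is at least as similar as the threshold."""
    return xml_similarity(actual, expected) >= similarity_threshold


expected_xml = "<Output><Value>1</Value></Output>"
actual_xml = "<Output><Value>2</Value></Output>"

print(response_matches(expected_xml, expected_xml))
print(response_matches(actual_xml, expected_xml))
print(response_matches(actual_xml, expected_xml, similarity_threshold=0.8))
```

A threshold below 1 is useful when responses contain values that change between runs, such as timestamps or job identifiers.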

4.6. Process Inputs and Outputs

4.6.1. Process Inputs

As explained above, each process has a set of defined inputs that are managed through the process configuration file and python process module. The configuration file provides the first stage of validation but additional code may be included in the python process module to enable more sophisticated testing and validation of input parameters.

4.6.2. Process Outputs

More to come.

4.6.2.1. XML outputs

More to come...

4.6.2.2. File outputs

More to come...

4.6.2.3. OpenDap end-point outputs

More to come...

4.7. The wps_interface Section of the Process Configuration file

As explained above, the wps_interface section of the process configuration file provides a number of settings that instruct the WPS how to interact with the process. This section explains these in more detail:

process_callable
A string that represents the python module path (using dots to separate packages) to the class that runs the process. The module path is separated from the class name by a hash (“#”) symbol. E.g. process_modules.simple_plot#SimplePlot.
process_type
The type of process. This can be set to async or sync.
dry_run_enabled
A boolean defining whether the process can be called using the Costonly=true argument to do a dry-run instead of full execution.
internal
A boolean defining whether the process is internal to the WPS, i.e. it has no utility to external users.
store
A boolean defining whether outputs from the process should be stored.
status
A boolean defining how the status of the process should be represented.
visibility
A string defining who the process should be visible to. This has the default value of hidden which means that it will not be seen via the CWUI. If it is set to public then it will be visible to all. A third option is to set it to a space-delimited set of usernames that are allowed to view the process. In such cases it will attempt to use the UserManager class to find out the username and its value will be searched for in the visibility string in the configuration file.
caching_enabled
A boolean defining whether the cache should be interrogated for the outputs instead of running a new process.
cache_exclude_params
A space-delimited string which contains any parameters that should be excluded from the caching algorithm that checks to see if a request is cached.
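
Two of these settings lend themselves to a short sketch: resolving process_callable into a class and applying the visibility rules. Both functions are hypothetical illustrations rather than COWS WPS code, and a standard library class is used as the import target so that the example runs anywhere.

```python
import importlib


def resolve_callable(process_callable):
    """Import 'package.module#ClassName' and return the class."""
    module_path, _, class_name = process_callable.partition("#")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


def is_visible(visibility, username=None):
    """Apply the visibility rules: hidden (default), public or a user list."""
    setting = (visibility or "hidden").strip()
    if setting == "public":
        return True
    if setting == "hidden":
        return False
    # Otherwise the setting is a space-delimited list of usernames.
    return username is not None and username in setting.split()


# A standard library class stands in for a real process class.
print(resolve_callable("collections#OrderedDict"))
print(is_visible("alice bob", "alice"))
```

Splitting the username list before searching avoids matching a username that is only a substring of another one.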

4.8. Application profiles in WPS 1.0.0

The version 1.0.0 specification introduced the idea of “Application profiles” for WPS. We expect certain application profiles to emerge over the next few years that are important to the COWS WPS and its interactions with meteorological, ocean and climate data processing. Future implementations are expected to handle application profiles when these have been developed by the community.
