
COWS Web Processing Service (COWS WPS)


2. Installation

2.1. Overview

The installation process for the COWS WPS will vary in complexity depending on whether you wish to run in asynchronous and/or parallel mode. The simplest installation requires just one server (physical or virtual) running synchronous jobs inside the Apache webserver.

This chapter explains the system requirements and installation of the various packages required.

Note that all packages used are available free at the point of use and many are open source projects.

2.2. System requirements

The COWS WPS is dependent on the installation of the tools listed in this section. A further set of Python libraries/eggs are needed but these can generally be installed using the easy_install or pip package management tools.
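As a rough illustration of checking these prerequisites before installing, the sketch below compares dotted version strings against the minimum versions named in this section. The helper names (`parse_version`, `meets_minimum`) and the `MINIMUM_VERSIONS` table are hypothetical and are not part of the COWS WPS code base.

```python
import sys

# Minimum versions taken from this section; the table itself is illustrative.
MINIMUM_VERSIONS = {"python": "2.6", "pylons": "1.0"}


def parse_version(text):
    """Turn a dotted version string such as '1.0.1' into a tuple of ints.

    Only the leading digits of each component are used, so '0rc1' becomes 0.
    """
    parts = []
    for piece in text.split("."):
        leading = ""
        for ch in piece:
            if not ch.isdigit():
                break
            leading += ch
        parts.append(int(leading) if leading else 0)
    return tuple(parts)


def meets_minimum(found, required):
    """Return True if version string `found` is at least `required`."""
    found_parts = parse_version(found)
    required_parts = parse_version(required)
    width = max(len(found_parts), len(required_parts))

    def pad(parts):
        # Pad with zeros so that '1.0' and '1.0.0' compare as equal.
        return parts + (0,) * (width - len(parts))

    return pad(found_parts) >= pad(required_parts)


if __name__ == "__main__":
    running = "%d.%d" % tuple(sys.version_info[:2])
    print("Python OK:", meets_minimum(running, MINIMUM_VERSIONS["python"]))
```

The same comparison could be applied to the version reported by an installed Pylons package, however that version is obtained on your system.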

2.2.1. Python 2.6+

Python 2.6 or later is required for compatibility with Pylons version 1.0+ (see below).

2.2.2. Pylons

The main COWS WPS web application is written in Pylons [Pylons]. There was a significant version change from Pylons 0.9.7 to 1.0. As a result, code built on version 1.0 is not backwards compatible. The COWS WPS requires Pylons 1.0 or higher and is not compatible with previous versions.

2.2.3. SQLAlchemy

SQLAlchemy [Sqlalchemy] manages the database interactions of the COWS WPS. It provides a single API to a variety of underlying libraries and database technologies. In test mode, it can also talk to a file-based or memory-based database using the SQLite package.

2.2.4. A database

The first COWS WPS implementations have used PostgreSQL [Postgres] for the underlying database. However, because all database access goes through SQLAlchemy, a different database system can be used for a new installation without changes to the code base.
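In practice, switching database back end mostly comes down to the connection URL handed to the database layer. The sketch below builds URLs in the standard SQLAlchemy URL format for the two cases mentioned here (SQLite for testing, PostgreSQL for deployment). The function name `database_url` and all example host, user and database names are hypothetical; where the COWS WPS actually reads its URL from is not covered in this section.

```python
def database_url(backend, name=None, user=None, password=None,
                 host="localhost", port=None):
    """Build a SQLAlchemy-style database URL for a test or live database.

    Note: user names and passwords containing characters such as '@' or '/'
    would need URL-escaping, which this sketch does not attempt.
    """
    if backend == "sqlite":
        # 'sqlite://' is an in-memory database; 'sqlite:///file' is a
        # file-based database at a path relative to the working directory.
        return "sqlite://" if name is None else "sqlite:///" + name
    if backend == "postgresql":
        if not (name and user):
            raise ValueError("PostgreSQL needs a database name and a user")
        credentials = user if password is None else user + ":" + password
        location = host if port is None else "%s:%d" % (host, port)
        return "postgresql://%s@%s/%s" % (credentials, location, name)
    raise ValueError("unsupported backend: %r" % (backend,))


if __name__ == "__main__":
    print(database_url("sqlite"))                      # in-memory test database
    print(database_url("sqlite", name="wps_test.db"))  # file-based test database
    print(database_url("postgresql", name="wps", user="wps_user",
                       password="secret", host="state1", port=5432))
```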

2.2.5. Oracle Grid Engine

For installations with asynchronous processing enabled, the COWS WPS uses the Sun Grid Engine (now known as Oracle Grid Engine [SGE]) scheduling tool to manage queues and job submission.

2.3. Supported platforms

At present, the COWS WPS has been implemented and tested on the following platforms:

  • openSUSE 10.3

Installation is expected to be relatively straightforward on other Linux platforms, since most of the code is built upon common and portable third-party tools.

The COWS WPS will not run on Windows platforms and has not been tested on Apple Macs.

2.4. Downloading the sources

2.5. Building a virtual environment for housing the WPS

One of the main features of the COWS WPS is its management of asynchronous jobs and connection to offline processing servers. Since the requirement for this feature is usually prompted by a need to manage computational resources, the desire to scale up the offline capability is common.

The scalability of both the WPS server and the Offline server is greatly enhanced by the use of virtual machines. A Virtual Machine (VM) is a completely isolated operating system installation within the normal operating system running on a physical server. The image of a VM can be duplicated onto another VM without the need to re-install software or re-configure the operating system. This approach allows the COWS WPS architecture to scale up to whatever capacity is required. Typically, the COWS WPS will run two “WPS VMs” load-balanced so that both servers can accept and process requests. Multiple “Offline Processing VMs” are deployed with Sun Grid Engine managing the job submission from the WPS VMs. More details are discussed in the next sections.

2.6. Installing the Sun Grid Engine scheduler

2.6.1. Overview of SGE

Sun Grid Engine is an open source batch-queuing system developed and supported by Sun Microsystems. It is free to use and is available from the project website under the Sun Industry Standards Source License.

SGE is used in the COWS WPS to manage the configuration of a set of submission queues that accept jobs from “submission hosts” and queue them on “execute hosts” based on a set of configured rules.

2.6.2. Installation

SGE must be installed on all servers that need to act as “master”, “admin”, “submit” and “execute” hosts. Please see the following page for a guide on how to install and configure SGE:

The standard installation should set up the hosts as follows:

Server (or VM) type    Host type
WPS 1                  master, admin, submit
WPS 2..n               submit
Offline 1..n           execute
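The host layout in the table above can be written down as a small lookup, which may be handy when scripting the set-up of several VMs. This is only a sketch: the function name `sge_host_roles` and the `"wps"` / `"offline"` labels are hypothetical, and nothing here calls SGE itself.

```python
def sge_host_roles(server_type, index):
    """Return the SGE host roles for a server, following the table above.

    server_type is "wps" or "offline"; index counts servers from 1.
    """
    if index < 1:
        raise ValueError("server index starts at 1")
    if server_type == "wps":
        # Only the first WPS server acts as master and admin host.
        if index == 1:
            return ["master", "admin", "submit"]
        return ["submit"]
    if server_type == "offline":
        return ["execute"]
    raise ValueError("unknown server type: %r" % (server_type,))


if __name__ == "__main__":
    for kind, number in [("wps", 1), ("wps", 2), ("offline", 1)]:
        print(kind, number, sge_host_roles(kind, number))
```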

2.7. Deployment

2.8. Deployment configurations

There are many possible configurations in which the COWS WPS can be deployed. The simplest is a single-server deployment but the system is designed to be scalable to any size. Figure *** shows the basic architecture which is made up of:

  • WPS server(s)
  • Offline processing server(s)
  • State (database) server(s)

2.9. Virtual or physical servers?

The COWS WPS will work whether installed on physical servers or on virtual machines (VMs). The advantage of the VM approach is that the WPS VMs can be replicated via “copy-and-paste” rather than having to manually install software on each system.

The VM approach is also compatible with cloud technologies, which are likely to become commonplace in the future.

2.10. Single-server deployment

The simplest method of deploying the COWS WPS is on a single server. This limits the service to only providing “synchronous” responses so it does not exploit the full power of WPS. However, this approach is perfect for proof-of-concept and demonstrator activities. The single-server deployment requires you to build a single instance of the WPS (see the installation section). The key point to note is that all process configuration files must have the process_type set to “sync” to ensure that the WPS runs them directly without attempting to contact the scheduler.
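Before starting a single-server deployment it can be worth checking that no process configuration still asks for offline execution. The sketch below assumes, purely for illustration, that each process configuration is an INI-style text containing a `process_type` option in some section; the actual file layout used by the COWS WPS may differ, and the function name `non_sync_processes` is hypothetical.

```python
import configparser


def non_sync_processes(config_texts):
    """Return the sorted names of processes not configured as "sync".

    config_texts maps a process name to the text of its configuration.
    A configuration with no process_type option at all is also reported.
    """
    offenders = []
    for name, text in config_texts.items():
        parser = configparser.ConfigParser()
        parser.read_string(text)
        # Collect every process_type value, whichever section it lives in
        # (the section name is an assumption we avoid depending on).
        values = [
            parser.get(section, "process_type").strip().lower()
            for section in parser.sections()
            if parser.has_option(section, "process_type")
        ]
        if not values or any(value != "sync" for value in values):
            offenders.append(name)
    return sorted(offenders)


if __name__ == "__main__":
    example = {
        "echo": "[process]\nprocess_type = sync\n",
        "subset": "[process]\nprocess_type = async\n",
    }
    print(non_sync_processes(example))
```

An empty result means every process would be run directly by the WPS, as the single-server deployment requires.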

2.11. The 3 server types

2.11.1. WPS servers

The WPS server runs the main Python WPS web application and user interface within the Apache webserver. The application runs as a multi-threaded process inside Apache and is attached to a pool of additional processes that run “synchronous” jobs, which typically take up to 15 seconds. This allows the WPS to respond immediately to small requests whilst scheduling larger requests on the Offline processing servers.
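The split between immediate and scheduled work can be pictured as a simple routing decision. The sketch below is not the COWS WPS implementation: it assumes a runtime estimate is available for each request, which this section does not describe, and the names `SYNC_LIMIT_SECONDS` and `choose_execution` are hypothetical. The 15-second figure is the typical upper bound quoted above.

```python
# Typical upper bound for a synchronous job, as quoted in the text.
SYNC_LIMIT_SECONDS = 15


def choose_execution(estimated_seconds, offline_available=True):
    """Decide whether a request runs in the WPS process pool or offline.

    Returns "sync" for small requests and "async" for larger ones. Raises
    RuntimeError if a large request arrives but no offline servers exist,
    as in a single-server deployment.
    """
    if estimated_seconds < 0:
        raise ValueError("estimated runtime cannot be negative")
    if estimated_seconds <= SYNC_LIMIT_SECONDS:
        return "sync"
    if not offline_available:
        raise RuntimeError("request too large for a synchronous-only server")
    return "async"


if __name__ == "__main__":
    for seconds in (3, 15, 120):
        print(seconds, "->", choose_execution(seconds))
```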

2.11.2. Offline processing servers

The Offline processing server provides a separate, controlled environment for handling offline jobs. The number of concurrent jobs is controlled through queue management, using tools dedicated to that purpose. The processes running in this layer are managed by SGE, which is configured with a short queue and a long queue to handle requests of different sizes from the WPS.

Each job is executed using a Python wrapper module that has the ability to report its status to a simple text file. Lengthy processes routinely update this file with information on the percentage of the task that has been completed. When the COWS WPS is polled for information about a current job, the WPS uses the “status location URL” to identify the job and reads the contents of the status file. The current status is then serialised into the Execute Response XML document (specified in the WPS standard) and returned.
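The status-file mechanism described above can be sketched as a pair of helpers: one that a long-running job calls to record its progress, and one that the WPS side calls when polled. The one-line file format used here ("percentage, space, message") and the names `write_status` and `read_status` are assumptions made for illustration; the real wrapper module may store its status differently.

```python
import os
import tempfile


def write_status(path, percent, message=""):
    """Record job progress as a single line: '<percent> <message>'.

    The message must not contain newlines. The line is written to a
    temporary file and then renamed over the target, so a reader polling
    the file never sees a half-written status.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write("%d %s\n" % (percent, message))
    os.replace(tmp_path, path)


def read_status(path):
    """Return (percent, message) from a file written by write_status."""
    with open(path) as handle:
        first_line = handle.readline().strip()
    percent_text, _, message = first_line.partition(" ")
    return int(percent_text), message


if __name__ == "__main__":
    status_file = os.path.join(tempfile.mkdtemp(), "job.status")
    write_status(status_file, 40, "subsetting data")
    print(read_status(status_file))
```

The value returned by `read_status` is the kind of information that would then be serialised into the Execute Response document.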

2.11.3. State (database) servers

The State server can be the same machine as the WPS or Offline processing server, but the most robust deployment places these functions on a dedicated, regularly backed-up server.

The State server typically serves the following functions:

  • Database host
  • Cache disk host
  • Load-balancing (if used)

The risk of managing state in this manner is the introduction of single points of failure. This can be mitigated by housing the load-balancer, database and cache disk on a pair of VMs in active/passive failover mode. The State VM runs the live services whilst a Backup state VM routinely copies its state from the live version. Through the combined use of database dumps and Rsync mirroring of the cache disk, the Backup state VM can be kept synchronised to within 1 hour of the live system.
